References
Google I/O 2009 - Transactions Across Datacenters
youtube video
slide
Two Phase Commit Protocol (2PC)
http://www.cs.fsu.edu/~xyuan/cop5611/lecture15.html
http://www.cs.iastate.edu/~cs554/NOTES/Ch8-5.pdf
Three Phase Commit Protocol (3PC)
http://courses.cs.vt.edu/~cs5204/fall00/distributedDBMS/sreenu/3pc.html
Two-Phase Commit (2PC)
Assumptions
The protocol works as follows. One node, the master site, is designated the coordinator; the remaining nodes in the network are called cohorts (referred to as participants in the pseudocode below). The protocol further assumes stable storage at each site and a write-ahead log at each node. It also assumes that no node crashes forever and that any two nodes can eventually communicate with each other. The latter is a mild assumption, since network traffic can typically be rerouted. The former is much stronger: suppose the machine blows up!
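The write-ahead-log assumption can be illustrated with a minimal sketch. The function names `append_log_record` and `last_log_record` and the one-record-per-line file layout are assumptions made for this example; they are not part of the protocol description. The point is only that a record is forced to stable storage before the node acts on it, so the node can find its last state after a restart.

```python
import os
from typing import Optional


def append_log_record(path: str, record: str) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push it to the device


def last_log_record(path: str) -> Optional[str]:
    """Return the most recent record, or None if nothing was logged."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

A node would call `append_log_record` before sending any message that depends on the logged state, and `last_log_record` when it restarts.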
Finite state machines (top: coordinator; bottom: participants)

Actions by coordinator:
write START_2PC to local log;
multicast VOTE_REQUEST to all participants;
while not all votes have been collected {
    wait for any incoming vote;
    if timeout {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
        exit;
    }
    record vote;
}
if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
    write GLOBAL_COMMIT to local log;
    multicast GLOBAL_COMMIT to all participants;
} else {
    write GLOBAL_ABORT to local log;
    multicast GLOBAL_ABORT to all participants;
}
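The coordinator's decision rule above can be sketched in Python. This is a simplified model, not a networked implementation: the function name `run_coordinator` is hypothetical, votes are passed in as a dictionary, and a value of `None` stands in for a participant whose vote timed out. Multicasting is left as comments.

```python
from typing import Dict, List, Optional

VOTE_COMMIT = "VOTE_COMMIT"
VOTE_ABORT = "VOTE_ABORT"
GLOBAL_COMMIT = "GLOBAL_COMMIT"
GLOBAL_ABORT = "GLOBAL_ABORT"


def run_coordinator(participants: List[str],
                    votes: Dict[str, Optional[str]],
                    coordinator_vote: str,
                    log: List[str]) -> str:
    """Return the global decision; votes[p] is None if p timed out."""
    log.append("START_2PC")
    # VOTE_REQUEST would be multicast to all participants here.
    for p in participants:
        if votes.get(p) is None:
            # A missing vote is treated as a timeout: abort immediately.
            log.append(GLOBAL_ABORT)
            return GLOBAL_ABORT
    everyone_commits = all(votes[p] == VOTE_COMMIT for p in participants)
    if everyone_commits and coordinator_vote == VOTE_COMMIT:
        decision = GLOBAL_COMMIT
    else:
        decision = GLOBAL_ABORT
    log.append(decision)  # the decision is logged before it is multicast
    return decision
```

A single VOTE_ABORT or a single missing vote is enough to abort; only a unanimous VOTE_COMMIT, including the coordinator's own vote, yields GLOBAL_COMMIT.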
When the coordinator crashes in state S, and then recovers to S:
- S=WAIT: retransmit VOTE_REQUEST
- S=ABORT: retransmit GLOBAL_ABORT
- S=COMMIT: retransmit GLOBAL_COMMIT
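The recovery rules above amount to a lookup on the last record in the coordinator's log. The sketch below assumes that the state is inferred from that last record (START_2PC meaning WAIT); the function name `coordinator_recovery_action` is hypothetical.

```python
from typing import List, Optional


def coordinator_recovery_action(log: List[str]) -> Optional[str]:
    """Return the message to retransmit after a restart, or None."""
    if not log:
        return None                      # nothing was started
    last = log[-1]
    if last == "START_2PC":              # state WAIT: votes still pending
        return "VOTE_REQUEST"
    if last in ("GLOBAL_ABORT", "GLOBAL_COMMIT"):
        return last                      # repeat the logged decision
    return None
```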
Actions by participant:
write INIT to local log;
wait for VOTE_REQUEST from coordinator;
if timeout {
    write VOTE_ABORT to local log;
    exit;
}
if participant votes COMMIT {
    write VOTE_COMMIT to local log;
    send VOTE_COMMIT to coordinator;
    wait for DECISION from coordinator;
    if timeout {
        multicast DECISION_REQUEST to other participants;
        wait until DECISION is received; /* remain blocked */
        write DECISION to local log;
    }
    if DECISION == GLOBAL_COMMIT {
        write GLOBAL_COMMIT to local log;
    } else if DECISION == GLOBAL_ABORT {
        write GLOBAL_ABORT to local log;
    }
} else {
    write VOTE_ABORT to local log;
    send VOTE_ABORT to coordinator;
}
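The participant's side can be modeled the same way. In this sketch the function name `run_participant` and all of its parameters are assumptions for illustration: message arrival is reduced to booleans and values, and `ask_peers` is a callable standing in for the blocking DECISION_REQUEST exchange. Unlike the pseudocode, the sketch writes the final decision to the log only once.

```python
from typing import Callable, List, Optional


def run_participant(vote_request_arrived: bool,
                    votes_commit: bool,
                    decision_from_coordinator: Optional[str],
                    ask_peers: Callable[[], str],
                    log: List[str]) -> str:
    """Return the participant's final outcome and record it in log."""
    log.append("INIT")
    if not vote_request_arrived:
        # Timed out waiting for VOTE_REQUEST: abort unilaterally.
        log.append("VOTE_ABORT")
        return "VOTE_ABORT"
    if not votes_commit:
        log.append("VOTE_ABORT")          # VOTE_ABORT is sent to the coordinator
        return "VOTE_ABORT"
    log.append("VOTE_COMMIT")             # logged before the vote is sent
    decision = decision_from_coordinator
    if decision is None:
        # Timed out waiting for the decision: ask the other participants.
        # In the real protocol this call blocks until some peer answers.
        decision = ask_peers()
    log.append(decision)
    return decision
```

Once VOTE_COMMIT is logged the participant can no longer abort on its own, which is why the timeout branch has to ask peers instead of deciding locally.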
Actions for handling decision requests (executed by a separate thread):
while true {
    wait until any incoming DECISION_REQUEST is received; /* remain blocked */
    read most recently recorded STATE from the local log;
    if STATE == GLOBAL_COMMIT {
        send GLOBAL_COMMIT to requesting participant;
    } else if STATE == INIT or STATE == GLOBAL_ABORT {
        send GLOBAL_ABORT to requesting participant;
    } else {
        skip; /* participant remains blocked */
    }
}
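The body of this loop is a pure function of the most recently logged state, which makes it easy to sketch in isolation. The name `reply_to_decision_request` is hypothetical; a return value of `None` models the "skip" branch, where no reply is sent.

```python
from typing import Optional


def reply_to_decision_request(state: str) -> Optional[str]:
    """Map the last logged STATE to a reply, or None when no reply is sent."""
    if state == "GLOBAL_COMMIT":
        return "GLOBAL_COMMIT"
    if state in ("INIT", "GLOBAL_ABORT"):
        # A peer still in INIT has not voted, so a commit is impossible.
        return "GLOBAL_ABORT"
    return None  # e.g. VOTE_COMMIT: this peer is waiting for the decision too
```

If every reachable peer returns `None`, the requester stays blocked until the coordinator recovers.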
When a participant crashes in state S, and then recovers to S:
- S=INIT: abort and inform coordinator
- S=READY: contact other participants
- S=ABORT: enter into ABORT state
- S=COMMIT: enter into COMMIT state
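The participant recovery rules can be sketched as follows, using the state names from the list above. The function name `resolve_after_recovery` and the idea of passing peer states in as a list are assumptions for illustration; in practice those states would be learned through DECISION_REQUEST messages.

```python
from typing import Iterable, Optional


def resolve_after_recovery(own_state: str,
                           peer_states: Iterable[str]) -> Optional[str]:
    """Return COMMIT or ABORT, or None if the participant stays blocked."""
    if own_state == "INIT":
        return "ABORT"                  # never voted: abort and inform coordinator
    if own_state in ("ABORT", "COMMIT"):
        return own_state                # the outcome was already logged
    if own_state != "READY":
        raise ValueError(f"unknown state: {own_state}")
    for s in peer_states:
        if s == "COMMIT":
            return "COMMIT"             # a peer already saw GLOBAL_COMMIT
        if s in ("INIT", "ABORT"):
            return "ABORT"              # a commit cannot have been decided
    return None                         # every peer is READY: still blocked
```

The `None` case is the blocking scenario of 2PC: a participant in READY whose peers are all in READY cannot decide until the coordinator comes back.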
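Three-Phase Commit (3PC)
To eliminate the extra thread that two-phase commit needs for handling DECISION_REQUEST messages, and to avoid having to ask other participants for their current state, the three-phase commit protocol was introduced.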
Assumptions
Each site uses the write-ahead-log protocol.
At most one site can fail during the execution of the transaction.
Finite state machines (left: participants; right: coordinator)
