Paxos algorithm
Encyclopedia
|
| Tutorials | Encyclopedia | Dictionary | Directory |
|
Paxos algorithm
Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.[1] Consensus protocols are the basis for the state machine approach to distributed computing, as suggested by Leslie Lamport[2] and surveyed by Fred Schneider[1]. The state machine approach is a technique for converting an algorithm into a fault-tolerant, distributed implementation. Ad-hoc techniques may leave important cases of failures unresolved. The principled approach proposed by Lamport et. al. ensures all cases are handled safely. The Paxos family of protocols includes a spectrum of tradeoffs between the number of processors, number of message delays before learning the agreed value, the activity level of individual participants, number of messages sent, and types of failures. The convergent property of the Paxos family is their safety from inconsistency. [3][4][5][6][7] Safety propertiesIn order to guarantee safety, Paxos defines three safety properties and ensures they are always held, regardless of the pattern of failures: Nontriviality
ConsistencyLiveness(C;L)
PreliminariesIn order to simplify the presentation of Paxos, the following assumptions and definitions are made explicit. Techniques to broaden the applicability are known in the literature, and are not covered in this article; please see references for further reading. Processors
Network
Number of processors
RolesPaxos describes the actions of the processes by their roles in the protocol; Client, Acceptor, Proposer, Learner, and Leader. In typical implementations, a single processor may play one or more roles at the same time. This does not affect the correctness of the protocol ? it is usual to coalesce roles to improve the latency and/or number of messages in the protocol. Client
Acceptor
Proposer
Learner
Leader
Quorums
Choice
Typical deploymentIn most deployments of Paxos, each participating process acts in three roles; Proposer, Acceptor and Learner [9]. This reduces the message complexity significantly, without sacrificing correctness:
By merging roles the protocol "collapses" into an efficient client-master-replica style deployment typical of the database community. The benefit of the Paxos family (including implementations with merged roles) is the guarantee of its Safety Properties. A typical implementation's message flow is covered in Typical Multi-Paxos Deployment. Basic PaxosThis protocol is the most basic of the Paxos family; it is not the protocol which is typically implemented in a deployment (see Multi-Paxos). Each instance of the Basic Paxos protocol decides on a single output value. The protocol proceeds over several rounds, a successful round has two phases: Phase 1a: Prepare
Phase 1b: Promise
Phase 2a: Accept!
Phase 2b: Accepted
Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number. Here is a graphic representation of the Basic Paxos protocol. Note that the values returned in the Promise message (Va, Vb, Vc) are typically null for the first round of each instance, they are shown below for completeness. Message flow: Basic Paxos(one instance, one successful round)
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(N)
| |<---------X--X--X | | Promise(N,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(N,Vn)
| |<---------X--X--X------>|->| Accepted(N,Vn)
|<---------------------------------X--X Response
| | | | | | |
Error cases in basic PaxosThe simplest error cases are the failure of a redundant Learner, or failure of an Acceptor when a Quorum of Acceptors remains live. In these cases, the protocol requires no recovery. No additional rounds or messages are required, as shown below: Message flow: Basic Paxos, failure of Acceptor(Quorum size = 2 Acceptors)
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(N)
| | | | ! | | !! FAIL !!
| |<---------X--X | | Promise(N,{Va,Vb,Vc})
| X--------->|->| | | Accept!(N,Vn)
| |<---------X--X--------->|->| Accepted(N,Vn)
|<---------------------------------X--X Response
| | | | | |
Message flow: Basic Paxos, failure of redundant Learner
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(N)
| |<---------X--X--X | | Promise(N,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(N,Vn)
| |<---------X--X--X------>|->| Accepted(N,Vn)
| | | | | | ! !! FAIL !!
|<---------------------------------X Response
| | | | | |
The next failure case is when a Proposer fails after proposing a value, but before agreement is reached. Ignoring Leader election, an example message flow is as follows: Message flow: Basic Paxos, failure of Proposer(re-election not shown, one instance, two rounds)
Client Leader Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare(N)
| |<------------X--X--X | | Promise(N,{Va,Vb,Vc})
| | | | | | |
| | | | | | | !! Leader fails during broadcast !!
| X------------>| | | | | Accept!(N,Vn)
| ! | | | | |
| | | | | | | !! NEW LEADER !!
| X--------->|->|->| | | Prepare(N+1)
| |<---------X--X--X | | Promise(N+1,{Vn})
| X--------->|->|->| | | Accept!(N+1,Vn)
| |<---------X--X--X------>|->| Accepted(N+1,Vn)
|<---------------------------------X--X Response
| | | | | | |
The most complex case is when multiple Proposers believe themselves to be Leaders. For instance the current leader may fail and later recover, but the other Proposers have already re-elected a new leader. The recovered leader has not learned this yet and attempts to begin a round in conflict with the current leader. Message flow: Basic Paxos, dueling Proposers(one instance, four unsuccessful rounds)
Client Proposer Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare(N)
| |<------------X--X--X | | Promise(N,{Va,Vb,Vc})
| ! | | | | | !! LEADER FAILS
| | | | | | | !! NEW LEADER (knows N)
| X--------->|->|->| | | Prepare(N+1)
| |<---------X--X--X | | Promise(N+1,{Va,Vb,Vc})
| | | | | | | | !! OLD LEADER recovers
| | | | | | | | !! OLD LEADER tries N+1, denied
| X------------>|->|->| | | Prepare(N+1)
| |<------------X--X--X | | Nak(N+1)
| | | | | | | | !! OLD LEADER tries N+2
| X------------>|->|->| | | Prepare(N+2)
| |<------------X--X--X | | Promise(N+2,{Va,Vb,Vc})
| | | | | | | | !! NEW LEADER proposes, denied
| | X--------->|->|->| | | Accept!(N+1,Vn)
| | |<---------X--X--X | | Nak(N+2)
| | | | | | | | !! NEW LEADER tries N+3
| | X--------->|->|->| | | Prepare(N+3)
| | |<---------X--X--X | | Promise(N+3,{Va,Vb,Vc})
| | | | | | | | !! OLD LEADER proposes, denied
| X------------>|->|->| | | Accept!(N+2,Vn)
| |<------------X--X--X | | Nak(N+3)
| | | | | | | | ... and so on ...
Multi-PaxosA typical deployment of Paxos requires a continuous stream of agreed values acting as commands to a distributed state machine. If each command is the result of a single instance of the Basic Paxos protocol, a significant amount of overhead would result. If the leader is relatively stable, phase 1 becomes unnecessary. Thus, it is possible to skip phase 1 for future instances of the protocol with the same leader. To achieve this, the instance number is included along with each value. Multi-Paxos reduces the failure-free message delay (proposal to learning) from 4 delays to 2 delays. Message flow: Multi-Paxos, start( first instance with new leader)
Client Proposer Acceptor Learner
| | | | | | | --- First Request ---
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare(N)
| |<---------X--X--X | | Promise(N,I,{Va,Vb,Vc})
| X--------->|->|->| | | Accept!(N,I,Vn)
| |<---------X--X--X------>|->| Accepted(N,I,Vn)
|<---------------------------------X--X Response
| | | | | | |
Message flow: Multi-Paxos, steady-state(subsequent instances with same leader) Client Proposer Acceptor Learner | | | | | | | --- Following Requests --- X-------->| | | | | | Request | X--------->|->|->| | | Accept!(N,I+1,W) | |<---------X--X--X------>|->| Accepted(N,I+1,W) |<---------------------------------X--X Response | | | | | | | Typical Multi-Paxos deploymentThe most common deployment of the Paxos family is Multi-Paxos [9], specialized for participating processors to each be Proposers, Acceptors and Learners. The message flow may be optimized as depicted here: Message flow: Collapsed Multi-Paxos, start(first instance with new leader)
Client Servers
| | | | --- First Request ---
X-------->| | | Request
| X->|->| Prepare(N)
| |<-X--X Promise(N,I,{Va,Vb,Vc})
| X->|->| Accept!(N,I,Vn)
| |<-X--X Accepted(N,I)
|<--------X | | Response
| | | |
Message flow: Collapsed Multi-Paxos, steady state(subsequent instances with same leader) Client Servers X-------->| | | Request | X->|->| Accept!(N,I+1,W) | |<-X--X Accepted(N) |<--------X | | Response | | | | OptimizationsA number of optimizations reduce message complexity and size. These optimizations are summarized below:
Cheap PaxosCheap Paxos extends Basic Paxos to tolerate F failures with F+1 main processors and F auxiliary processors by dynamically reconfiguring after each failure. This reduction in processor requirements comes at the expense of liveness; if too many main processors fail in a short time, the system must halt until the auxiliary processors can reconfigure the system. During stable periods, the auxiliary processors take no part in the protocol.
Message flow: Cheap Multi-Paxos3 main Acceptors, 1 Auxiliary Acceptor, Quorum size = 3, showing failure of one main processor and subsequent reconfiguration
{ Acceptors }
Proposer Main Aux Learner
| | | | | | -- Phase 2 --
X----------->|->|->| | | Accept!(N,I,V)
| | | ! | | --- FAIL! ---
|<-----------X--X--------------->| Accepted(N,I,V)
| | | | | -- Failure detected (only 2 accepted) --
X----------->|->|------->| | Accept!(N,I,V) (re-transmit, include Aux)
|<-----------X--X--------X------>| Accepted(N,I,V)
| | | | | -- Reconfigure : Quorum = 2 --
X----------->|->| | | Accept!(N,I+1,W) (Aux not participating)
|<-----------X--X--------------->| Accepted(N,I+1,W)
| | | | |
Fast PaxosFast Paxos generalizes Basic Paxos to reduce end-to-end message delays. In Basic Paxos, the message delay from client request to learning is 3 message delays. Fast Paxos allows 2 message delays, but requires the Client to send its request to multiple destinations. Intuitively, if the leader has no value to propose, then a client could send an Accept! message to the Acceptors directly. The Acceptors would respond as in Basic Paxos, sending Accepted messages to the leader and every Learner achieving two message delays from Client to Learner. If the leader detects a collision, it resolves the collision by sending Accept! messages for a new round which are Accepted as usual. This coordinated recovery technique requires four message delays from Client to Learner. The final optimization occurs when the leader specifies a recovery technique in advance, allowing the Acceptors to perform the collision recovery themselves. Thus, uncoordinated collision recovery can occur in three message delays (and only two message delays if all Learners are also Acceptors). Message flow: Fast Paxos, non-conflictingClient Leader Acceptor Learner | | | | | | | | | X--------->|->|->|->| | | Any(N,I,Recovery) | | | | | | | | X------------------->|->|->|->| | | Accept!(N,I,W) | |<---------X--X--X--X------>|->| Accepted(N,I,W) |<------------------------------------X--X Response(W) | | | | | | | | Message flow: Fast Paxos, conflicting proposalsConflicting proposals with uncoordinated recovery. Note: the protocol does not specify how to handle the dropped client request. Client Leader Acceptor Learner | | | | | | | | | | | X------->|->|->|->| | | Any(N,I,Recovery) | | | | | | | | | | | | | | | | | | !! Concurrent conflicting proposals | | | | | | | | | !! received in different order | | | | | | | | | !! by the Acceptors | X--------------?|-?|-?|-?| | | Accept!(N,I,V) X-----------------?|-?|-?|-?| | | Accept!(N,I,W) | | | | | | | | | | | | | | | | | | !! Acceptors disagree on value | | |<-------X--X->|->|----->|->| Accepted(N,I,V) | | |<-------|<-|<-X--X----->|->| Accepted(N,I,W) | | | | | | | | | | | | | | | | | | !! Detect collision & recover | | |<-------X--X--X--X----->|->| Accepted(N+1,I,W) |<---------------------------------X--X Response(W) | | | | | | | | | Message flow: Fast Paxos, collapsed roles(merged Acceptor/Learner roles) Client Servers | | | | | | | | X->|->|->| Any(N,I,Recovery) | | | | | | | | | | | | !! Concurrent conflicting proposals | | | | | | !! received in different order | | | | | | !! by the Servers | X--------?|-?|-?|-?| Accept!(N,I,V) X-----------?|-?|-?|-?| Accept!(N,I,W) | | | | | | | | | | | | !! Servers disagree on value | | X--X->|->| Accepted(N,I,V) | | |<-|<-X--X Accepted(N,I,W) | | | | | | | | | | | | !! Detect collision & recover |<-----------X--X--X--X Response(W) | | | | | | Generalized PaxosGeneralized consensus explores the relationship between the operations of a distributed state machine and the consensus protocol used to maintain consistency of that state machine. The main discovery involves optimizations of the consensus protocol when conflicting proposals could be applied to the state machine in any order. ie: The operations proposed by the conflicting proposals are commutative operations of the state machine. In such cases, the conflicting operations can both be accepted, avoiding the delays required for resolving conflicts and re-proposing the rejected operation. This concept is further generalized into ever-growing sets of commutative operations, some of which are known to be stable (and thus may be executed). The protocol tracks these sets of operations, ensuring that all proposed commutative operations of one set are stabilized before allowing any non-commuting operation to become stable. Example
Read(A) Write(A) Read(B) Write(B) Read(A) | | X | | | Write(A)| X | X | | | Read(B) | | | | X | Write(B)| | | X | X |
1:Read(A) 2:Read(B) 3:Write(B) 4:Read(B) 5:Read(A) 6:Write(A) 7:Read(A)
{ 1:Read(A), 2:Read(B), 5:Read(A) }
{ 3:Write(B), 6:Write(A) }
{ 4:Read(B), 7:Read(A) }
Message flow: Generalized Paxos (example)Responses not shown. Note: message abbreviations differ from previous message flows due to specifics of the protocol, see [6] for a full discussion.
{ Acceptors }
Client Leader Acceptor Learner
| | | | | | | | !! New Leader Begins Round
| | X----->|->|->| | | Prepare(N)
| | |<-----X--X--X | | Promise(N,null)
| | X----->|->|->| | | Phase2Start(N,null)
| | | | | | | |
| | | | | | | | !! Concurrent commuting proposals
| X--------?|-----?|-?|-?| | | Propose(ReadA)
X-----------?|-----?|-?|-?| | | Propose(ReadB)
| | X------X-------------->|->| Accepted(N,<ReadA,ReadB>)
| | |<--------X--X-------->|->| Accepted(N,<ReadB,ReadA>)
| | | | | | | |
| | | | | | | | !! No Conflict, both accepted
| | | | | | | | Stable = <ReadA, ReadB>
| | | | | | | |
| | | | | | | | !! Concurrent conflicting proposals
X-----------?|-----?|-?|-?| | | Propose(<WriteB,ReadA>)
| X--------?|-----?|-?|-?| | | Propose(ReadB)
| | | | | | | |
| | X------X-------------->|->| Accepted(N,<WriteB,ReadA> . <ReadB>)
| | |<--------X--X-------->|->| Accepted(N,<ReadB> . <WriteB,ReadA>)
| | | | | | | |
| | | | | | | | !! Conflict detected, leader chooses
| | | | | | | | commutative order:
| | | | | | | | V = <ReadA, WriteB, ReadB>
| | | | | | | |
| | X----->|->|->| | | Phase2Start(N+1,V)
| | |<-----X--X--X-------->|->| Accepted(N+1,V)
| | | | | | | | Stable = <ReadA, ReadB> .
| | | | | | | | <ReadA, WriteB, ReadB>
| | | | | | | |
| | | | | | | | !! More conflicting proposals
X-----------?|-----?|-?|-?| | | Propose(WriteA)
| X--------?|-----?|-?|-?| | | Propose(ReadA)
| | | | | | | |
| | X------X-------------->|->| Accepted(N+2,<WriteA> . <ReadA>)
| | |<--------X--X-------->|->| Accepted(N+2,<ReadA> . <WriteA>)
| | | | | | | |
| | | | | | | | !! Leader chooses order W
| | X----->|->|->| | | Phase2Start(N+2,W)
| | |<-----X--X--X-------->|->| Accepted(N+2,W)
| | | | | | | | Stable = <ReadA, ReadB> .
| | | | | | | | <ReadA, WriteB, ReadB> .
| | | | | | | | <WriteA, ReadA>
| | | | | | | |
Generalized Paxos vs. Fast Multi-Paxos
Byzantine PaxosPaxos may also be extended to support arbitrary failures of the participants, including lying, fabrication of messages, collusion with other participants, selective non-participation, etc. These types of failures are called Byzantine Failures, after the solution popularized by Lamport. [10] Byzantine Paxos [7][5] adds an extra message (Verify) which acts to distribute knowledge and verify the actions of the other processors: Message flow: Byzantine Multi-Paxos, steady stateClient Proposer Acceptor Learner | | | | | | | X-------->| | | | | | Request | X--------->|->|->| | | Accept!(N,I,V) | | X<>X<>X | | Verify(N,I,V) - BROADCAST | |<---------X--X--X------>|->| Accepted(N,V) |<---------------------------------X--X Response(V) | | | | | | | Fast Byzantine Paxos removes this extra delay, since the client sends commands directly to the Acceptors [5]. Note the Accepted message in Fast Byzantine Paxos is sent to all Acceptors and all Learners, while Fast Paxos sends Accepted messages only to Learners): Message flow: Fast Byzantine Multi-Paxos, steady stateClient Acceptor Learner | | | | | | X----->|->|->| | | Accept!(N,I,V) | X<>X<>X------>|->| Accepted(N,I,V) - BROADCAST |<-------------------X--X Response(V) | | | | | | The failure scenario is the same for both protocols; Each Learner waits to receive F+1 identical messages from different Acceptors. If this does not occur, the Acceptors themselves will also be aware of it (since they exchanged each other's messages in the broadcast round), and correct Acceptors will re-broadcast the agreed value: Message flow: Fast Byzantine Multi-Paxos, failure
Client Acceptor Learner
| | | ! | | !! One Acceptor is faulty
X----->|->|->! | | Accept!(N,I,V)
| X<>X<>X------>|->| Accepted(N,I,{V,W}) - BROADCAST
| | | ! | | !! Learners receive 2 different commands
| | | ! | | !! Correct Acceptors notice error and choose
| X<>X<>X------>|->| Accepted(N,I,V) - BROADCAST
|<-------------------X--X Response(V)
| | | ! | |
Production use of Paxos
ReferencesSee alsoExternal links
Source: Wikipedia | The above article is available under the GNU FDL. | Edit this article
|
|
top
©2008-2009 TutorGig.com. All Rights Reserved. Privacy Statement