paxos | simple CASPaxos implementation written in Rust on top of sled
Currently, this is an implementation of CASPaxos built on top of the sled lightweight database kit. It is being grown into a more featureful replication library that is mindful of modern consensus research.
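CASPaxos replicates a single compare-and-swap register rather than a command log: each write submits a change function that is applied to the register's current state through a round of consensus. Below is a minimal Rust sketch of that client-facing shape; the names are illustrative and the quorum machinery is elided, so this is not this crate's actual API.

    // Sketch of the CASPaxos client-facing idea: writes are change
    // functions applied to the register's current state. Illustrative
    // only; the prepare/accept quorum rounds are elided.
    fn cas_round<F>(register: &mut Option<u64>, change: F) -> Option<u64>
    where
        F: FnOnce(Option<u64>) -> Option<u64>,
    {
        let next = change(*register); // in CASPaxos this read-modify-write
        *register = next;             // runs through Paxos quorums
        next
    }

    fn main() {
        let mut reg = None;
        // Initialize-or-increment, expressed as a change function.
        cas_round(&mut reg, |v| Some(v.unwrap_or(0) + 1));
        cas_round(&mut reg, |v| Some(v.unwrap_or(0) + 1));
        assert_eq!(reg, Some(2));
    }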
Community Discussions
Trending Discussions on paxos
QUESTION
I am using the community edition Aerospike Docker image. Our k8s cluster does not allow running containers as root, hence I started following this doc for running as non-root.
But when I run the image I get the below error:
...ANSWER
Answered 2021-Dec-04 at 03:53
I think this is not Kubernetes-related but just Aerospike. You are using uid/gid
QUESTION
I want to import the Java project from the following GitHub repository (https://github.com/drewhannay/paxos#setting-up-the-project-in-eclipse) and follow the steps as explained in the repository:
Setting up the project in Eclipse:
- Click "New" in Eclipse
- Select "Java Project"
- Project Name: "Paxos"
- Uncheck "Use Default Location" and enter the path to this directory
- Click "Finish". Eclipse will set up the project correctly using the existing source code.
- Right click the project and click "Configure Build Path"
- Click "Add JARs..."
- Select the guava-13.0.1.jar file from the lib directory within the project
- You're done!
I have downloaded the .ZIP file of the project and extracted it in the Eclipse workspace.
I do not know what it means by saying "enter the path to this directory". Which directory exactly?
...ANSWER
Answered 2021-Dec-03 at 09:57
By "this directory" it is probably referring to the location of the cloned repository.
The cloned repository is the local copy of the repository that you have on your machine.
The simplest method is to press the "Code ▼ > Download zip" button, and extract the zip file. This method will not necessarily work for all projects.
You can alternatively install Git and run git clone https://github.com/drewhannay/paxos.git in the terminal. This method is better if you actually plan on modifying the project, and you will thank yourself for it later.
An easier method, that still uses Git (suggested by howlger), is to drop the link into Eclipse, and it will guide you from there. In my experience, this doesn't always work with all types of project, but it may have been fixed in newer versions.
QUESTION
The Derecho system (open-source C++ library for data replication, distributed coordination, Paxos -- ultra-fast) is built around asynchronous RDMA networking primitives. Senders can write to receivers without pausing, using RDMA transfers into receiver memory. Typically this is done in two steps: we transfer the data bytes in one operation, then notify the receiver by incrementing a counter or setting a flag: "message 67 is ready for you, now". Soon the receiver will notice that message 67 is ready, at which point it will access the bytes of that message.
Intended semantic: "seeing the counter updated should imply that the receiver's C++ code will see the bytes of the message." In PL terms, we need a memory fence between the update of the guard and the bytes of the message. The individual cache-lines must also be sequentially consistent: my guard will go through values like 67, 68, .... and I don't want any form of mashed up value or non-monotonic sequencing, such as could arise if C++ reads a stale cache line, or mistakenly holds a stale value in memory. Same for the message buffer itself: these bytes might overwrite old bytes and I don't want to see some kind of mashup.
This is the crux of my question: I need a weak atomic that will impose [exactly] the needed barrier, without introducing unwanted overheads. Which annotation would be appropriate? Would the weak atomic annotation be the same for the "message" as for the counter (the "guard")?
Secondary question: If I declare my buffer with the proper weak atomic, do I also need to say that it is "volatile", or will C++ realize this because the memory was declared weakly atomic?
...ANSWER
Answered 2021-Sep-11 at 15:51
An atomic counter, whatever its type, will not guarantee anything about memory not controlled by the CPU. Before the RDMA transfer starts, you need to ensure the CPU's caches for the RDMA region are flushed and invalidated, and then of course not read from or write to that region while the RDMA transfer is ongoing. When the RDMA device signals the transfer is done, then you can update the counter.
The thread that is waiting for the counter to be incremented should not reorder any loads or stores done after reading the counter, so the correct memory order is std::memory_order_acquire. So basically, you want Release-Acquire ordering, although there is nothing to "release" in the thread that updates the counter.
You don't need to make the buffers volatile; in general you should not rely on volatile for atomicity.
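Since this page is about a Rust library, here is a minimal Rust analogue of the same Release-Acquire pattern, using a published pointer as the guard. It models only CPU-visible memory, not the RDMA cache-flushing caveats above, and the names are illustrative.

    use std::sync::atomic::{AtomicPtr, Ordering};
    use std::{ptr, thread};

    // One-shot message slot: the pointer doubles as the "guard".
    static SLOT: AtomicPtr<Vec<u8>> = AtomicPtr::new(ptr::null_mut());

    fn main() {
        let producer = thread::spawn(|| {
            let msg = Box::new(vec![1u8, 2, 3]); // write the bytes first...
            // ...then publish with Release: every write above happens-before
            // any Acquire load that observes the non-null pointer.
            SLOT.store(Box::into_raw(msg), Ordering::Release);
        });

        // Spin with Acquire; this pairs with the Release store above.
        let p = loop {
            let p = SLOT.load(Ordering::Acquire);
            if !p.is_null() {
                break p;
            }
            std::hint::spin_loop();
        };
        // The Acquire/Release pairing makes the bytes safely readable here.
        let msg = unsafe { Box::from_raw(p) };
        assert_eq!(*msg, vec![1, 2, 3]);
        producer.join().unwrap();
    }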
QUESTION
I have read the paper Paxos Made Simple.
After thinking about it hard, I have come to this conclusion:
The Paxos protocol always guarantees that the majority of servers accept the same value at the same turn of the proposal, so that finally we can derive the consensus from the majority of servers.
For example, table 1 shows that there are three servers (S1, S2, S3), and N1, N2... denote the proposals; there are many gaps in the proposals, but for every proposal we can derive the consensus from the majority of servers.
...ANSWER
Answered 2021-Feb-22 at 23:22
From what you have said, I think you are asking whether consensus on a sequence of values is obtained by looking at the values accepted in increasing proposals. Following this, the log would be obtained from the proposal numbers ((N1, A), (N2, B), (N3, C), (N4, B)). I think this is the case because you refer to gaps in the sequence of proposals.
If this is the case, then I think there might be some confusion.
In Paxos, there are two things to worry about:
Proposals (or ballots, attempts to choose a value)
Log values (values agreed on by the acceptors and then executed by a state machine)
For the log, there must be fault-tolerant agreement on each value. A value can be executed only if all previous values have also been executed, therefore there must be no gaps in the log if you wish to execute all values. This is a liveness issue though and has nothing to do with safety. You can safely learn of all the values in the log and see them as "committed". If it were a machine executing commands issued by clients, you could tell a client their request for a command will occur but you couldn't tell them the result of executing their command as other commands that you do not know of will be executed before it.
The log uses the basic Paxos protocol to provide fault-tolerant agreement on a single value. This is where the idea of committing a value is useful. The basic Paxos protocol outlined in Paxos Made Simple allows for agreement on only a single value, and it uses a totally ordered sequence of proposals (ballots) to agree on that single value.
The main thing here is that there can be multiple proposals in deciding on the one value to be at a single index in the log.
Following that, there might be gaps in the sequence of proposals promised or accepted by acceptors in Paxos, but this is okay and part of the algorithm's operation. The Paxos algorithm ensures that if a value is agreed upon in one proposal, all higher-numbered proposals will see the same value proposed and therefore agreed upon. Please note this is different from the example in Tables 1 and 2 of the question. In that example, if a majority of servers (acceptors) had accepted A in proposal N1, then all later proposals would also accept value A.
This ability to agree on multiple proposals but with the same value is necessary for fault tolerance, in particular for liveness. Imagine if a proposal Nl was accepted by a majority of acceptors, and then all but one acceptor crashed before any notification of the acceptances of Nl could be made. A value has been agreed upon but no one knows that. Now, to recover from this, another proposer should issue a higher-numbered proposal. Let's call this proposal Nh.
Making a proposal has two phases: Promise and Accept. In the first phase, the proposer must obtain promises from a majority (or quorum) of acceptors on the proposal Nh. The purpose of this is to determine whether or not a value was (possibly) agreed upon in a lower-numbered proposal. In this case, we will learn that one acceptor had accepted a value in a lower-numbered proposal, Nl. (The other acceptors who responded will report they've accepted nothing.) Despite only one acceptor reporting it accepted a value in Nl, the proposer must (for agreement-safety reasons) assume it was possibly chosen. In the second phase, the proposer will issue acceptance requests to the acceptors for the proposal Nh, and will include the value of the earlier proposal Nl. If at least a majority (quorum) of acceptors accept this, then the proposal is chosen and, all things being well, learners will learn that value as the value committed for that single index in the log. Otherwise, the process of issuing proposals to decide on the value at that log index will continue.
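A minimal Rust sketch of the acceptor side of those two phases follows; the names are illustrative, and persistence and networking are elided.

    // Single-decree acceptor for the Promise/Accept phases described
    // above. Names are illustrative; persistence and networking elided.
    #[derive(Default)]
    struct Acceptor {
        promised: u64,                   // highest ballot promised so far
        accepted: Option<(u64, String)>, // last accepted (ballot, value)
    }

    impl Acceptor {
        // Phase 1: promise to ignore anything below `ballot`, and report
        // any previously accepted value so the proposer can adopt it.
        fn prepare(&mut self, ballot: u64) -> Option<Option<(u64, String)>> {
            if ballot > self.promised {
                self.promised = ballot;
                Some(self.accepted.clone()) // promise, with prior acceptance
            } else {
                None // reject: a higher ballot was already promised
            }
        }

        // Phase 2: accept unless a higher ballot has since been promised.
        fn accept(&mut self, ballot: u64, value: String) -> bool {
            if ballot >= self.promised {
                self.promised = ballot;
                self.accepted = Some((ballot, value));
                true
            } else {
                false
            }
        }
    }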
Finally, the last point to raise is that there is also no requirement that a proposal gains a majority of acceptances, and this is important because there might not be a majority of non-faulty acceptors at the time a proposal is issued. To motivate this, consider what happens if a proposer, or too many acceptors, fail during a proposal.
Imagine if in an instance of the Paxos algorithm there are three acceptors, and one of them accepts a value in proposal Na. Then, before Na can be accepted by a majority, the proposer of Na fails and any messages in transit over the network are lost. In this case, proposers need to be able to make new proposals and see a value chosen. In this scenario, another proposer will begin a higher-numbered proposal Nb. It will go through the promise phase as before and determine that no value could possibly have been chosen after talking to the two acceptors who didn't accept anything. Now the proposer of Nb can propose any value and see it accepted by a majority and agreed upon. Proposing any value here is safe because the proposer of Na, should it come back online, will not have its value accepted by a majority, because a majority of acceptors have promised and accepted on a higher-numbered proposal.
Michael :)
QUESTION
So, I'm studying Paxos and the professor asked this question:
Assume that acceptors do not change their vote. In other words, if they vote for value v in round i, they will not send learn messages with value different from v in larger rounds. Show that Paxos does not work any more (it can reach a livelock).
I've reasoned about this for the entire day, but I can't understand how the livelock arises, and neither can my colleagues.
Does anyone have a clue?
...ANSWER
Answered 2021-Jan-14 at 21:25
Assume there were network failures such that every acceptor accepted a different value. Without the ability to change their value in future rounds, no progress could ever be made and you have a "livelock".
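A toy Rust illustration of that situation: with three acceptors each stuck on a different value, no value can ever reach a quorum, so rounds repeat forever without progress. This is only a restatement of the argument above.

    // Toy illustration: three acceptors whose votes can never change,
    // each stuck on a different value, so no quorum can ever form.
    fn main() {
        let votes = ["A", "B", "C"]; // one immutable vote per acceptor
        let quorum = votes.len() / 2 + 1;
        for candidate in ["A", "B", "C"] {
            let count = votes.iter().filter(|&&v| v == candidate).count();
            assert!(count < quorum); // no proposal can ever be chosen
        }
        println!("no value can reach a quorum; rounds repeat forever");
    }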
QUESTION
Consider the following flow :
Proposer prepares a message, gets a promise, sends a proposal with some value, gets it accepted. All fine.
After acceptance, another proposer comes along and prepares a message with a higher id, and the same flow continues.
Is this a valid flow of a single round of the paxos algorithm? Or is this actually multi paxos?
...ANSWER
Answered 2020-Dec-11 at 20:01
After acceptance, if another proposer comes along and prepares a message, it will receive at least one reply containing the previously accepted value. The Paxos rules then require that the second proposer MUST propose the previously accepted value; the value it wants to propose is overridden by the first value. This ensures that only a single value can be chosen for a single instance of the Paxos algorithm.
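In code form, the rule the second proposer applies when processing its promise replies looks roughly like this Rust sketch (types are illustrative):

    // The adoption rule from the answer above: among the promise replies,
    // take the value attached to the highest-ballot acceptance, if any;
    // fall back to our own value only if nothing was accepted.
    fn choose_value(
        replies: &[Option<(u64, String)>], // accepted (ballot, value) per acceptor
        my_value: String,
    ) -> String {
        replies
            .iter()
            .flatten() // keep only acceptors that reported an acceptance
            .max_by_key(|(ballot, _)| *ballot)
            .map(|(_, v)| v.clone())
            .unwrap_or(my_value)
    }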
QUESTION
I am reading Lamport's Paxos Made Simple, and I am confused by the meaning of "value" here.
For example, Lamport says:
If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v
I don't know what value v means here:
- Does it mean different values of a certain variable, such as variable x's value being 1 or 42?
- Or is it something like one log entry in Raft, such as x=1 or y=42?
I think the first interpretation is right: basic Paxos can't decide multiple values, it just goes Propose-Accept-Chosen, and then the whole basic Paxos instance has finished its mission.
However, I am not for sure.
...ANSWER
Answered 2020-Dec-08 at 17:32
Your second interpretation is correct ("It's like one log entry in Raft").
You are also correct that Basic Paxos can't choose multiple values, it just chooses one, like a single log entry in Raft. To choose a series of values you need to chain multiple Basic Paxos instances together, like in Multi-Paxos or Raft.
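A toy Rust sketch of that distinction, where each log slot holds the outcome of its own Basic Paxos instance (types are illustrative):

    // The second interpretation, sketched: a replicated log where each
    // slot is the outcome of its own Basic Paxos instance.
    struct Slot {
        chosen: Option<String>, // the single value this instance chose
    }

    struct Log {
        slots: Vec<Slot>, // Multi-Paxos = one Basic Paxos instance per slot
    }

    impl Log {
        // Commands are executable only up to the first undecided slot.
        fn executable_prefix(&self) -> Vec<&str> {
            self.slots
                .iter()
                .map_while(|s| s.chosen.as_deref())
                .collect()
        }
    }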
QUESTION
I have a cluster of Cassandra (v3.11.4) with 14 nodes and I want to add a new node. The machine has 256GB of memory and I set the heap size (max and min) to 64GB. But I cannot add a new node due to a memory error! What is the exact problem and what do I need to do?
The last lines of the log are as follows:
...ANSWER
Answered 2020-Jun-14 at 13:49
The error message says that the JVM failed to commit memory, because the mmap syscall returned with the error code 12 (ENOMEM).
This typically happens when the process reaches one of the OS memory limits:
- The number of memory mappings exceeded the vm.max_map_count sysctl limit. This is a quite common problem with Cassandra, since Cassandra tends to mmap thousands of files, and the default limit is rather low - around 65K.
How to check: wc -l /proc/<pid>/maps, where <pid> is the Java process ID.
How to fix: sudo sysctl vm.max_map_count=1000000
- The total amount of the process' virtual memory exceeded RLIMIT_AS.
How to check: ulimit -v, cat /proc/<pid>/status | grep Vm
How to fix: run ulimit -v unlimited before starting Cassandra in the same shell.
- Overcommit is disabled, and the overcommit ratio is too low.
How to check: sysctl vm.overcommit_memory, sysctl vm.overcommit_ratio, cat /proc/meminfo
How to fix: sudo sysctl vm.overcommit_memory=0
QUESTION
I've pretty well understood what Raft is and implemented it in the MIT 6.824 distributed systems course. I also know what basic Paxos is, but I've not implemented it yet, so I can't grasp all of its details. For Multi-Paxos, I'm even more confused, i.e., WHY can it eliminate lots of Prepare RPCs? I know the answer should be that Multi-Paxos can have a fixed leader along with the noMoreAccepted response from other peers to determine whether to reduce Prepare RPCs. But I can't get it at a detailed level: why and how does it work?
I want to get some recommendations: articles, sample code, or anything that can help with Multi-Paxos.
- I've read Paxos Made Live and Paxos Made Simple; those two papers give me a basic idea about what Paxos is and how it works.
- I've watched https://www.youtube.com/watch?v=YbZ3zDzDnrw&t=3035s&ab_channel=DiegoOngaro several times as well; it's a great talk, but it does not go into too many details.
ANSWER
Answered 2020-Nov-14 at 07:12
To answer your specific question:
WHY it can eliminate lots of Prepare RPC?
In the paper Paxos Made Simple page 10 it says:
A newly chosen leader executes phase 1 for infinitely many instances of the consensus algorithm—in the scenario above, for instances 135–137 and all instances greater than 139.
That is saying that if a leader broadcasts Prepare(135, n), which is a prepare for instance 135 using ballot number n, then it is valid for this to be defined as applying to all instances >= 135 that are not yet fixed. We can reason that it is safe for any node to be "spamming" out prepare messages for an infinite number of the unfixed positions in our log stream, because for each position each acceptor applies the rules of Paxos for that position. We can compress that infinite set of prepare messages down to a single one that applies to all higher unfixed positions. We then eliminate all but one prepare message for the term of a stable leader. So it is a fantastic optimisation.
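The following Rust sketch shows the shape of that compression; the message and state types are illustrative, not TRex's actual types. A single promised ballot at the acceptor covers every slot at or above from_slot, and one reply reports everything accepted there.

    // Sketch of the compression: one Prepare, and one promised ballot at
    // the acceptor, covers every log slot at or above `from_slot`.
    struct Prepare {
        ballot: u64,
        from_slot: usize, // applies to this slot and all higher ones
    }

    struct AcceptorState {
        promised: u64,                   // one promise for all higher slots
        log: Vec<Option<(u64, String)>>, // accepted (ballot, value) per slot
    }

    // Returns the accepted values at or above `from_slot` in one reply,
    // or None if a higher ballot was already promised.
    fn on_prepare(
        state: &mut AcceptorState,
        p: Prepare,
    ) -> Option<Vec<(usize, Option<(u64, String)>)>> {
        if p.ballot <= state.promised {
            return None;
        }
        state.promised = p.ballot; // a single word of state, not one per slot
        Some(
            state.log.iter().enumerate()
                .skip(p.from_slot)
                .map(|(i, a)| (i, a.clone()))
                .collect(),
        )
    }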
You asked about any example code. I wrote an implementation of multi-paxos using functional programming in Scala that aims to be true to the paper Paxos Made Simple over at https://github.com/trex-paxos/trex. The core state is PaxosData, the message protocol is at the bottom of PaxosProtcol and the algorithm is a set of message matching functions in PaxosAlgorithm. The algorithm takes the current immutable state and an immutable message as input and outputs the next immutable state for the node. Common behaviours are written as partial functions that have full unit tests. These partial functions are composed into complete functions used by leaders, followers and candidate leaders. There is a write up at this blog.
It adds additional messages to the basic set as optimisations to speed up log replication. Those involve some implementation details that Lamport does not get into in his paper. An example is that negative acknowledgements are used to pass information between nodes to try to avoid interrupting a stable leader due to only one failed network link between a node and the leader. TRex tries to keep those features to a minimum to create a basic but complete solution.
An answer that you might find helpful about Multi-Paxos is this one that discusses why Multi-Paxos is called that https://stackoverflow.com/a/26619261/329496
There is also this one about how the original Part-Time Parliament paper uses a leader and also describes a stable leader running multi-Paxos https://stackoverflow.com/a/46012211/329496
Finally, you might enjoy my defence of Paxos post The Trial Of Paxos Algorithm.
QUESTION
The Spanner paper makes it clear that commit timestamps for read-write transactions are chosen by the coordinator leader. However, I'm not sure where the timestamps are chosen for read-only transactions.
The documentation here says:
API Layer will pick the read timestamp by using the current TrueTime.
But where is that API layer situated? Does that refer to the location proxy that the paper says clients use to locate the relevant spanservers? The paper says it uses TT.now().latest but I can't tell where that gets invoked.
I had assumed that the timestamp could be chosen by any of the Paxos leaders involved in the multi-site read, but apparently not. Can someone please help clarify?
...ANSWER
Answered 2020-Sep-03 at 22:08
But where is that API layer situated? Does that refer to the location proxy that the paper says clients use to locate the relevant spanservers?
That's correct, it's in the API proxy.
I had assumed that the timestamp could be chosen by any of the Paxos leaders involved in the multi-site read, but apparently not.
To negotiate a timestamp, strong read requests must contact the Paxos leader for each Spanner group involved in that read.
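As a toy restatement of the quoted rule (the API layer picks TT.now().latest) in Rust, with hypothetical stand-in types rather than any actual Spanner API:

    // Hypothetical TrueTime interval: real time lies in [earliest, latest].
    struct TTInterval {
        earliest: u64,
        latest: u64,
    }

    // Strong reads pick the upper bound of the current uncertainty window,
    // so the chosen timestamp does not precede any commit that completed
    // before the read started.
    fn strong_read_timestamp(tt_now: TTInterval) -> u64 {
        tt_now.latest
    }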
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Install paxos
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.