kademlia | A golang implementation of the Kademlia DHT | Stream Processing library

by cfromknecht | Go | Version: Current | License: MIT

kandi X-RAY | kademlia Summary

kademlia is a Go library typically used in Data Processing and Stream Processing applications. kademlia has no reported bugs or vulnerabilities, has a permissive license, and has low support. You can download it from GitHub.

A golang implementation of the Kademlia DHT.

            Support

              kademlia has a low-activity ecosystem.
              It has 19 stars and 3 forks. There is 1 watcher for this library.
              It had no major release in the last 6 months.
              There is 1 open issue and none have been closed. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of kademlia is current.

            Quality

              kademlia has no bugs reported.

            Security

              kademlia has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              kademlia is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              No kademlia releases are available. You will need to build and install it from source.

            Top functions reviewed by kandi - BETA

            kandi has reviewed kademlia and identified the functions below as its top functions. This is intended to give you instant insight into the functionality kademlia implements and help you decide whether it suits your requirements.
            • Start the kademlia
            • NewKademlia creates a new Kademlia structure
            • parseFlags parses the command-line flags and returns the first contact
            • HandleRPC handles RPC requests
            • NewRoutingTable creates a new routingTable
            • NewRandomNodeID returns a random node ID
            • dialContact returns an rpc.Client
            • NewNodeID creates a NodeID from a hex string
            • NewContact returns a new Contact
            • NewKBucket returns a new empty bucket

            kademlia Key Features

            No Key Features are available at this moment for kademlia.

            kademlia Examples and Code Snippets

            No Code Snippets are available at this moment for kademlia.

            Community Discussions

            QUESTION

            Find Node Operation in Kademlia: why does it pick elements from a bucket instead of looking through the entire routing table?
            Asked 2021-Mar-28 at 13:28

            The first step of the find node operation is as follows (as described in the paper):

            The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of).

            Why does it pick the elements directly from the bucket, as opposed to looking for k closest elements across all elements in all buckets? I believe the latter is what happens in step 2 of the algorithm, and can be seen in the visualization here.

            ...

            ANSWER

            Answered 2021-Mar-28 at 13:26

            I guess this is simply under the assumption that α ≤ k. Under that condition you will get the k closest nodes automatically from the closest bucket or, if the bucket contains fewer than α nodes, the bracketed condition will apply:

            (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of)

            Also note that you're looking at the pre-proceedings version of the paper, which does not contain the full kademlia description. You can find the full paper here.
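A minimal Go sketch of this first lookup step, assuming a flat routing table in which `buckets[i]` holds the contacts sharing exactly `i` prefix bits with the local node. All names (`NodeID`, `sharedPrefix`, `closest`, `initialCandidates`, `id`) are hypothetical and not taken from the kademlia library above. For brevity it only checks the target's own bucket and otherwise falls back to the α closest contacts overall, which is always a valid starting set.

```go
package main

import (
	"bytes"
	"fmt"
	"math/bits"
	"sort"
)

// NodeID is a 160-bit ID stored big-endian (byte 0 is the most significant).
type NodeID [20]byte

// distance returns the XOR distance between a and b.
func distance(a, b NodeID) NodeID {
	var d NodeID
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// sharedPrefix returns how many leading bits a and b have in common (160 if equal).
func sharedPrefix(a, b NodeID) int {
	d := distance(a, b)
	for i, x := range d {
		if x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(d) * 8
}

// closest returns up to n of ids, ordered by XOR distance to target.
func closest(target NodeID, ids []NodeID, n int) []NodeID {
	out := append([]NodeID(nil), ids...) // copy so the caller's slice is not reordered
	sort.Slice(out, func(i, j int) bool {
		di, dj := distance(target, out[i]), distance(target, out[j])
		return bytes.Compare(di[:], dj[:]) < 0
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// initialCandidates picks the first alpha contacts for a lookup of target.
// buckets[i] holds the contacts sharing exactly i prefix bits with self.
func initialCandidates(self, target NodeID, buckets [][]NodeID, alpha int) []NodeID {
	i := sharedPrefix(self, target)
	if i >= len(buckets) {
		i = len(buckets) - 1 // target == self: use the nearest bucket
	}
	if len(buckets[i]) >= alpha {
		return closest(target, buckets[i], alpha)
	}
	// Fewer than alpha entries: take the alpha closest contacts we know of.
	var all []NodeID
	for _, b := range buckets {
		all = append(all, b...)
	}
	return closest(target, all, alpha)
}

// id is a hypothetical helper: an ID whose first byte is b and the rest zero.
func id(b byte) NodeID {
	var x NodeID
	x[0] = b
	return x
}

func main() {
	self := id(0x00)
	buckets := make([][]NodeID, 160)
	for _, c := range []NodeID{id(0x80), id(0x90), id(0xC0), id(0x40)} {
		i := sharedPrefix(self, c)
		buckets[i] = append(buckets[i], c)
	}
	fmt.Printf("%x\n", initialCandidates(self, id(0x80), buckets, 2)[1][0]) // 90 (from bucket 0)
	fmt.Printf("%x\n", initialCandidates(self, id(0x60), buckets, 2)[1][0]) // c0 (fallback)
}
```

Because every contact in the target's bucket is closer to the target than any contact in another bucket, the bucket path and the full-scan fallback agree whenever the bucket holds at least α entries.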

            Source https://stackoverflow.com/questions/66837036

            QUESTION

            What does it mean by Kademlia keys are used to identify nodes as well as data?
            Asked 2020-Jan-09 at 19:56

            Okay, I've been reading articles and the paper about Kademlia recently to implement a simple p2p program that uses the Kademlia DHT algorithm. These sources say that the 160-bit keys in Kademlia are used to identify both nodes (node IDs) and data (which is stored in the form of a tuple).

            I'm quite confused about that 'both' part.

            As far as I understand, each node in a Kademlia binary tree uniquely represents a client (IP, port), each of which holds a list of files.

            Here is the general flow as I understand it:

            1. The client (.exe) starts
            2. It creates a node component
            3. The newly created node joins the network (bootstrapping)
            4. It sends find_node(filehash) to the k closest nodes
              • Let's say the hash is generated by hashing the binary of a file named file1.txt
            5. Each receiving node looks up the queried filehash in its own separate hash table
              • Say, a hash map that holds a list of files (file hash, file location)
            6. Steps 4 and 5 are repeated until the node is found (meanwhile, all nodes involved update their buckets)

            Does this flow look all right?

            Kademlia's bootstrapping method also confuses me. When the node gets created (the user executes the program), it seems to use a bootstrapping node to fill up the buckets. But then what's a bootstrapping node? Is it another process that's always running? What if the bootstrapping node gets turned off?

            Can someone help me better understand the concept?

            Thanks for the help in advance.

            ...

            ANSWER

            Answered 2020-Jan-09 at 19:56

            Does this flow look all right?

            It seems roughly correct, but your wording is not very precise.

            Each node has a routing table by which it organizes the neighbors it knows about and another table in which it organizes the data it is asked to store by others. Nodes have a quasi-random ID that determines their position in the routing keyspace. The hashes of keys for stored data don't precisely match any particular node ID, so the data is stored on the nodes whose ID is closest to the hash, as determined by the distance metric. That's how node IDs and key hashes are used for both.

            When you perform a lookup for data (i.e. find_value) you ask the remote nodes for the k-closest neighbor set they have in their routing table, which will allow you to home in on the k-closest set for a particular target key. The same query also asks the remote node to return any data they have matching that target ID.

            When you perform a find_node on the other hand you're only asking them for the closest neighbors but not for data. This is primarily used for routing table maintenance where you're not looking for any data.

            Those are the abstract operations. If needed, an actual implementation could separate the lookup from the data retrieval, i.e. first perform a find_node and then use the result set to perform one or more separate get operations that don't involve additional neighbor lookups (similar to the store operation).
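The difference between the two queries, as seen by the node answering them, can be sketched in Go. `Node`, `FindNode`, `FindValue` and `id` are hypothetical names for an in-memory node, not the kademlia library's API. Following the answer's description, `FindValue` returns the neighbor set together with any stored value; the original paper's FIND_VALUE instead returns only the value when one is found.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// NodeID is a 160-bit ID stored big-endian.
type NodeID [20]byte

// Node is a hypothetical in-memory node: its routing table flattened into a
// slice, plus the key-value pairs other nodes asked it to store.
type Node struct {
	Contacts []NodeID
	Store    map[NodeID][]byte
}

// closer reports whether a is nearer to target than b under the XOR metric.
func closer(target, a, b NodeID) bool {
	var da, db NodeID
	for i := range target {
		da[i], db[i] = target[i]^a[i], target[i]^b[i]
	}
	return bytes.Compare(da[:], db[:]) < 0
}

// FindNode returns the k contacts closest to target and never any data.
func (n *Node) FindNode(target NodeID, k int) []NodeID {
	out := append([]NodeID(nil), n.Contacts...)
	sort.Slice(out, func(i, j int) bool { return closer(target, out[i], out[j]) })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

// FindValue returns the same neighbor set as FindNode and, if the node
// stores a value under key, that value as well.
func (n *Node) FindValue(key NodeID, k int) (value []byte, found bool, contacts []NodeID) {
	value, found = n.Store[key]
	return value, found, n.FindNode(key, k)
}

// id is a hypothetical helper: an ID whose first byte is b and the rest zero.
func id(b byte) (x NodeID) {
	x[0] = b
	return x
}

func main() {
	n := &Node{
		Contacts: []NodeID{id(1), id(2), id(3)},
		Store:    map[NodeID][]byte{id(2): []byte("hello")},
	}
	v, ok, c := n.FindValue(id(2), 2)
	fmt.Println(string(v), ok, len(c)) // hello true 2
}
```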

            Since kademlia is UDP-based, you can't really serve arbitrary files, because those could easily exceed reasonable UDP packet sizes. So in practice kademlia usually just serves as a hash table for small binary values (e.g. contact information, public keys and such). Bulk operations are performed either by other protocols bootstrapped off those values or by additional operations beyond those mentioned in the kademlia paper.

            What the paper describes is only the basic functionality for a routing algorithm and most basic key value storage. It is a spherical cow in a vacuum. Actual implementations usually need additional features or work around security and reliability problems faced on the public internet.

            But then what's a bootstrapping node? Is it another process that's always running? What if the bootstrapping node gets turned off?

            That's covered in this question (using the BitTorrent DHT as an example).

            Source https://stackoverflow.com/questions/59657964

            QUESTION

            Can two nodes exchange messages directly?
            Asked 2019-Aug-23 at 19:03

            I'm doing some research on Kademlia-based decentralized networks. After bootstrapping a new node, instead of broadcasting messages to the nearest nodes, can a message be sent to a specific node identified by its ID (even if that means relaying the message through multiple peers before it reaches the destination)?

            ...

            ANSWER

            Answered 2019-Aug-23 at 19:03

            Kademlia is an abstract routing algorithm combined with a set of operations required to build a distributed hash table. The concept of broadcasts does not exist in kademlia-as-algorithm.

            But concrete implementations can add features on top of this foundation. Since kademlia provides the iterative find_node procedure (there's no forwarding!) you can locate a node and then exchange any number and type of additional messages for which they have mutual support.
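The iterative find_node procedure can be sketched as a Go simulation in which the initiator queries nodes itself rather than having them forward anything. This is a sketch under loudly stated assumptions: IDs are 64-bit integers instead of 160 bits, the network is a complete 32-node ID space where each node keeps up to k contacts per flat bucket, the α queries of a round run sequentially, and the loop stops once the k closest known nodes have all answered. `ID`, `Node`, `add`, `FindNode` and `Lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// ID is a 64-bit node ID (an assumption; the paper uses 160 bits).
type ID uint64

// Node keeps a flat routing table: Buckets[i] holds contacts sharing i leading bits with Self.
type Node struct {
	Self    ID
	Buckets [64][]ID
}

// add stores c in its bucket unless the bucket already holds k contacts.
func (n *Node) add(c ID, k int) {
	if c == n.Self {
		return
	}
	i := bits.LeadingZeros64(uint64(n.Self ^ c))
	if len(n.Buckets[i]) < k {
		n.Buckets[i] = append(n.Buckets[i], c)
	}
}

func sortByDistance(target ID, ids []ID) {
	sort.Slice(ids, func(i, j int) bool { return ids[i]^target < ids[j]^target })
}

// FindNode answers a remote query with the k closest contacts this node knows.
func (n *Node) FindNode(target ID, k int) []ID {
	var all []ID
	for _, b := range n.Buckets {
		all = append(all, b...)
	}
	sortByDistance(target, all)
	if len(all) > k {
		all = all[:k]
	}
	return all
}

// Lookup runs the iterative procedure from start and returns the k closest nodes found.
func Lookup(net map[ID]*Node, start, target ID, k, alpha int) []ID {
	shortlist := net[start].FindNode(target, k)
	seen := map[ID]bool{start: true}
	for _, c := range shortlist {
		seen[c] = true
	}
	queried := map[ID]bool{}
	for {
		var batch []ID // up to alpha of the closest nodes not yet queried
		for _, c := range shortlist {
			if !queried[c] && len(batch) < alpha {
				batch = append(batch, c)
			}
		}
		if len(batch) == 0 {
			return shortlist // all of the k closest known nodes have answered
		}
		for _, c := range batch {
			queried[c] = true
			for _, r := range net[c].FindNode(target, k) {
				if !seen[r] {
					seen[r] = true
					shortlist = append(shortlist, r)
				}
			}
		}
		sortByDistance(target, shortlist)
		if len(shortlist) > k {
			shortlist = shortlist[:k]
		}
	}
}

func main() {
	const k, alpha = 2, 2
	net := map[ID]*Node{}
	for i := ID(0); i < 32; i++ {
		net[i] = &Node{Self: i}
	}
	for i := ID(0); i < 32; i++ {
		for j := ID(0); j < 32; j++ {
			net[i].add(j, k)
		}
	}
	fmt.Println(Lookup(net, 0, 29, k, alpha)) // [29 28]
}
```

Once the lookup has located node 29, any further messages can be sent to it directly, which is the point the answer makes.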

            Source https://stackoverflow.com/questions/57624081

            QUESTION

            How to represent a kademlia routing table as data structure
            Asked 2019-Jul-14 at 20:17

            The kademlia paper talks about the organization of buckets, splitting, merging and finding the correct bucket to insert into in abstract, concise and confusing terms.

            §2.2 talks about a fixed set of 160 buckets, each covering a fixed subset of the keyspace. But later sections involve additional splitting and buckets covering different parts of the keyspace. That doesn't fit well into a fixed list.

            What is the correct way to organize buckets?

            Meta: Since the confusion is reflected in many questions and partial information has been scattered over many answers, this Q&A is intended to provide an easily linked clarification.

            ...

            ANSWER

            Answered 2019-Feb-28 at 22:14

            The confusion stems from different versions of the paper.

            Flat layout

            This is from the pre-print version. It is mostly used to outline the basic properties of kademlia in a theoretical manner and is still reflected in §2.2 and §3 of the full version.

            Many real-world implementations use this approach, but they don't implement bucket splitting, merging or node multihoming.

            It involves putting each contact that shares i prefix bits with the node into the i-th bucket. This means the layout uses distances relative to the node's own ID.
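In the flat layout, the bucket index is the number of leading zero bits in the XOR of the two IDs, which equals the number of shared prefix bits. A minimal Go sketch, assuming 160-bit IDs stored big-endian in a `[20]byte`; the `NodeID` and `bucketIndex` names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// NodeID is a 160-bit ID stored big-endian.
type NodeID [20]byte

// bucketIndex returns the number of leading bits contact shares with self,
// i.e. the leading zero bits of self XOR contact. Index 0 is the bucket
// furthest from self. It returns -1 for self, which is never stored.
func bucketIndex(self, contact NodeID) int {
	for i := range self {
		if x := self[i] ^ contact[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return -1
}

func main() {
	var self, a, b NodeID // self is the all-zero ID
	a[0] = 0x80           // differs in the very first bit
	b[19] = 0x01          // differs only in the last bit
	fmt.Println(bucketIndex(self, a), bucketIndex(self, b)) // 0 159
}
```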

            Tree-based layout

            This is described in section §2.4.

            To implement refinements such as handling the highly unbalanced trees described towards the end of §2.4, or the deeper non-local splitting described in §4.2, one needs to associate each bucket with the keyspace range it covers. This can be expressed similarly to CIDR ranges, i.e. a start ID plus the number of shared prefix bits, which masks off the tail of the ID.

            Splitting a bucket is performed by increasing the number of prefix bits by one and setting the added bit to 0 and 1 respectively for two new buckets.
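A sketch of such a prefix-range bucket and its split in Go. `Bucket`, `bit`, `Covers` and `Split` are hypothetical names; the sketch assumes the bits of `Start` past `PrefixLen` are zero and that `PrefixLen` is below 160 when splitting.

```go
package main

import "fmt"

// NodeID is a 160-bit ID stored big-endian.
type NodeID [20]byte

// Bucket covers every ID whose first PrefixLen bits equal those of Start,
// much like a CIDR range.
type Bucket struct {
	Start     NodeID
	PrefixLen int
	Contacts  []NodeID
}

// bit returns bit i of id, where bit 0 is the most significant.
func bit(id NodeID, i int) byte { return (id[i/8] >> (7 - uint(i%8))) & 1 }

// Covers reports whether id falls inside b's range.
func (b *Bucket) Covers(id NodeID) bool {
	for i := 0; i < b.PrefixLen; i++ {
		if bit(id, i) != bit(b.Start, i) {
			return false
		}
	}
	return true
}

// Split adds one prefix bit, set to 0 in lo and 1 in hi, and moves each
// existing contact into the half that covers it.
func (b *Bucket) Split() (lo, hi *Bucket) {
	lo = &Bucket{Start: b.Start, PrefixLen: b.PrefixLen + 1}
	hi = &Bucket{Start: b.Start, PrefixLen: b.PrefixLen + 1}
	hi.Start[b.PrefixLen/8] |= 0x80 >> uint(b.PrefixLen%8)
	for _, c := range b.Contacts {
		if hi.Covers(c) {
			hi.Contacts = append(hi.Contacts, c)
		} else {
			lo.Contacts = append(lo.Contacts, c)
		}
	}
	return lo, hi
}

func main() {
	var a, c NodeID
	a[0], c[0] = 0x10, 0x90
	root := &Bucket{Contacts: []NodeID{a, c}} // PrefixLen 0 covers the whole keyspace
	lo, hi := root.Split()
	fmt.Println(lo.PrefixLen, len(lo.Contacts), len(hi.Contacts)) // 1 1 1
	fmt.Printf("%x\n", hi.Start[0])                                // 80
}
```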

            Unlike the flat layout this structure does not involve distances relative to the node's own ID, although some decisions are based on whether the node's own ID would fall into a bucket.

            Since the number of buckets in such a routing table varies over time, it has to be represented in a resizable data structure; this is mentioned in §2.4. Access can no longer be done by a fixed index, because the bucket that covers a specific node ID is not known until the prefix ranges have been examined, so some kind of O(log n) search is needed if one wants to avoid scanning the whole bucket list each time.
            Sorting the buckets by the lowest ID each bucket covers is a natural approach. B-trees or sorted arrays combined with binary search can be used to achieve this.
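With the buckets kept in a slice sorted by their start ID, the covering bucket can be found by binary search. A minimal Go sketch using `sort.Search`, assuming the buckets partition the keyspace without gaps or overlap, as repeated splitting produces; `Bucket`, `find` and `id` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// NodeID is a 160-bit ID stored big-endian.
type NodeID [20]byte

// Bucket covers the IDs whose first PrefixLen bits equal those of Start.
type Bucket struct {
	Start     NodeID // lowest ID the bucket covers
	PrefixLen int
}

// find returns the index of the bucket covering target in O(log n). It
// assumes buckets is sorted by Start and partitions the keyspace, so
// buckets[0].Start is the all-zero ID.
func find(buckets []Bucket, target NodeID) int {
	// Index of the first bucket starting above target; the one before it covers target.
	i := sort.Search(len(buckets), func(i int) bool {
		return bytes.Compare(buckets[i].Start[:], target[:]) > 0
	})
	return i - 1
}

// id is a hypothetical helper: an ID whose first byte is b and the rest zero.
func id(b byte) (x NodeID) {
	x[0] = b
	return x
}

func main() {
	// Table after splitting the root into 00/1 and 80/1, then 80/1 into 80/2 and C0/2.
	buckets := []Bucket{{id(0x00), 1}, {id(0x80), 2}, {id(0xC0), 2}}
	fmt.Println(find(buckets, id(0x90)), find(buckets, id(0xFF))) // 1 2
}
```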

            Regardless of which approach you take, populating a response to a find_node request with the correct set of contacts for the request's target is not trivial, since any single bucket may be insufficient to fill it and thus multiple buckets need to be traversed. It may be simpler to scan the whole routing table for the best available candidates for the reply.

            Source https://stackoverflow.com/questions/51161731

            QUESTION

            Better understanding Kademlia's XOR Integer Metric
            Asked 2019-Jun-25 at 02:09

            I'm trying to better grasp Kademlia's XOR distance metric, so I've written a small dummy program to understand it. I'm also not using a 160-bit number as my key here, but rather a SHA-256 hash of some user identifier.

            Here's my XOR distance function. Is this more or less correct? I'm XORing each byte, appending the result to a buffer rawBytes, and converting that byte buffer into an integer.

            ...

            ANSWER

            Answered 2019-Jun-25 at 02:09

            It's not correct because

            You have to use the math/big package for usage like that. Here is my revised version of your snippet:

            Source https://stackoverflow.com/questions/53166625

            QUESTION

            In Kademlia, why is it recommended to have 160-bit node IDs and keys and not 128-bit?
            Asked 2019-Jun-08 at 16:38

            The Kademlia paper states that nodes are assigned random 160-bit IDs, and so are the keys. Is this a strict restriction? Can I still go ahead and use a 128-bit keyspace if that's good enough for me?

            ...

            ANSWER

            Answered 2019-Jun-08 at 16:38

            The length was chosen because SHA-1, used as the hash function for the hash table keys, outputs 160 bits, and it was the most widely used hash function at the time.

            The routing algorithm itself does not require that specific length to work; all it needs is a key space large enough to avoid collisions between randomly chosen IDs. 128-bit IDs would provide 64 bits of collision resistance, which should be sufficient unless you intend to address grey goo.
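The 64-bit figure follows from the birthday bound: with n IDs drawn uniformly at random from a b-bit space, the probability that any two collide is approximately

```latex
P_{\text{coll}}(n, b) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2^{\,b+1}}\right) \approx \frac{n^2}{2^{\,b+1}}

b = 128,\; n = 2^{32} \approx 4.3 \times 10^{9}:\quad P_{\text{coll}} \approx 2^{-65} \approx 2.7 \times 10^{-20}

b = 128,\; n = 2^{64} \approx 1.8 \times 10^{19}:\quad P_{\text{coll}} \approx 1 - e^{-1/2} \approx 0.39
```

so collisions only become likely once the number of nodes approaches 2^{b/2} = 2^64.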

            But beyond the routing algorithm itself, cryptographic concerns may also be relevant. Networks that use encryption benefit from node IDs doubling as the node's public key, and commonly deployed ECC algorithms require public keys of at least 256 bits. Additionally, resistance against (currently hypothetical) quantum attacks has inflated recommended hash function sizes well beyond 128 bits, since such attacks would cut collision resistance to N/3 bits, down from N/2 for classical attacks.

            Source https://stackoverflow.com/questions/56431524

            QUESTION

            How Distributed Hash Table in IPFS and Bittorrent prevent abuse?
            Asked 2018-Nov-16 at 17:34

            My understanding is that IPFS and the BitTorrent Mainline DHT are built on top of a distributed hash table (Kademlia). They use the file hash as the Kademlia key to find a list of peers that might have this file.

            1- What I don't understand is: if this is all decentralized, who removes peers that no longer host a file's content from the DHT?

            2- What prevents someone from storing a large amount of data for free inside the DHT?

            3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?

            4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?

            ...

            ANSWER

            Answered 2018-Nov-16 at 17:34

            Not sure why this was downvoted. These are excellent questions.

            1- What I don't understand is: if this is all decentralized, who removes peers that no longer host a file's content from the DHT?

            I think that DHT entries are regularly re-broadcast. So if a peer goes away, its DHT entries will no longer be broadcast and the network will forget about the data it provides unless some other node has it.

            2- What prevents someone from storing a large amount of data for free inside the DHT?

            Unless you re-publish or somebody else is interested in the data, it will vanish. The amount of data that you can store directly in a DHT entry is limited. So you can make other nodes store some of your data by putting data directly into DHT entries, but the effort outweighs the benefits.

            3- What prevents someone from disrupting the network by adding a large number of invalid peers for a popular file?

            I think there are some mechanisms envisioned in IPFS to protect the DHT against attacks. However, I don't think the current implementation is all that sophisticated. I don't think that current IPFS would deal well with a large-scale DDoS attack.

            4- What prevents a bad actor from joining the DHT ring and not following the routing protocol, thus preventing discovery messages from reaching the correct nodes?

            I think a single node would be insufficient to do much damage, because a node will ask multiple peers. You would have to have multiple nodes to do significant damage.

            But IPFS as it is now would not survive a sophisticated attack by state actors.

            Source https://stackoverflow.com/questions/53267939

            QUESTION

            Kademlia XOR Distance as an Integer
            Asked 2018-Nov-05 at 19:26

            In the Kademlia paper it mentions using the XOR of the NodeID interpreted as an integer. Let's pretend my NodeID1 is aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d and my NodeID2 is ab4d8d2a5f480a137067da17100271cd176607a1. What's the appropriate way to interpret this as an integer for comparison of NodeID1 and NodeID2? Would I convert these into BigInt and XOR those two BigInts? I saw that in one implementation. Could I also just convert each NodeID into decimal and XOR those values?

            I found this question but I'm trying to better understand exactly how this works.

            Note: This isn't for implementation, I'm just trying to understand how the integer interpretation works.

            ...

            ANSWER

            Answered 2018-Nov-05 at 19:26

            For a basic kademlia implementation you only need two bit-level operations on the IDs: XOR and comparison. In both cases the ID is conceptually a 160-bit unsigned integer with overflow, i.e. modulo 2^160 arithmetic. It can be decomposed into a 20-byte or 5×u32 array, assuming correct endianness conversion in the latter case. The most common endianness for network protocols is big-endian, so byte 0 contains the most significant 8 bits out of 160.

            Then the xor or comparisons can be applied on a subunit by subunit basis. I.e. xor is just an xor for all the bytes, the comparison is a binary array comparison.
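A minimal Go sketch of the fixed-size approach, using the two IDs from the question. The `ID`, `parseID`, `Xor` and `Cmp` names are hypothetical; the IDs are assumed to be 40-character hex strings in big-endian order.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// ID is a 160-bit ID as 20 big-endian bytes: byte 0 holds the most significant 8 bits.
type ID [20]byte

// parseID decodes a 40-character hex string into an ID.
func parseID(s string) (ID, error) {
	var id ID
	b, err := hex.DecodeString(s)
	if err != nil {
		return id, err
	}
	if len(b) != len(id) {
		return id, fmt.Errorf("want %d bytes, got %d", len(id), len(b))
	}
	copy(id[:], b)
	return id, nil
}

// Xor works byte by byte; XOR has no carries, so no big-integer type is needed.
func Xor(a, b ID) (d ID) {
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// Cmp compares two IDs as unsigned integers. With big-endian bytes this is a
// plain lexicographic byte comparison, returning -1, 0 or +1.
func Cmp(a, b ID) int { return bytes.Compare(a[:], b[:]) }

func main() {
	id1, _ := parseID("aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d")
	id2, _ := parseID("ab4d8d2a5f480a137067da17100271cd176607a1")
	d := Xor(id1, id2)
	fmt.Println(hex.EncodeToString(d[:]))
	fmt.Println(Cmp(Xor(id1, id1), d) < 0) // true: the distance to itself is 0
}
```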

            Using bigint library functions is probably sufficient for an implementation but not optimal, because they have size and signedness overhead compared to implementing the necessary bit-twiddling on fixed-size arrays.

            A more complete implementation may also need some additional arithmetic and utility functions.

            Could I also just convert each NodeID into decimal and XOR those values?

            Considering the size of the numbers, a decimal representation is not particularly useful. For the human reader, hexadecimal or the individual bits are more useful, and computers operate on binary and practically never on decimal.

            Source https://stackoverflow.com/questions/53138176

            QUESTION

            Understanding Kademlia find_node and adding nodes to the routing table
            Asked 2018-Jun-26 at 18:50

            I'm reading through the Kademlia white paper and trying to implement the routing table piece.

            I'm using a 160-bit address space and have an array of 160 k-buckets. From what I understand, this implementation would store node IDs in the buckets by how many leading zero bits the node ID has, i.e. bucket[0] would have node IDs with 160 leading zeros (only 1 node) and bucket[159] would have nodes with no leading zeros (50% of the entire address space).

            Question: Using this implementation, when finding the closest k nodes to a target nodeId, would I just count the leading zeros for the target and return everything in that k-bucket?

            Using this implementation I see no place/need to use the XOR that Kademlia is built off of so I don't think my implementation is correct.

            ...

            ANSWER

            Answered 2018-Jun-26 at 18:50

            First, a heads-up: the paper you are linking to is the pre-proceedings version, which only contains the basic sketch without later refinements. The 160-bucket array routing table layout is a simplified approach for the proof in the paper; later revisions introduce a more sophisticated tree-based table.

            I.e. bucket[0] would have node ids with 160 leading zeros (only 1 node) and bucket[159] would have nodes with no leading zeros (50% of the entire address space).

            Well, you can do it this way, but it's simpler to just count the leading zeros in the xor distance and use that as index. I.e. 0 shared prefix bits = no (0) leading zeroes = buckets[0] = bucket furthest from your own ID.

            Question: Using this implementation, when finding the closest k nodes to a target nodeId, would I just count the leading zeros for the target and return everything in that k-bucket?

            The following assumes that you're asking how to answer a remote node's queries.

            The buckets in the flat routing table layout are organized with respect to your own node ID. When answering queries for some arbitrary target ID then this is not necessarily aligned with the closeness towards that target. So the simplest approach is to just scan all populated buckets in your routing table and calculate the N closest nodes relative to the query's target address and then return those as a response. Avoiding a full scan would involve some arithmetic on the xor metric to find the correct local buckets, but I have only done that for the tree-based layout, not the flat layout.

            Source https://stackoverflow.com/questions/51010967

            QUESTION

            Use a DHT for a gossip protocol?
            Asked 2018-May-23 at 16:42

            I've been digging into DHTs, and especially kademlia, for some time now. I'm trying to implement a p2p network working on a Kademlia DHT. I want to be able to gossip a message to the whole network. From my research, gossip protocols are used for that, but it seems odd to add another completely new protocol to spread messages when I already use the DHT to store peers. Is there a gossip protocol that works over or with a DHT topology like Kademlia?

            ...

            ANSWER

            Answered 2018-May-23 at 16:42

            How concerned are you about efficiency? As a lower bound someone has to send a packet to all N nodes in the network to propagate an update to all nodes.

            The most naive approach is to simply forward every message to all entries in your routing table. This will not do since it obviously leads to forwarding storms.

            The second most naive approach is to forward updates, i.e. newer data. This will result in N * log(N) traffic.

            If all your nodes are trusted and you don't care about the last quantum of efficiency you can already stop here.

            If nodes are not trusted you will need a mechanism to limit who can send updates and to verify packets.

            If you also care about efficiency you can add randomized backoff before forwarding and tracking which routing table entry already has which version to prune unnecessary forwarding attempts.

            If you don't want to gossip with the whole network but only with a subset of it, you can implement subnetworks which interested nodes can join, i.e. subscribe to. BitTorrent Enhancement Proposal 50 describes such an approach.

            Source https://stackoverflow.com/questions/50485796

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install kademlia

            You can download it from GitHub.

            Support

            For any new features, suggestions and bugs, create an issue on GitHub. If you have any questions, check and ask on Stack Overflow.

            CLONE
          • HTTPS

            https://github.com/cfromknecht/kademlia.git

          • CLI

            gh repo clone cfromknecht/kademlia

          • SSH

            git@github.com:cfromknecht/kademlia.git

            Consider Popular Stream Processing Libraries

            gulp

            by gulpjs

            webtorrent

            by webtorrent

            aria2

            by aria2

            ZeroNet

            by HelloZeroNet

            qBittorrent

            by qbittorrent

            Try Top Libraries by cfromknecht

            tpec

            by cfromknecht (Go)

            certcoin

            by cfromknecht (Go)

            dtls

            by cfromknecht (Go)

            OZcoin

            by cfromknecht (Go)

            lattice

            by cfromknecht (C++)