murmur3 | A Rust implementation of murmur3 | Hashing library
kandi X-RAY | murmur3 Summary
This is a Rust implementation of the fast, non-cryptographic murmur3 hash. See the API docs for example code.
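Since this page doesn't reproduce the API docs, here is a minimal pure-Python sketch of the 32-bit MurmurHash3 variant the crate implements, for illustration only; in Rust you would use the crate's own API rather than reimplementing the hash.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Illustrative pure-Python MurmurHash3 x86 32-bit."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    rounded = length - (length & 3)
    # Body: process 4-byte little-endian blocks.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: up to 3 remaining bytes.
    tail = data[rounded:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization mix.
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h
```

For example, `murmur3_32(b"hello")` yields the well-known test value 0x248BFA47.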
Community Discussions
Trending Discussions on murmur3
QUESTION
I have built FFmpeg with libx264 into static libs; here is my directory tree.
...ANSWER
Answered 2021-Feb-22 at 07:15
I made a mistake in the build script:
QUESTION
I want to use pyhash's murmur3 128-bit algorithm in my program. It has two different variants: murmur3_x64_128 and murmur3_x86_128. Is this referring to the Python platform or the machine platform? I need to know because I have to write an if/else condition in my program to select the optimized variant at runtime.
Example usage: (both work on my system, but my Python and my Linux are both 64-bit)
...ANSWER
Answered 2020-Apr-26 at 17:50
It is referring to the platform of your machine, not Python. As you note, they are not the same hash and cannot be used interchangeably.
murmur3_x64_128 has better performance than murmur3_x86_128 on 64-bit platforms, but pretty bad performance on 32-bit platforms that lack native 64-bit operations. murmur3_x86_128 performs about equally on both platforms.
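A runtime selection along the lines the asker wants could be sketched like this; it is a heuristic only (`platform.machine()` strings vary across systems), and the constructor names are the pyhash ones from the question:

```python
import platform

def preferred_murmur3_variant() -> str:
    # Heuristic: 64-bit machine names (x86_64, aarch64, arm64, ...)
    # usually end in "64"; prefer the x64 variant there, since it is
    # faster on 64-bit hardware, and the x86 variant otherwise.
    is_64bit = platform.machine().endswith("64")
    return "murmur3_x64_128" if is_64bit else "murmur3_x86_128"

print(preferred_murmur3_variant())
```

The returned name can then be looked up on the pyhash module with `getattr` to construct the chosen hasher.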
QUESTION
ANSWER
Answered 2019-Oct-05 at 15:05
The whole point of the musl-gcc wrapper script is to invoke gcc with the include and library paths adjusted to isolate it from the host include and library ecosystem (which is assumed to be glibc-based). That includes the kernel headers for your host system. If you want to use any libraries (including "header-only libraries" like the kernel headers) with musl-gcc, you need to build a version against musl instead of glibc and install it in the musl include/library path.
As for the kernel headers, they don't actually depend on the libc or have any library files; it's just the headers. So you can probably get by with copying (or symlinking) the linux, asm, and asm-generic directories from /usr/include to the musl include dir. Alternatively, you can install them from the kernel sources.
If you find you need any significant amount of third-party library stuff, though, it makes more sense to drop musl-gcc and use a real cross-compiler. You can get prebuilt binaries from musl.cc (if you're willing to trust them), or build your own with musl-cross-make (takes about 15 minutes on a typical system nowadays). This will give you kernel headers automatically, as well as a full set of GCC target libraries that let you build C++ software, OpenMP-using software, etc.
QUESTION
A distributed system can generate unique ids either with Flake ids or with cryptographic/random ids (e.g., 128-bit murmur3). I wonder what the pros and cons of each method are.
...ANSWER
Answered 2019-May-16 at 22:08
I'm going to assume 128-bit ids, kind of like UUIDs. Let's start at a baseline, though.
TL;DR: Use random ids. If and only if you have database performance issues try flake ids.
Auto-increment ids
Auto-increment ids are when your backend system assigns a unique, densely packed id to each new entity. This is usually done by a database, but not always.
The clear advantage is that the id is guaranteed unique to your system, though 128 bits is probably overkill.
The first disadvantage is that you leak information every time you expose your id. You leak what other ids there are (an attacker can easily guess what to look for), and you leak how busy your system is (your competition now knows how many ids you create in a given period and can infer, say, financial information).
The second disadvantage is that your backend is no longer as scalable. You are tied to some slow, less scalable id generator that will always be a bottleneck in a large system.
Random ids
Random ids are when you just generate 128 random bits. v4 UUIDs are 122-bit random ids (e.g. 2bbfb5ba-f5a2-11e7-8c3f-9a214cf093ae). These are also practically unique.
Random ids get rid of both of the disadvantages of auto-increment ids: they leak no information and are infinitely scalable.
The disadvantage comes when storing ids in b-trees (à la databases) because they randomize the memory/disk pages that the tree accesses. This may be a source of slow-downs to your system.
To me this is still the ideal id scheme, and you should have a good reason to move off of it. (i.e. profiler data).
Flake ids
Flake ids are random ids, except that the high k bits are taken from the low bits of a timestamp. For example, you may get the following three ids in a row, where the top bits are really close together.
- 2bbfb5baf5a211e78c3f9a214cf093ae
- 2bbf9d4ec10c41049fb1671d6616b213
- 2bc6bb66e5964fb59050fcf3beed51b1
While you may leak some information, it isn't much if your k and your timestamp granularity are designed well.
But if you design the ids badly, they can be less than helpful: update the timestamp bits too infrequently and the b-trees end up relying on the random bits, negating the usefulness; update them too frequently and your writes thrash the database.
Note: By time granularity, I mean how frequently the low bits of a timestamp change. Depending on your data throughput, you probably want this to be hour, deca-minutes, or minutes. It's a balance.
If you treat the ids as otherwise semantic-less (i.e. never infer anything from the top bits), then you can change any of these parameters at any time without interruption, even going back to purely random with k = 0.
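A minimal sketch of such a flake-style generator; the k_bits and granularity values here are illustrative choices, not a standard:

```python
import os
import time

def flake_id(k_bits: int = 20, granularity_s: int = 600, now: float = None) -> int:
    """Flake-style 128-bit id: the top k bits come from a coarse
    timestamp, the remaining 128-k bits are random."""
    if now is None:
        now = time.time()
    # Low k bits of the timestamp, counted in granularity_s windows.
    ts = int(now // granularity_s) & ((1 << k_bits) - 1)
    # 128-k random bits for the remainder.
    rand = int.from_bytes(os.urandom(16), "big") >> k_bits
    return (ts << (128 - k_bits)) | rand
```

Two ids generated in the same time window share their top k bits, which is exactly the b-tree locality the answer describes.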
Semantic ids
I'm assuming by this you mean ids that have some semantic information encrypted in them. Maybe like hashids?
Disadvantages abound:
- You'll have different-length ids for different data, unless you have a fixed-length protocol.
- You'll be tempted to add more and more info to the ids.
- The ids may look random, but there is no way to add flake-like timestamps to the front as a mitigation.
- Ids become tied to the system that made them. You may start asking that system for decrypted versions of an id instead of just asking for the data it points to.
- Your system burns time decrypting ids to extract data.
- You add encryption problems:
- What happens if the secret key is leaked? (Better not have data that is too sensitive in there: a customer name, or heaven forbid a credit card number.)
- Coordinating key rotation.
- Small ids like hashids can be brute-forced.
As you can see, I am not a fan of semantic ids in general. There are a few places where I use them, though I call them tokens. These don't get stored as keys in a database (or likely not stored anywhere).
For example, I use encryption for pagination tokens: an encrypted {last-id / context} for a pagination API. I prefer this over having the client pass the last element of the prior page because it keeps the database context hidden from the user. It's simpler for everyone, and the encryption is little more than obfuscation (no sensitive information).
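A sketch of the shape of such a pagination token; note this uses plain base64 where a real system would encrypt, so it shows only the opaque-token idea, not the cryptography:

```python
import base64
import json

def make_page_token(last_id: int, context: str) -> str:
    # Illustrative only: a real implementation would encrypt this
    # payload rather than merely base64-encode it.
    payload = json.dumps({"last_id": last_id, "ctx": context}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def read_page_token(token: str) -> dict:
    # Server-side decode; the client treats the token as opaque.
    return json.loads(base64.urlsafe_b64decode(token))
```

The client only ever sees and echoes back the opaque string, so the `{last-id / context}` internals stay hidden.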
QUESTION
The Token function in my driver doesn't support a composite partition key, but it works very well with a single partition key: it takes a binary buffer as input, passes it to the murmur3 hash function, extracts the signed little-endian 64-bit integer (the token) from the murmur3 result, and ignores any extra bytes in the buffer.
So my hope is to generate the binary for a composite partition key and then pass it to murmur3 as usual. An algorithm or bitwise operations would be really helpful, or at least a source in any programming language.
I don't mean the murmur3 part, only the token side that converts/mixes the composite partition key and outputs raw bytes in binary form.
...ANSWER
Answered 2018-Sep-29 at 02:36
Take a look at the drivers, since they have to generate the token to find the correct coordinator: https://github.com/datastax/java-driver/blob/8be7570a3c7fbba773ae2581bbf26e8196e7d6fb/driver-core/src/main/java/com/datastax/driver/core/Token.java#L112
It's slightly different from the typical murmur3 due to a bug when it was made and the inability to change it without breaking existing clusters. So I would recommend copying it from them or, better yet, using the existing drivers to find the token.
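For the serialization side of the question (not the hash itself), the layout is believed to match Cassandra's CompositeType encoding, but verify against the driver source linked above: for each key component, a 2-byte big-endian length, the component's raw bytes, then a 0x00 terminator byte. A sketch:

```python
import struct

def composite_key_bytes(components: list) -> bytes:
    """Serialize composite partition key components (each already a
    bytes value in its CQL type's binary form) the way Cassandra's
    CompositeType is believed to do before hashing: 2-byte big-endian
    length, raw bytes, then a trailing 0x00 per component."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # unsigned short length
        out += comp                           # component bytes
        out.append(0)                         # end-of-component marker
    return bytes(out)

# e.g. a composite key of an int (4-byte big-endian) and a text value
key = composite_key_bytes([struct.pack(">i", 1), b"foo"])
```

The resulting buffer is what would then be fed to the (Cassandra-variant) murmur3 function the driver already provides.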
QUESTION
After going through multiple websites, I understand that the partition key in Cassandra is responsible for identifying the node in the cluster where the data is stored. But I don't understand what parameter determines the number of partitions created in Cassandra (the way the keyspace is responsible for the replication factor), or whether partitions are created based on murmur3 without being specifiable explicitly.
Thanks in advance.
...ANSWER
Answered 2018-Mar-16 at 03:58
Cassandra by default uses a partitioner based on the Murmur3 hash, which generates values in the range -2^63 to 2^63-1. Each node in the cluster is responsible for a particular range of hash values, and data whose partition key hashes into that range goes to that node (or nodes). I recommend reading the documentation on Cassandra/DSE architecture; it will make things easier to understand.
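The range-ownership idea can be sketched with a simplified, single-replica ring model (the node names and token boundaries below are made up, and real Cassandra adds replication and vnodes on top of this):

```python
import bisect

def owning_node(token: int, ring: list) -> str:
    """Simplified token-ring lookup: `ring` is a sorted list of
    (token, node) pairs; each node owns the range (previous_token,
    its_token], and tokens past the last entry wrap to the first."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]

# Hypothetical 3-node ring over the murmur3 range -2^63 .. 2^63-1.
ring = [(-2**62, "node-a"), (0, "node-b"), (2**62, "node-c")]
```

So a partition key is hashed to a token, and the token alone decides placement; you don't specify partitions explicitly.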
QUESTION
I was doing a basic repartition on a dataset. I have data like below in the file test.csv
...ANSWER
Answered 2018-Feb-16 at 18:21
The mistake you've made is the assumption that hashing is done on a Scala string. In practice, Spark hashes the unsafe byte array directly.
So the expression is equivalent to
QUESTION
Currently we are using the random partitioner and we want to update to the murmur3 partitioner. I know we can achieve this by using sstable2json and then json2sstable to convert the SSTables manually, and then using sstableloader; or we could create a new cluster with murmur3 and write an application to pull all the data from the old cluster and write it to the new one.
Is there another, easier way to achieve this?
...ANSWER
Answered 2018-Feb-06 at 19:29
There is no easy way; it's a pretty massive change, so you might want to check whether it's absolutely necessary (do some benchmarks; the difference is likely undetectable). It's more the kind of change to make if you're switching to a new cluster anyway.
To do it live: create a new cluster that's murmur3 and write to both clusters. In the background, read and copy the data to the new cluster while the writes are being duplicated. Once the background job is complete, flip reads from the old cluster to the new cluster, and then you can decommission the old cluster.
Offline: sstable2json -> json2sstable is a pretty inefficient mechanism. It will be a lot faster if you use an sstable reader and an sstable writer (i.e. edit SSTableExport in the Cassandra code to write a new sstable instead of dumping output). If you have a smaller dataset, the cqlsh COPY command may be viable.
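The dual-write phase of the live migration can be sketched like this; `DualWriter` and the dictionary "clusters" are hypothetical stand-ins, not a real driver API:

```python
class DualWriter:
    """Hypothetical sketch of the dual-write phase: every write goes
    to both clusters while a background job backfills old data."""

    def __init__(self, old_cluster: dict, new_cluster: dict):
        self.old = old_cluster
        self.new = new_cluster

    def write(self, key, value):
        # Duplicate each write so the new cluster never falls behind
        # on data written after the migration started.
        self.old[key] = value
        self.new[key] = value

old, new = {}, {}
dw = DualWriter(old, new)
dw.write("k", "v")
```

Once the backfill job catches up, reads flip to `new` and `old` can be decommissioned.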
QUESTION
If I apply a hash index on a varchar, what algorithm will PostgreSQL 10 use to hash the value? Will it be MD5? Murmur3? FNV-1? I am unable to find this documented anywhere.
...ANSWER
Answered 2018-Feb-01 at 10:16
You can find the correct function with this query:
QUESTION
I have an Elastic search Index with the following mappings:
...ANSWER
Answered 2017-Dec-19 at 14:16
The issue you are seeing is probably because you are not marking the segment_aggregate type as nested.
By default, all fields are indexed independently. Even though the JSON structure makes it look like you are associating the specific values inside the inner object of segment_aggregate with each other, in reality ES creates one index of values for segment_aggregate.segment_name and a separate index for segment_aggregate.segment_value.
This means when you do a search like this (assuming query string):
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install murmur3
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.
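Once Rust is set up, adding the crate to a project is a one-line dependency entry in Cargo.toml; the version number below is illustrative, so check crates.io for the current release:

```toml
[dependencies]
murmur3 = "0.5"
```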