Yahoo! Cloud Serving Benchmark
==============================
* To get here, use https://ycsb.site
* [Our project docs](https://github.com/brianfrankcooper/YCSB/wiki)
* [The original announcement from Yahoo!](https://labs.yahoo.com/news/yahoo-cloud-serving-benchmark/)
Getting Started
---------------
1. Download the [latest release of YCSB](https://github.com/brianfrankcooper/YCSB/releases/latest):
```sh
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0
```
2. Set up a database to benchmark. There is a README file under each binding
directory.
3. Run the YCSB command.
On Linux:
```sh
bin/ycsb.sh load basic -P workloads/workloada
bin/ycsb.sh run basic -P workloads/workloada
```
On Windows:
```bat
bin/ycsb.bat load basic -P workloads\workloada
bin/ycsb.bat run basic -P workloads\workloada
```
Running the `ycsb` command without any argument will print the usage.
See https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
for detailed documentation on how to run a workload.
See https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties for
the list of available workload properties.
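Workload properties can also be overridden on the command line with `-p`, and the client thread count set with `-threads`; a minimal sketch (the values below are arbitrary examples, not recommendations):
```sh
bin/ycsb.sh run basic -P workloads/workloada \
    -p operationcount=100000 -threads 16 -s
```
The `-s` flag prints periodic status output to stderr while the run is in progress.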
Building from source
--------------------
YCSB requires the use of Maven 3; if you use Maven 2, you may see [errors
such as these](https://github.com/brianfrankcooper/YCSB/issues/406).
To build the full distribution, with all database bindings:
```sh
mvn clean package
```
To build a single database binding:
```sh
mvn -pl site.ycsb:mongodb-binding -am clean package
```
ArangoDB Request Load balancing
-------------------------------
```nginx
upstream arangodb {
    server coord-1.my.domain:8529;
    server coord-2.my.domain:8529;
    server coord-3.my.domain:8529;
}

server {
    listen *:80 default_server;
    server_name _;  # listens for ALL hostnames
    proxy_next_upstream error timeout invalid_header;

    location / {
        proxy_pass http://arangodb;
    }
}
```
Bash Script - Python can't find a path
--------------------------------------
> "'logs/tp${throughput}/load-test${test}-workloada.txt'"
> "logs/tp${throughput}/load-test${test}-workloada.txt"
> ""'logs/tp${throughput}/run-test${test}-workload${workload}.txt'"
> "logs/tp${throughput}/run-test${test}-workload${workload}.txt"
Init static variable in synchronized block for all the threads, read without synchronized
------------------------------------------------------------------------------------------
```java
public class DB {
    private static final Client client = new Client();

    public void insert(Object data) {
        client.insert(data); // Guaranteed to be initialized once class loading is complete.
    }
}
```
```java
public class DB {
    private static class Holder {
        private static final Client client = new Client();
    }

    public void insert(Object data) {
        Holder.client.insert(data); // Holder.client is initialized on first access.
    }
}
```
```java
public void insert(Object data) {
    init();
    client.insert(data);
}
```
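If lazy initialization really must go through an explicit `init()` as in the last fragment, the classic way to make it safe is double-checked locking on a `volatile` field, so only the first access pays for synchronization. A minimal sketch under the same assumptions (`Client` is a placeholder type, not part of YCSB's API):
```java
public class DB {
    // volatile guarantees a fully constructed Client is visible to all threads
    private static volatile Client client;

    private static void init() {
        if (client == null) {                 // fast path: no locking once initialized
            synchronized (DB.class) {
                if (client == null) {         // re-check under the lock
                    client = new Client();
                }
            }
        }
    }

    public void insert(Object data) {
        init();
        client.insert(data);                  // safe to read without synchronization
    }
}
```
That said, the holder idiom above achieves the same lazy, thread-safe initialization with less machinery, which is why it is usually preferred.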
How to configure Spark SQL Thrift Server
----------------------------------------
```sh
sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=8088
```
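Once the server is up, it can be sanity-checked with the `beeline` client that ships with Spark; the host, port, and user below are assumptions matching the command above:
```sh
bin/beeline -u jdbc:hive2://localhost:8088 -n myuser
```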
YCSB JDBC driver: java.lang.ClassNotFoundException
--------------------------------------------------
```properties
db.driver=com.mysql.jdbc.Driver # The JDBC driver class to use.
db.url=jdbc:mysql://master:3306/ycsb # The Database connection URL.
db.user=root # User name for the connection.
db.passwd=
```
Java's `Properties` loader only treats `#` as a comment at the start of a line, so the inline comments above become part of each value; the driver class is then looked up as `com.mysql.jdbc.Driver # The JDBC driver class to use.`, which triggers the `ClassNotFoundException`. Put the comments on their own lines:
```properties
# The JDBC driver class to use.
db.driver=com.mysql.jdbc.Driver
# The Database connection URL.
db.url=jdbc:mysql://master:3306/ycsb
# User name for the connection.
db.user=root
db.passwd=
```
Another approach is to declare the connector jar as a system-scoped dependency in the JDBC binding's POM and rebuild:
```xml
<dependency>
    <groupId>com.mysql.driver</groupId>
    <artifactId>mysqldriver</artifactId>
    <version>5.1.46</version>
    <scope>system</scope>
    <systemPath>/PATH/YCSB/jdbc/lib/mysql-connector-java-5.1.46-bin.jar</systemPath>
</dependency>
```
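Alternatively, the `ycsb` launcher accepts `-cp` to add entries to the client classpath at run time, which avoids editing the POM; the jar path below is an assumption based on the question:
```sh
bin/ycsb load jdbc -P workloads/workloada -P db.properties \
    -cp lib/mysql-connector-java-5.1.46-bin.jar
```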
How to map MongoDB data in Spark for kmeans?
--------------------------------------------
```python
from numpy import array
from pyspark.mllib.clustering import KMeans
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .getOrCreate()

df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .load()

# Drop the _id column and get an RDD representation of the DataFrame
rowRDD = df.drop("_id").rdd

# Convert the RDD of Row into an RDD of numpy.array
# (use int(x) when the fields hold numeric strings; otherwise map without the cast)
parsedRdd = rowRDD.map(lambda row: array([int(x) for x in row]))

# Feed into KMeans
clusters = KMeans.train(parsedRdd, 2, maxIterations=10, initializationMode="random")
clusters.clusterCenters
```
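To judge whether `k=2` is a sensible choice, the trained model's within-set sum of squared errors can be evaluated on the same RDD; a short sketch using the names defined above:
```python
# Within Set Sum of Squared Errors: lower means tighter clusters
wssse = clusters.computeCost(parsedRdd)
print("WSSSE = %.2f" % wssse)
```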
Huge runtime difference between running YCSB with and without encryption with Workload E
-----------------------------------------------------------------------------------------
```properties
recordcount=10000000
operationcount=100000
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
```
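With 95% scan operations returning up to `maxscanlength=100` records each, every operation touches far more records than a single read or update would, so any per-record encryption cost is amplified accordingly. Loading and running this property file looks like the following (the file name `workloade-custom` and the `basic` binding are placeholders):
```sh
bin/ycsb.sh load basic -P workloads/workloade-custom -s
bin/ycsb.sh run basic -P workloads/workloade-custom -s
```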
Cannot find or load main class com.yahoo.ycsb.Client
----------------------------------------------------
"foostore" : "com.yahoo.ycsb.db.FooStoreClient",
cd ycsb-foostore-binding-0.13.0-SNAPSHOT/
/bin/ycsb load foostore -thread 1 -P workload/worloada -s
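`Cannot find or load main class` usually means the YCSB core jar is missing from the classpath the launcher builds, not that the workload file is wrong. A quick check is to confirm both jars are present in the binding's `lib/` directory (the layout and jar names below are assumptions based on how binding tarballs are packaged):
```sh
# Both the core jar and the custom binding jar should be present:
ls lib/core-*.jar lib/foostore-binding-*.jar
```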
QUESTION
ArangoDB Request Load balancing
Asked 2020-Dec-01 at 21:32

I'm currently doing benchmarks for my studies using YCSB on an ArangoDB cluster (v3.7.3) that I set up using the starter (here).
I'm trying to understand if and how a setup like that (I'm using 4 VMs, for example) helps with balancing request load. If I have nodes A, B, C and D and I give YCSB the IP of node A, all the requests go to node A...
That would mean that a cluster is unnecessary if you want to balance request load, wouldn't it? It would just make sense for data replication.
How would I handle the request load then? I'd normally do that in my application, but I cannot do that when using an existing tool like YCSB... (or can I?)
Thanks for the help!
ANSWER
Answered 2020-Dec-01 at 21:32

I had this problem as well and ended up solving it by standing up nginx in front of my cluster, providing a stable, language-independent way to distribute query load. I found nginx surprisingly simple to configure, but take a look at the upstream module for more details.
```nginx
upstream arangodb {
    server coord-1.my.domain:8529;
    server coord-2.my.domain:8529;
    server coord-3.my.domain:8529;
}

server {
    listen *:80 default_server;
    server_name _;  # listens for ALL hostnames
    proxy_next_upstream error timeout invalid_header;

    location / {
        proxy_pass http://arangodb;
    }
}
```
It's not ideal for everyone but works well for those times when you just need to load-balance and don't want to write a bunch of code (which ends up being quite slow) to resolve coordinators.
I've asked ArangoDB for a native proxy server solution, but I would bet it's low on their to-do list as it could be tricky to support, given the huge number of configuration options.
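With the proxy in place, YCSB can simply be pointed at the nginx host instead of a single coordinator. A quick way to confirm the proxy is forwarding correctly (the host name is an assumption matching the config above; add credentials if authentication is enabled):
```sh
curl http://proxy.my.domain/_api/version
```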
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.