
YCSB | Yahoo! Cloud Serving Benchmark

 by   brianfrankcooper Java Version: 0.17.0 License: Apache-2.0

kandi X-RAY | YCSB Summary

YCSB is a Java library. YCSB has a build file available, a permissive license, and medium support. However, YCSB has 123 bugs and 1 vulnerability. You can download it from GitHub or Maven.
<!-- Copyright (c) 2010 Yahoo! Inc., 2012 - 2016 YCSB contributors. All rights reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->

Support

  • YCSB has a medium active ecosystem.
  • It has 3792 star(s) with 1882 fork(s). There are 217 watchers for this library.
  • It had no major release in the last 12 months.
  • There are 196 open issues and 657 have been closed. On average, issues are closed in 53 days. There are 73 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of YCSB is 0.17.0.

Quality

  • YCSB has 123 bugs (18 blocker, 4 critical, 42 major, 59 minor) and 1499 code smells.

Security

  • YCSB has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • YCSB code analysis shows 1 unresolved vulnerability (1 blocker, 0 critical, 0 major, 0 minor).
  • There are 117 security hotspots that need review.

License

  • YCSB is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • YCSB releases are available to install and integrate.
  • Deployable package is available in Maven.
  • Build file is available. You can build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.
  • YCSB saves you 15213 person hours of effort in developing the same functionality from scratch.
  • It has 30366 lines of code, 1447 functions and 335 files.
  • It has high code complexity. Code complexity directly impacts maintainability of the code.
Top functions reviewed by kandi - BETA

kandi has reviewed YCSB and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality YCSB implements, and to help you decide if it suits your requirements.

  • Initialize the AzureCosmos client.
  • Initialize keys and tags.
  • Entry point to YCSB.
  • Set up a table.
  • Scan a table.
  • Update URL options.
  • Write the contents of the object to the S3 object.
  • Load classes and tables.
  • Export measurements.
  • Perform a single-item mutation.

YCSB Key Features

Yahoo! Cloud Serving Benchmark

default

* To get here, use https://ycsb.site
* [Our project docs](https://github.com/brianfrankcooper/YCSB/wiki)
* [The original announcement from Yahoo!](https://labs.yahoo.com/news/yahoo-cloud-serving-benchmark/)

Getting Started
---------------

1. Download the [latest release of YCSB](https://github.com/brianfrankcooper/YCSB/releases/latest):

    ```sh
    curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
    tar xfvz ycsb-0.17.0.tar.gz
    cd ycsb-0.17.0
    ```

2. Set up a database to benchmark. There is a README file under each binding
   directory.

3. Run YCSB command.

    On Linux:
    ```sh
    bin/ycsb.sh load basic -P workloads/workloada
    bin/ycsb.sh run basic -P workloads/workloada
    ```

    On Windows:
    ```bat
    bin/ycsb.bat load basic -P workloads\workloada
    bin/ycsb.bat run basic -P workloads\workloada
    ```

  Running the `ycsb` command without any argument will print the usage.

  See https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
  for detailed documentation on how to run a workload.

  See https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties for
  the list of available workload properties.
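
  As a quick illustration, a small workload property file might look like the
  following. The property names are standard core properties, but the values are
  arbitrary examples and the file name is hypothetical; pass it with an extra
  `-P my-overrides.properties` argument:

    ```
    # my-overrides.properties (hypothetical name), illustrative values only
    workload=site.ycsb.workloads.CoreWorkload
    recordcount=100000
    operationcount=100000
    readproportion=0.8
    updateproportion=0.2
    requestdistribution=zipfian
    threadcount=8
    ```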


Building from source
--------------------

YCSB requires the use of Maven 3; if you use Maven 2, you may see [errors
such as these](https://github.com/brianfrankcooper/YCSB/issues/406).

To build the full distribution, with all database bindings:

    mvn clean package

To build a single database binding:

    mvn -pl site.ycsb:mongodb-binding -am clean package

ArangoDB Request Load balancing

upstream arangodb {
  server coord-1.my.domain:8529;
  server coord-2.my.domain:8529;
  server coord-3.my.domain:8529;
}

server {
  listen                *:80 default_server;
  server_name           _; # Listens for ALL hostnames
  proxy_next_upstream   error timeout invalid_header;
  
  location / {
    proxy_pass          http://arangodb;
  }
}

Bash Script - Python can't find a path

> "'logs/tp${throughput}/load-test${test}-workloada.txt'"
> "logs/tp${throughput}/load-test${test}-workloada.txt"
> ""'logs/tp${throughput}/run-test${test}-workload${workload}.txt'"
> "logs/tp${throughput}/run-test${test}-workload${workload}.txt"

Init static variable in synchronized block for all the threads, read without synchronized

public class DB {

  private static final Client client = new Client();

  public void insert(Object data) {
    client.insert(data);  // Guaranteed to be initialized once class loading is complete.
  }
}
public class DB {
  private static class Holder {
    private static final Client client = new Client();
  }

  public void insert(Object data) {
    Holder.client.insert(data);  // Holder.client is initialized on first access.
  }
}
  public void insert(Object data) {
    init();
    client.insert(data);
  }

how to configure spark sql thrift server

sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=8088

YCSB JDBS driver: java.lang.ClassNotFoundException

db.driver=com.mysql.jdbc.Driver # The JDBC driver class to use.
db.url=jdbc:mysql://master:3306/ycsb # The Database connection URL.
db.user=root # User name for the connection.
db.passwd=
# The JDBC driver class to use.
db.driver=com.mysql.jdbc.Driver 
# The Database connection URL.
db.url=jdbc:mysql://master:3306/ycsb 
# User name for the connection.
db.user=root 
db.passwd=
-----------------------
<dependency>
      <groupId>com.mysql.driver</groupId>
      <artifactId>mysqldriver</artifactId>
      <version>5.1.46</version>
      <scope>system</scope>
      <systemPath>/PATH/YCSB/jdbc/lib/mysql-connector-java-5.1.46-bin.jar</systemPath>
</dependency> 

How to map MongoDB data in Spark for kmeans?

df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
               .option("uri", "mongodb://127.0.0.1/ycsb.usertable") \
               .load()

# Drop _id column and get RDD representation of the DataFrame
rowRDD = df.drop("_id").rdd

# Convert RDD of Row into RDD of numpy.array
parsedRdd = rowRDD.map(lambda row: array([int(x) for x in row]))

# Feed into KMeans
clusters = KMeans.train(parsedRdd, 2, maxIterations=10, initializationMode="random")

# If the values are already numeric, the int() conversion can be dropped:
parsedRdd = rowRDD.map(lambda row: array([x for x in row]))

# Full script:
from numpy import array
from pyspark.mllib.clustering import KMeans
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .getOrCreate()

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

rowRDD = df.drop("_id").rdd
parsedRdd = rowRDD.map(lambda row: array([int(x) for x in row]))

clusters = KMeans.train(parsedRdd, 2, maxIterations=10, initializationMode="random")
clusters.clusterCenters

Huge runtime difference between running YCSB with and without encryption with Workload E

recordcount=10000000
operationcount=100000
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform

Cannot find or load main class com.yahoo.ycsb.Client

"foostore" : "com.yahoo.ycsb.db.FooStoreClient",
cd ycsb-foostore-binding-0.13.0-SNAPSHOT/
/bin/ycsb load foostore -thread 1 -P workload/worloada -s
-----------------------
"foostore" : "com.yahoo.ycsb.db.FooStoreClient",
cd ycsb-foostore-binding-0.13.0-SNAPSHOT/
/bin/ycsb load foostore -thread 1 -P workload/worloada -s

Community Discussions

Trending Discussions on YCSB
  • ArangoDB Request Load balancing
  • Bash Script - Python can't find a path
  • How to benchmark 2 Redis nodes with YCSB
  • Cassandra - HDD vs. SSD usage makes no difference in throughput
  • KUDU for JDBC replication purposes, but not for Off-loaded Analytics
  • Init static variable in synchronized block for all the threads, read without synchronized
  • Why the TiDB performance drop for 10 times when the updated field value is random?
  • Low read throughput in Cloud Spanner
  • Correlation between throughtput and latency when benchmarking with YCSB
  • Hadoop with MongoDB storage

QUESTION

ArangoDB Request Load balancing

Asked 2020-Dec-01 at 21:32

I'm currently doing benchmarks for my studies using YCSB on an ArangoDB cluster (v3.7.3), which I set up using the starter (here).

I'm trying to understand if and how a setup like that (I'm using 4 VMs, for example) helps with balancing request load. If I have nodes A, B, C and D and I tell YCSB the IP of node A, all the requests go to node A...

That would mean that a cluster is unnecessary if you want to balance request load, wouldn't it? It would just make sense for data replication.

How would I handle the request load then? I'd normally do that in my application, but I cannot do that if I use existing tools like YCSB... (or can I?)

Thanks for the help!

ANSWER

Answered 2020-Dec-01 at 21:32

I had this problem as well and ended up solving it by standing-up nginx in front of my cluster, providing a stable, language-independent way to distribute query load. I found nginx surprisingly simple to configure, but take a look at the upstream module for more details.

upstream arangodb {
  server coord-1.my.domain:8529;
  server coord-2.my.domain:8529;
  server coord-3.my.domain:8529;
}

server {
  listen                *:80 default_server;
  server_name           _; # Listens for ALL hostnames
  proxy_next_upstream   error timeout invalid_header;
  
  location / {
    proxy_pass          http://arangodb;
  }
}

It's not ideal for everyone but works well for those times when you just need to load-balance and don't want to write a bunch of code (which ends up being quite slow) to resolve coordinators.

I've asked ArangoDB for a native proxy server solution, but I would bet it's low on their to-do list as it could be tricky to support, given the huge number of configuration options.

Source https://stackoverflow.com/questions/64959668

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install YCSB

You can download it from GitHub or Maven.
You can use YCSB like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the YCSB component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org; for Gradle installation, refer to gradle.org.
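
If you prefer declaring YCSB as a Maven dependency rather than adding jars by hand, the core module is published under the `site.ycsb` group id; the coordinates below assume you want the 0.17.0 release:

```xml
<dependency>
  <groupId>site.ycsb</groupId>
  <artifactId>core</artifactId>
  <version>0.17.0</version>
</dependency>
```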

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check for and ask them on the Stack Overflow community page.


  • © 2022 Open Weaver Inc.