Yahoo! Cloud Serving Benchmark
==============================
* To get here, use https://ycsb.site
* [Our project docs](https://github.com/brianfrankcooper/YCSB/wiki)
* [The original announcement from Yahoo!](https://labs.yahoo.com/news/yahoo-cloud-serving-benchmark/)
Getting Started
---------------
1. Download the [latest release of YCSB](https://github.com/brianfrankcooper/YCSB/releases/latest):
```sh
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0
```
2. Set up a database to benchmark. There is a README file under each binding
directory.
3. Run the YCSB command.
On Linux:
```sh
bin/ycsb.sh load basic -P workloads/workloada
bin/ycsb.sh run basic -P workloads/workloada
```
On Windows:
```bat
bin/ycsb.bat load basic -P workloads\workloada
bin/ycsb.bat run basic -P workloads\workloada
```
Running the `ycsb` command without any argument will print the usage.
See https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
for detailed documentation on how to run a workload.
See https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties for
the list of available workload properties.
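Workload properties can also be overridden on the command line with `-p`, and the client thread count set with `-threads`; a minimal sketch (the values below are arbitrary examples, not recommendations):
```sh
bin/ycsb.sh run basic -P workloads/workloada \
    -p operationcount=100000 -threads 16 -s
```
The `-s` flag prints periodic status output to stderr while the run is in progress.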
Building from source
--------------------
YCSB requires the use of Maven 3; if you use Maven 2, you may see [errors
such as these](https://github.com/brianfrankcooper/YCSB/issues/406).
To build the full distribution, with all database bindings:
```sh
mvn clean package
```
To build a single database binding:
```sh
mvn -pl site.ycsb:mongodb-binding -am clean package
```
ArangoDB Request Load balancing
-------------------------------
```nginx
upstream arangodb {
    server coord-1.my.domain:8529;
    server coord-2.my.domain:8529;
    server coord-3.my.domain:8529;
}

server {
    listen *:80 default_server;
    server_name _;  # listens for ALL hostnames
    proxy_next_upstream error timeout invalid_header;

    location / {
        proxy_pass http://arangodb;
    }
}
```
Bash Script - Python can't find a path
--------------------------------------
> "'logs/tp${throughput}/load-test${test}-workloada.txt'"
> "logs/tp${throughput}/load-test${test}-workloada.txt"
> ""'logs/tp${throughput}/run-test${test}-workload${workload}.txt'"
> "logs/tp${throughput}/run-test${test}-workload${workload}.txt"
Init static variable in synchronized block for all the threads, read without synchronized
------------------------------------------------------------------------------------------
```java
public class DB {
    private static final Client client = new Client();

    public void insert(Object data) {
        client.insert(data); // Guaranteed to be initialized once class loading is complete.
    }
}
```
```java
public class DB {
    private static class Holder {
        private static final Client client = new Client();
    }

    public void insert(Object data) {
        Holder.client.insert(data); // Holder.client is initialized on first access.
    }
}
```
```java
public void insert(Object data) {
    init();
    client.insert(data);
}
```
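If lazy initialization really must go through an explicit `init()` as in the last fragment, the classic way to make it safe is double-checked locking on a `volatile` field, so only the first access pays for synchronization. A minimal sketch under the same assumptions (`Client` is a placeholder type, not part of YCSB's API):
```java
public class DB {
    // volatile guarantees a fully constructed Client is visible to all threads
    private static volatile Client client;

    private static void init() {
        if (client == null) {                 // fast path: no locking once initialized
            synchronized (DB.class) {
                if (client == null) {         // re-check under the lock
                    client = new Client();
                }
            }
        }
    }

    public void insert(Object data) {
        init();
        client.insert(data);                  // safe to read without synchronization
    }
}
```
That said, the holder idiom above achieves the same lazy, thread-safe initialization with less machinery, which is why it is usually preferred.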
How to configure Spark SQL Thrift Server
----------------------------------------
```sh
sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=8088
```
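Once the server is up, it can be sanity-checked with the `beeline` client that ships with Spark; the host, port, and user below are assumptions matching the command above:
```sh
bin/beeline -u jdbc:hive2://localhost:8088 -n myuser
```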
YCSB JDBC driver: java.lang.ClassNotFoundException
--------------------------------------------------
```properties
db.driver=com.mysql.jdbc.Driver # The JDBC driver class to use.
db.url=jdbc:mysql://master:3306/ycsb # The Database connection URL.
db.user=root # User name for the connection.
db.passwd=
```
Java's `Properties` loader only treats `#` as a comment at the start of a line, so the inline comments above become part of each value; the driver class is then looked up as `com.mysql.jdbc.Driver # The JDBC driver class to use.`, which triggers the `ClassNotFoundException`. Put the comments on their own lines:
```properties
# The JDBC driver class to use.
db.driver=com.mysql.jdbc.Driver
# The Database connection URL.
db.url=jdbc:mysql://master:3306/ycsb
# User name for the connection.
db.user=root
db.passwd=
```
Another approach is to declare the connector jar as a system-scoped dependency in the JDBC binding's POM and rebuild:
```xml
<dependency>
    <groupId>com.mysql.driver</groupId>
    <artifactId>mysqldriver</artifactId>
    <version>5.1.46</version>
    <scope>system</scope>
    <systemPath>/PATH/YCSB/jdbc/lib/mysql-connector-java-5.1.46-bin.jar</systemPath>
</dependency>
```
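Alternatively, the `ycsb` launcher accepts `-cp` to add entries to the client classpath at run time, which avoids editing the POM; the jar path below is an assumption based on the question:
```sh
bin/ycsb load jdbc -P workloads/workloada -P db.properties \
    -cp lib/mysql-connector-java-5.1.46-bin.jar
```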
How to map MongoDB data in Spark for kmeans?
--------------------------------------------
```python
from numpy import array
from pyspark.mllib.clustering import KMeans
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .getOrCreate()

df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb://127.0.0.1/ycsb.usertable") \
    .load()

# Drop the _id column and get an RDD representation of the DataFrame
rowRDD = df.drop("_id").rdd

# Convert the RDD of Row into an RDD of numpy.array
# (use int(x) when the fields hold numeric strings; otherwise map without the cast)
parsedRdd = rowRDD.map(lambda row: array([int(x) for x in row]))

# Feed into KMeans
clusters = KMeans.train(parsedRdd, 2, maxIterations=10, initializationMode="random")
clusters.clusterCenters
```
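To judge whether `k=2` is a sensible choice, the trained model's within-set sum of squared errors can be evaluated on the same RDD; a short sketch using the names defined above:
```python
# Within Set Sum of Squared Errors: lower means tighter clusters
wssse = clusters.computeCost(parsedRdd)
print("WSSSE = %.2f" % wssse)
```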
Huge runtime difference between running YCSB with and without encryption with Workload E
-----------------------------------------------------------------------------------------
```properties
recordcount=10000000
operationcount=100000
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
```
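With 95% scan operations returning up to `maxscanlength=100` records each, every operation touches far more records than a single read or update would, so any per-record encryption cost is amplified accordingly. Loading and running this property file looks like the following (the file name `workloade-custom` and the `basic` binding are placeholders):
```sh
bin/ycsb.sh load basic -P workloads/workloade-custom -s
bin/ycsb.sh run basic -P workloads/workloade-custom -s
```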
Cannot find or load main class com.yahoo.ycsb.Client
----------------------------------------------------
"foostore" : "com.yahoo.ycsb.db.FooStoreClient",
cd ycsb-foostore-binding-0.13.0-SNAPSHOT/
/bin/ycsb load foostore -thread 1 -P workload/worloada -s
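`Cannot find or load main class` usually means the YCSB core jar is missing from the classpath the launcher builds, not that the workload file is wrong. A quick check is to confirm both jars are present in the binding's `lib/` directory (the layout and jar names below are assumptions based on how binding tarballs are packaged):
```sh
# Both the core jar and the custom binding jar should be present:
ls lib/core-*.jar lib/foostore-binding-*.jar
```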
QUESTION
ArangoDB Request Load balancing
Asked 2020-Dec-01 at 21:32

I'm currently doing benchmarks for my studies using YCSB on an ArangoDB cluster (v3.7.3) that I set up using the starter (here).
I'm trying to understand if and how a setup like that (I'm using 4 VMs, for example) helps with balancing request load. If I have nodes A, B, C and D and I give YCSB the IP of node A, all the requests go to node A...
That would mean that a cluster is unnecessary if you want to balance request load, wouldn't it? It would just make sense for data replication.
How would I handle the request load then? I'd normally do that in my application, but I cannot do that when using an existing tool like YCSB... (or can I?)
Thanks for the help!
ANSWER
Answered 2020-Dec-01 at 21:32

I had this problem as well and ended up solving it by standing up nginx in front of my cluster, providing a stable, language-independent way to distribute query load. I found nginx surprisingly simple to configure, but take a look at the upstream module for more details.
```nginx
upstream arangodb {
    server coord-1.my.domain:8529;
    server coord-2.my.domain:8529;
    server coord-3.my.domain:8529;
}

server {
    listen *:80 default_server;
    server_name _;  # listens for ALL hostnames
    proxy_next_upstream error timeout invalid_header;

    location / {
        proxy_pass http://arangodb;
    }
}
```
It's not ideal for everyone but works well for those times when you just need to load-balance and don't want to write a bunch of code (which ends up being quite slow) to resolve coordinators.
I've asked ArangoDB for a native proxy server solution, but I would bet it's low on their to-do list as it could be tricky to support, given the huge number of configuration options.
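With the proxy in place, YCSB can simply be pointed at the nginx host instead of a single coordinator. A quick way to confirm the proxy is forwarding correctly (the host name is an assumption matching the config above; add credentials if authentication is enabled):
```sh
curl http://proxy.my.domain/_api/version
```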
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.