spark-sql-perf

by databricks | Scala | Version: v0.2.4 | License: Apache-2.0

kandi X-RAY | spark-sql-perf Summary

spark-sql-perf is a Scala library typically used in Big Data and Spark applications. spark-sql-perf has no bugs or vulnerabilities, it has a Permissive License, and it has medium support. You can download it from GitHub.

This is a performance testing framework for Spark SQL in Apache Spark 2.2+.

Support

spark-sql-perf has a moderately active ecosystem.
              It has 525 stars, 373 forks, and 311 watchers.
              It has had no major release in the last 6 months.
              There are 41 open issues and 19 closed issues. On average, issues are closed in 18 days. There are 13 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of spark-sql-perf is v0.2.4.

Quality

              spark-sql-perf has 0 bugs and 0 code smells.

Security

              spark-sql-perf has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              spark-sql-perf code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

              spark-sql-perf is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              spark-sql-perf releases are not available. You will need to build from source code and install.
              Installation instructions, examples and code snippets are available.
              It has 10463 lines of code, 282 functions and 75 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.


            spark-sql-perf Key Features

            No Key Features are available at this moment for spark-sql-perf.

            spark-sql-perf Examples and Code Snippets

            No Code Snippets are available at this moment for spark-sql-perf.

            Community Discussions

            QUESTION

            Spark executors and shuffle in local mode
            Asked 2021-Jun-12 at 16:13

I am running a TPC-DS benchmark for Spark 3.0.1 in local mode and using sparkMeasure to get workload statistics. I have 16 cores in total, and the SparkContext is available as

            Spark context available as 'sc' (master = local[*], app id = local-1623251009819)

Q1. For local[*], the driver and executors are created in a single JVM with 16 threads. Considering Spark's configuration, which of the following is true?

            • 1 worker instance, 1 executor having 16 cores/threads
            • 1 worker instance, 16 executors each having 1 core

            For a particular query, sparkMeasure reports shuffle data as follows

            shuffleRecordsRead => 183364403
            shuffleTotalBlocksFetched => 52582
            shuffleLocalBlocksFetched => 52582
            shuffleRemoteBlocksFetched => 0
            shuffleTotalBytesRead => 1570948723 (1498.0 MB)
            shuffleLocalBytesRead => 1570948723 (1498.0 MB)
            shuffleRemoteBytesRead => 0 (0 Bytes)
            shuffleRemoteBytesReadToDisk => 0 (0 Bytes)
            shuffleBytesWritten => 1570948723 (1498.0 MB)
            shuffleRecordsWritten => 183364480

            Q2. Regardless of the query specifics, why is there data shuffling when everything is inside a single JVM?

            ...

            ANSWER

            Answered 2021-Jun-11 at 05:56
• An executor is a JVM process. When you use local[*], you run Spark locally with as many worker threads as there are logical cores on your machine, so: 1 executor with as many worker threads as logical cores. When you configure SPARK_WORKER_INSTANCES=5 in spark-env.sh and run start-master.sh and start-slave.sh spark://localhost:7077 to bring up a standalone Spark cluster on your local machine, you have one master and 5 workers; if you want to send your application to this cluster, you must configure it like SparkSession.builder().appName("app").master("spark://localhost:7077"), and in that case you cannot specify [*] or [2], for example. But when you specify the master as local[*], a single JVM process is created, the master and all workers live in that JVM process, and after your application finishes that JVM instance is destroyed. local[*] and spark://localhost:7077 are two separate things.
• Workers do their job using tasks, and each task is actually a thread, i.e. task = thread. Workers have memory, and they assign a memory partition to each task so it can do its job, such as reading part of a dataset into its own memory partition or transforming the data it has read. When a task such as a join needs other partitions, a shuffle occurs regardless of whether the job runs in a cluster or locally. In a cluster, two tasks may be on different machines, so network transmission is added on top of the other work, such as writing the result and then having it read by another task. Locally, if task B needs the data in task A's partition, task A has to write it down and then task B reads it to do its job.

            Source https://stackoverflow.com/questions/67923596
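
            As an aside, statistics like the shuffle counters quoted above are typically collected by wrapping a query with sparkMeasure's stage metrics. The following is a minimal sketch, not taken from the question: the package coordinates, version, and the example query are assumptions, so check the sparkMeasure documentation for your Spark and Scala versions.

            // Launch the shell in local mode with sparkMeasure on the classpath, e.g.:
            //   spark-shell --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17
            import ch.cern.sparkmeasure.StageMetrics

            val stageMetrics = StageMetrics(spark)

            // Run a (placeholder) query and print aggregated stage metrics,
            // including shuffle bytes/records read and written.
            stageMetrics.runAndMeasure {
              spark.sql("SELECT ss_store_sk, count(*) FROM store_sales GROUP BY ss_store_sk").show()
            }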

            QUESTION

            Spark error when running TPCDS benchmark datasets - Could not find dsdgen
            Asked 2020-Mar-29 at 08:29

I'm trying to build the TPCDS benchmark datasets by following this website:

            https://xuechendi.github.io/2019/07/12/Prepare-TPCDS-For-Spark

When I run this:

            ...

            ANSWER

            Answered 2020-Mar-29 at 08:29
The error "Could not find dsdgen at /home/troberts/spark-sql-perf/tpcds-kit/tools/dsdgen or //home/troberts/spark-sql-perf/tpcds-kit/tools/dsdgen" means the dsdgen binary does not exist at the path given to the data generator. Generating the TPCDS data requires Databricks' fork of dsdgen (the tpcds-kit) to be built and available on every machine that generates data, and the path passed to the generator must point at the directory containing the compiled dsdgen binary.

            Source https://stackoverflow.com/questions/60906687

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install spark-sql-perf

Use sbt package or sbt assembly to build the library jar. Use sbt +package to build for Scala 2.11 and 2.12.
Before running any query, a dataset needs to be set up by creating a Benchmark object. Generating the TPCDS data requires dsdgen to be built and available on the machines. We have a fork of dsdgen that you will need; the fork includes changes to generate TPCDS data to stdout, so that this library can pipe it directly to Spark without intermediate files. Therefore, this library will not work with the vanilla TPCDS kit.
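
            To make the flow concrete, here is a minimal sketch of generating TPCDS data and running the benchmark from spark-shell, based on the class and method names documented in the project README (TPCDSTables, genData, createExternalTables, analyzeTables, TPCDS, runExperiment). The dsdgen path, output location, scale factor, database name, partition count, and result location are placeholders to adapt to your environment.

            import com.databricks.spark.sql.perf.tpcds.{TPCDS, TPCDSTables}

            val sqlContext = spark.sqlContext

            // Point the generator at the directory containing the compiled dsdgen binary
            // from the Databricks tpcds-kit fork (placeholder path); scale factor is in GB.
            val tables = new TPCDSTables(sqlContext,
              dsdgenDir = "/path/to/tpcds-kit/tools",
              scaleFactor = "1",
              useDoubleForDecimal = false,
              useStringForDate = false)

            // Generate the raw data as partitioned Parquet files.
            tables.genData(
              location = "/tmp/tpcds-sf1",
              format = "parquet",
              overwrite = true,
              partitionTables = true,
              clusterByPartitionColumns = true,
              filterOutNullPartitionValues = false,
              tableFilter = "",      // "" means generate all tables
              numPartitions = 100)   // number of dsdgen input tasks

            // Register the generated files as external tables and collect statistics.
            spark.sql("CREATE DATABASE IF NOT EXISTS tpcds_sf1")
            tables.createExternalTables("/tmp/tpcds-sf1", "parquet", "tpcds_sf1",
              overwrite = true, discoverPartitions = true)
            tables.analyzeTables("tpcds_sf1", analyzeColumns = true)

            // Run the TPC-DS v2.4 queries and wait for the experiment to finish.
            spark.sql("USE tpcds_sf1")
            val tpcds = new TPCDS(sqlContext = sqlContext)
            val experiment = tpcds.runExperiment(
              tpcds.tpcds2_4Queries,
              iterations = 1,
              resultLocation = "/tmp/performance-results")  // placeholder
            experiment.waitForFinish(4 * 60 * 60)           // timeout in seconds

            Per the README, results are written under the result location and can be inspected while or after the experiment runs.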

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/databricks/spark-sql-perf.git

          • CLI

            gh repo clone databricks/spark-sql-perf

• SSH

            git@github.com:databricks/spark-sql-perf.git
