hive-testbench | a data generator and set of queries that lets you experiment with Apache Hive at scale

 by hortonworks | Java | Version: Current | License: No License

kandi X-RAY | hive-testbench Summary

hive-testbench is a Java library typically used in Big Data applications. hive-testbench has no bugs, it has no vulnerabilities, and it has low support. However, a hive-testbench build file is not available. You can download it from GitHub.

The hive-testbench is a data generator and set of queries that lets you experiment with Apache Hive at scale. The testbench allows you to experience base Hive performance on large datasets, and gives an easy way to see the impact of Hive tuning parameters and advanced settings.

            kandi-support Support

              hive-testbench has a low-activity ecosystem.
              It has 237 stars and 193 forks. There are 520 watchers for this library.
              It has had no major release in the last 6 months.
              There are 14 open issues and 9 closed issues. On average, issues are closed in 41 days. There are 7 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of hive-testbench is current.

            kandi-Quality Quality

              hive-testbench has 0 bugs and 0 code smells.

            kandi-Security Security

              hive-testbench has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              hive-testbench code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              hive-testbench does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              hive-testbench releases are not available. You will need to build from source code and install.
              hive-testbench has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions, examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed hive-testbench and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality hive-testbench implements, and to help you decide if it suits your requirements.
            • Main entry point for sorting.
            • Copy a jar to a local temp file.
            • Create the input file.
            • Reads from an input stream and returns it as a String.
            • Main entry point for the command line.

            hive-testbench Key Features

            No Key Features are available at this moment for hive-testbench.

            hive-testbench Examples and Code Snippets

            No Code Snippets are available at this moment for hive-testbench.

            Community Discussions

            QUESTION

            Error while running hive tpch-setup: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface
            Asked 2022-Feb-24 at 14:17

            I am trying to run the hive tpch-setup by following the instructions from https://github.com/hortonworks/hive-testbench.git. I am running into the following error. This issue is not seen for tpcds-setup.

            This is not working on CDP Trial 7.3.1 (CDH Version: Cloudera Enterprise 6.3.4) but is working on Apache Ambari Version 2.6.2.2.

            ...

            ANSWER

            Answered 2022-Feb-24 at 14:17

            In hive-testbench/tpch-gen/pom.xml, changing the Hadoop version resolved the issue.
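
            As a rough sketch of that fix (the way tpch-gen/pom.xml pins the Hadoop version can vary by testbench revision, so inspect the file first; the `hadoop.version` tag below is an assumption):

            ```bash
            # See how the Hadoop dependency is currently pinned in the TPC-H generator.
            grep -n "hadoop" tpch-gen/pom.xml

            # Hypothetical edit: bump the version to match your cluster's Hadoop
            # (e.g. 3.1.1); adjust the tag name to whatever the POM actually uses.
            sed -i 's|<hadoop.version>.*</hadoop.version>|<hadoop.version>3.1.1</hadoop.version>|' \
                tpch-gen/pom.xml

            # Rebuild the generator so the new dependency takes effect.
            ./tpch-build.sh
            ```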

            Source https://stackoverflow.com/questions/71241059

            QUESTION

            How to Benchmark Hive (Azure Interactive Query HDI 4.0)
            Asked 2020-May-04 at 12:59

            Does anyone have a TPC-DS or TPC-H benchmark, working and tested as of 2020, for Azure Interactive Query HDI 4.0 clusters, which use Hadoop 3.x+?

            I was using https://github.com/hortonworks/hive-testbench but I ran into an error trying to generate data for TPC-H and TPC-DS.

            I am on Interactive Query HDI 4.0 (Hadoop 3.1.1). What could this error be? The step that fails is the one that runs the jar file.

            ...

            ANSWER

            Answered 2020-Apr-28 at 06:10

            The MoveTask error is due to a limitation in the backing SQL database. In Azure SQL Database, an incoming request can have at most 2,100 parameters, and the benchmarks generate too many partitions.
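
            The answer stops there, but one commonly suggested mitigation (an assumption here, not part of the accepted answer) is to cap how many elements the Hive metastore packs into a single direct-SQL query so it stays under the 2,100-parameter limit. The property names below exist in recent Hive releases, but verify them against your HDI version:

            ```bash
            # Hypothetical mitigation: lower the metastore's direct-SQL batch sizes
            # so no single query sent to Azure SQL Database exceeds 2,100 parameters.
            # These normally belong in hive-site.xml on the metastore service;
            # passing them via --hiveconf only takes effect with an embedded metastore.
            hive --hiveconf hive.direct.sql.max.elements.in.clause=1000 \
                 --hiveconf hive.direct.sql.max.elements.values.clause=1000
            ```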

            Source https://stackoverflow.com/questions/61258886

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install hive-testbench

            All of these steps should be carried out on your Hadoop cluster.
            Step 1: Prepare your environment. In addition to Hadoop and Hive, ensure `gcc` is installed and available on your system path before you begin. If your system does not have it, install it using yum or apt-get, as sketched below.
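
            For example (assuming a RHEL- or Debian-family node; package names can vary):

            ```bash
            # Verify gcc is on the PATH before building the data generators.
            command -v gcc || echo "gcc not found"

            # Install it with whichever package manager your distribution uses.
            sudo yum install -y gcc        # RHEL / CentOS
            sudo apt-get install -y gcc    # Debian / Ubuntu
            ```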
            Step 2: Decide which test suite(s) you want to use. hive-testbench comes with data generators and sample queries based on both the TPC-DS and TPC-H benchmarks. You can choose either or both of these benchmarks for experimentation. More information about these benchmarks can be found at the Transaction Processing Performance Council homepage.
            Step 3: Compile and package the appropriate data generator, as shown below. For TPC-DS, `./tpcds-build.sh` downloads, compiles and packages the TPC-DS data generator. For TPC-H, `./tpch-build.sh` does the same for the TPC-H data generator.
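
            Run whichever you need from the repository root:

            ```bash
            # Build the TPC-DS data generator (downloads, compiles and packages it).
            ./tpcds-build.sh

            # Or build the TPC-H data generator the same way.
            ./tpch-build.sh
            ```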
            Step 4: Decide how much data you want to generate. You need to decide on a "Scale Factor", which represents how much data you will generate. Scale Factor roughly translates to gigabytes: a Scale Factor of 100 is about 100 gigabytes, and one terabyte is Scale Factor 1000. Decide how much data you want and keep it in mind for the next step. If you have a cluster of 4-10 nodes or just want to experiment at a smaller scale, Scale 1000 (1 TB) of data is a good starting point. If you have a large cluster, you may want to choose Scale 10000 (10 TB) or more. The notion of scale factor is similar between TPC-DS and TPC-H. If you want to generate a large amount of data, you should use Hive 13 or later; Hive 13 introduced an optimization that allows far more scalable data partitioning. Hive 12 and lower will likely crash if you generate more than a few hundred GB of data, and tuning around the problem is difficult. You can generate text or RCFile data in Hive 13 and use it in multiple versions of Hive.
            Step 5: Generate and load the data. The scripts `tpcds-setup.sh` and `tpch-setup.sh` generate and load data for TPC-DS and TPC-H, respectively. General usage is `tpcds-setup.sh scale_factor [directory]` or `tpch-setup.sh scale_factor [directory]`. Some examples:

            ```bash
            # Build 1 TB of TPC-DS data:
            ./tpcds-setup.sh 1000

            # Build 1 TB of TPC-H data:
            ./tpch-setup.sh 1000

            # Build 100 TB of TPC-DS data:
            ./tpcds-setup.sh 100000

            # Build 30 TB of text formatted TPC-DS data:
            FORMAT=textfile ./tpcds-setup.sh 30000

            # Build 30 TB of RCFile formatted TPC-DS data:
            FORMAT=rcfile ./tpcds-setup.sh 30000
            ```

            Also check the other parameters in the setup scripts; an important one is BUCKET_DATA.
            Step 6: Run queries. More than 50 sample TPC-DS queries and all TPC-H queries are included for you to try. You can use `hive`, `beeline` or the SQL tool of your choice. The testbench also includes a set of suggested settings. This example assumes you have generated 1 TB of TPC-DS data during Step 5:

            ```
            cd sample-queries-tpcds
            hive -i testbench.settings
            hive> use tpcds_bin_partitioned_orc_1000;
            hive> source query55.sql;
            ```

            Note that the database is named based on the Data Scale chosen in Step 4. At Data Scale 10000, your TPC-DS database will be named tpcds_bin_partitioned_orc_10000; for TPC-H at Data Scale 1000, it would be named tpch_flat_orc_1000. You can always run `show databases` to get a list of available databases. Similarly, if you generated 1 TB of TPC-H data during Step 5:

            ```
            cd sample-queries-tpch
            hive -i testbench.settings
            hive> use tpch_flat_orc_1000;
            hive> source tpch_query1.sql;
            ```
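
            If you prefer beeline over the Hive CLI, a roughly equivalent session might look like the sketch below; the JDBC URL is a placeholder and should point at your own HiveServer2 endpoint:

            ```bash
            # Run query55 against the 1 TB TPC-DS database through HiveServer2.
            # The connection URL is an assumption; adjust it for your cluster.
            cd sample-queries-tpcds
            beeline -u "jdbc:hive2://localhost:10000/tpcds_bin_partitioned_orc_1000" \
                    -i testbench.settings -f query55.sql
            ```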

            Support

            If you have questions, comments or problems, visit the [Hortonworks Hive forum](http://hortonworks.com/community/forums/forum/hive/). If you have improvements, pull requests are accepted.
            CLONE
          • HTTPS: https://github.com/hortonworks/hive-testbench.git
          • CLI: gh repo clone hortonworks/hive-testbench
          • SSH: git@github.com:hortonworks/hive-testbench.git
