spark-elastic | Combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch

by skrusche63 | Scala | Version: Current | License: No License

kandi X-RAY | spark-elastic Summary

spark-elastic is a Scala library typically used in Big Data and Spark applications. spark-elastic has no bugs and no vulnerabilities, though it has low support. You can download it from GitHub.

This project combines Apache Spark and Elasticsearch to enable mining & prediction for Elasticsearch.

            Support

              spark-elastic has a low active ecosystem.
              It has 204 star(s) with 73 fork(s). There are 26 watchers for this library.
              It had no major release in the last 6 months.
              There is 1 open issue and 0 closed issues. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of spark-elastic is current.

            Quality

              spark-elastic has no bugs reported.

            Security

              spark-elastic has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              spark-elastic does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              spark-elastic releases are not available. You will need to build from source code and install.
              Installation instructions are not available. Examples and code snippets are available.


            spark-elastic Key Features

            No Key Features are available at this moment for spark-elastic.

            spark-elastic Examples and Code Snippets

            No Code Snippets are available at this moment for spark-elastic.

            Community Discussions

            QUESTION

            Pulling only required columns in Spark from Cassandra without loading all the columns
            Asked 2020-Jun-19 at 02:21

            Using the spark-elasticsearch connector, it is possible to load only the required columns from ES into Spark. However, there doesn't seem to be such a straightforward option to do the same with the spark-cassandra connector.

            Reading data from ES into Spark -- here only the required columns are brought from ES to Spark:

            ...
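
            For reference, a minimal sketch of such a column-selective ES read with the elasticsearch-spark SQL connector (the index name and field names below are hypothetical, since the asker's snippet is elided above):

            import org.apache.spark.sql.SparkSession

            val spark = SparkSession.builder()
              .appName("es-column-read")
              .config("es.nodes", "localhost:9200") // assumed local ES node
              .getOrCreate()

            // Load the index as a DataFrame and select only the needed fields;
            // the connector then fetches just those fields from Elasticsearch.
            val df = spark.read
              .format("org.elasticsearch.spark.sql")
              .load("logs")                  // hypothetical index name
              .select("timestamp", "status") // hypothetical field names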

            ANSWER

            Answered 2020-Jun-18 at 20:17

            Actually, the connector should do that itself, without any need to explicitly set anything. It's called "predicate pushdown", and the cassandra-connector does it, according to the documentation:

            The connector will automatically pushdown all valid predicates to Cassandra. The Datasource will also automatically only select columns from Cassandra which are required to complete the query. This can be monitored with the explain command.

            source: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
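
            A minimal sketch of that behaviour with the spark-cassandra-connector (keyspace, table, and column names are hypothetical):

            import org.apache.spark.sql.SparkSession

            val spark = SparkSession.builder()
              .appName("cassandra-pruning")
              .config("spark.cassandra.connection.host", "127.0.0.1") // assumed node
              .getOrCreate()

            val df = spark.read
              .format("org.apache.spark.sql.cassandra")
              .options(Map("keyspace" -> "ks", "table" -> "users"))
              .load()
              .select("id", "name") // column pruning: only these columns are fetched
              .filter("id > 100")   // a valid predicate, pushed down to Cassandra

            df.explain() // the physical plan shows the pushed filters and pruned columns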

            Source https://stackoverflow.com/questions/62457616

            QUESTION

            Spark Elasticsearch basic tuning
            Asked 2020-Jan-04 at 13:02

            How do I set up Spark for speed?

            I'm running spark-elasticsearch to analyze log data.

            It takes about 5 minutes to do an aggregate/join over 2 million rows (4 GB).

            I'm running 1 master and 3 workers on 3 machines. I increased executor memory to 8g and increased the ES nodes from 1 to 3.

            I'm running a standalone cluster in client mode (https://becominghuman.ai/real-world-python-workloads-on-spark-standalone-clusters-2246346c7040). I'm not using spark-submit, just running Python code after launching the master/workers.

            Spark seems to launch 3 executors in total (one from each of the 3 workers).

            I'd like to tune Spark a little to get the most performance with minimal tuning.

            Which way should I take for optimization?

            1. Consider another cluster manager (YARN, etc. -- although I have no idea what they offer, it seems easier to change memory-related settings there)
            2. Run more executors
            3. Analyze the job plan with the explain API
            4. Accept that it takes that much time because 4 GB of data has to be downloaded (must Spark grab all the data to run an aggregate such as a group by and sum?); if applicable, save the data to Parquet for further analysis

            Below are my performance-related settings:

            ...

            ANSWER

            Answered 2020-Jan-04 at 13:02

            It is not always a matter of memory or cluster configuration. I would suggest starting by trying to optimize the query/aggregation you're running before increasing memory.

            You can find here some hints for Spark performance tuning. See also Tuning Spark. Make sure the query is optimal and avoid known performance killers such as UDFs.

            For executor and memory configuration in your cluster, you have to take into consideration the available memory and cores on all machines to calculate adequate parameters. Here is an interesting post on best practices.
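
            As a rough illustration, executor resources can also be set when building the session in code rather than via spark-submit; the values below are assumptions to adapt, not recommendations:

            import org.apache.spark.sql.SparkSession

            val spark = SparkSession.builder()
              .master("spark://master-host:7077") // hypothetical standalone master URL
              .appName("log-analysis")
              .config("spark.executor.memory", "8g")  // per-executor heap
              .config("spark.executor.cores", "4")    // cores per executor; match the machines
              .config("spark.sql.shuffle.partitions", "48") // shuffle parallelism for joins/aggregations
              .getOrCreate()

            // Calling .explain() on the aggregation then shows the plan and where shuffles occur.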

            Source https://stackoverflow.com/questions/59590216

            QUESTION

            How to join RDDs based on elastic-hadoop
            Asked 2017-Sep-18 at 13:04

            I'm looking for ways to process data from a large index in parallel. I thought about snapshotting the index (to HDFS) and then submitting Spark jobs to process the records.

            Another way to solve it is to use Elasticsearch with Spark.

            My questions:

            1. Can the snapshot API output be text files instead of binary files?
            2. How can I use spark-elastic and perform sub-queries for a specific document? (Let's say I have an index of dogs, and I want to find the bones of each dog.)

            ------EDIT------

            My indexes changed a little: there is a Dogs index and a dogs-relation index. Dogs index:

            ...

            ANSWER

            Answered 2017-Jan-31 at 15:37

            Pt 1.

            I don't think so; AFAIK the closest option would be to use the scan/scroll API (depending on which ES version you are on): ES v5.1 scroll API. You can 'export' your indexes to text file(s) that way.
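
            (As an aside, elasticsearch-hadoop streams an index through the scroll API under the hood, so one way to realize such an export is a short Spark job; the index name and output path here are hypothetical:)

            import org.apache.spark.{SparkConf, SparkContext}
            import org.elasticsearch.spark._

            val conf = new SparkConf()
              .setAppName("es-export")
              .set("es.nodes", "localhost:9200") // assumed ES endpoint
            val sc = new SparkContext(conf)

            // esRDD yields (document id, map of fields) pairs; writing the RDD
            // out produces plain text files.
            sc.esRDD("dogs").saveAsTextFile("hdfs:///exports/dogs") // hypothetical path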

            Pt 2.

            The simplest way, code-wise, to do what you want (an Elasticsearch query per dog document) would be to load your dogsRDD using elastic-hadoop, then for the sub-query behaviour do something like:
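
            The original snippet is not preserved on this page; a hedged sketch of the described pattern with the elasticsearch-hadoop Scala API (the index names, the "dog_id" field, and the query body are hypothetical):

            import org.apache.spark.{SparkConf, SparkContext}
            import org.elasticsearch.spark._

            val conf = new SparkConf()
              .setAppName("dogs-subqueries")
              .set("es.nodes", "localhost:9200") // assumed ES endpoint
            val sc = new SparkContext(conf)

            // Each element of esRDD is a (document id, map of fields) pair.
            val dogsRDD = sc.esRDD("dogs")

            // One follow-up query per dog: simple, but it launches a job per id,
            // so a join of the two RDDs scales better for large indexes.
            val bonesPerDog = dogsRDD.keys.collect().map { id =>
              id -> sc.esRDD("dogs-relation", s"""{"query":{"term":{"dog_id":"$id"}}}""")
                      .values
                      .collect()
            }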

            Source https://stackoverflow.com/questions/41552208

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install spark-elastic

            You can download it from GitHub.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/skrusche63/spark-elastic.git

          • CLI

            gh repo clone skrusche63/spark-elastic

          • SSH

            git@github.com:skrusche63/spark-elastic.git
