
elasticsearch-hadoop | Elasticsearch real-time search

by elastic | Java | Version: v8.1.3 | License: Apache-2.0


kandi X-RAY | elasticsearch-hadoop Summary

elasticsearch-hadoop is a Java library typically used in Big Data, Spark, and Hadoop applications. It has no bugs, no reported vulnerabilities, a build file available, a permissive license, and high support. You can download it from GitHub or Maven.
Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Apache Hive, Apache Pig, Apache Spark and Apache Storm. See project page and documentation for detailed information.

Support

  • elasticsearch-hadoop has a highly active ecosystem.
  • It has 1859 stars and 961 forks. There are 461 watchers for this library.
  • There were 10 major releases in the last 12 months.
  • There are 88 open issues and 1067 closed issues. On average, issues are closed in 589 days. There are 3 open pull requests and 0 closed pull requests.
  • It has a positive sentiment in the developer community.
  • The latest version of elasticsearch-hadoop is v8.1.3.

Quality

  • elasticsearch-hadoop has 0 bugs and 0 code smells.

Security

  • elasticsearch-hadoop has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
  • elasticsearch-hadoop code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • elasticsearch-hadoop is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • elasticsearch-hadoop releases are available to install and integrate.
  • Deployable package is available in Maven.
  • Build file is available. You can build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.
  • It has 66,548 lines of code, 5,837 functions and 730 files.
  • It has medium code complexity. Code complexity directly impacts maintainability of the code.
Top functions reviewed by kandi - BETA

kandi has reviewed elasticsearch-hadoop and surfaced the functions below as its top functions. This is intended to give you an instant insight into the functionality elasticsearch-hadoop implements and to help you decide if the library suits your requirements.

  • Tries to flush the data.
  • Reads the hit as a map.
  • Initializes the extractors.
  • Adds HTTP authentication.
  • Generates the JSON message for the ECS template.
  • Runs a keystore command.
  • Writes a bulk entry.
  • Writes a tuple to the generator.
  • Assembles the query parameters.
  • Gets the GitInfo for the given root directory.


elasticsearch-hadoop Key Features

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

Stable Release (currently 8.0.0)

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.0.0</version>
</dependency>

Development Snapshot

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.2.0-SNAPSHOT</version>
</dependency>

Required

es.resource=<ES resource location, relative to the host/port specified above>

Essential

es.query=<uri or query dsl query>              # defaults to {"query":{"match_all":{}}}
es.nodes=<ES host address>                     # defaults to localhost
es.port=<ES REST port>                         # defaults to 9200

Reading (Map/Reduce, old mapred API)

JobConf conf = new JobConf();
conf.setInputFormat(EsInputFormat.class);
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
...
JobClient.runJob(conf);

Writing (Map/Reduce, old mapred API)

JobConf conf = new JobConf();
conf.setOutputFormat(EsOutputFormat.class);
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
...
JobClient.runJob(conf);

Reading (Map/Reduce, new mapreduce API)

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists");
conf.set("es.query", "?q=me*");             // replace this with the relevant query
Job job = new Job(conf);
job.setInputFormatClass(EsInputFormat.class);
...
job.waitForCompletion(true);

Writing (Map/Reduce, new mapreduce API)

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists"); // index or indices used for storing data
Job job = new Job(conf);
job.setOutputFormatClass(EsOutputFormat.class);
...
job.waitForCompletion(true);

Reading (Apache Hive)

CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 'es.query' = '?q=me*');

Writing (Apache Hive)

CREATE EXTERNAL TABLE artists (
    id      BIGINT,
    name    STRING,
    links   STRUCT<url:STRING, picture:STRING>)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists');

Reading (Apache Pig)

A = LOAD 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
DUMP A;

Writing (Apache Pig)

A = LOAD 'src/artists.dat' USING PigStorage() AS (id:long, name, url:chararray, picture: chararray);
B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();

Reading (Apache Spark, Scala)

import org.apache.spark.SparkContext
import org.elasticsearch.spark._

...
val conf = ...
val sc = new SparkContext(conf)
sc.esRDD("radio/artists", "?q=me*")

Writing (Apache Spark, Scala)

import org.apache.spark.SparkContext
import org.elasticsearch.spark._

val conf = ...
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

Reading (Apache Spark, Java)

import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf conf = ...
JavaSparkContext jsc = new JavaSparkContext(conf);

JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, "radio/artists");

Writing (Apache Spark, Java)

import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

SparkConf conf = ...
JavaSparkContext jsc = new JavaSparkContext(conf);

Map<String, ?> numbers = ImmutableMap.of("one", 1, "two", 2);
Map<String, ?> airports = ImmutableMap.of("OTP", "Otopeni", "SFO", "San Fran");

JavaRDD<Map<String, ?>> javaRDD = jsc.parallelize(ImmutableList.of(numbers, airports));
JavaEsSpark.saveToEs(javaRDD, "spark/docs");

Reading (Apache Storm)

import org.apache.storm.topology.TopologyBuilder;
import org.elasticsearch.storm.EsSpout;

// PrinterBolt is a user-supplied bolt (e.g. from storm-starter)
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");

Writing (Apache Storm)

import org.apache.storm.topology.TopologyBuilder;
import org.elasticsearch.storm.EsBolt;

// RandomSentenceSpout is a user-supplied spout (e.g. from storm-starter)
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 10);
builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5).shuffleGrouping("spout");
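
To run either topology, the connector settings can be supplied through the topology configuration, which the Storm integration reads its es.* keys from. Below is a minimal local-submission sketch reusing the builder from the writing example above; the host, port, and topology name are assumptions rather than part of the original snippets.

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;

// Hypothetical local run: the es.* settings travel in the topology configuration.
Config conf = new Config();
conf.put("es.nodes", "localhost"); // assumed single-node Elasticsearch
conf.put("es.port", "9200");

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("es-demo", conf, builder.createTopology());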
                      

License

Licensed to Elasticsearch under one or more contributor
license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright
ownership. Elasticsearch licenses this file to you under
the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Spark 3.0 scala.None$ is not a valid external type for schema of string

.option("es.field.read.empty.as.null", "no")

elasticsearch-hadoop spark connector unable to connect/write using out-of-box ES server setup, & default library settings

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // brings saveToEs into scope for DataFrames

val spark = SparkSession
    .builder()
    .appName("writetoes")
    .master("local[*]")
    .config("spark.es.nodes", "localhost") // give your Elasticsearch node IP
    .config("spark.es.port", "9200")       // port where it is running
    .getOrCreate()

import spark.implicits._

// AlbumIndex is a case class defined in the original question, e.g.
// case class AlbumIndex(artist: String, yearOfRelease: Int, albumName: String)
val indexDocuments = Seq(
    AlbumIndex("Led Zeppelin", 1969, "Led Zeppelin"),
    AlbumIndex("Boston", 1976, "Boston"),
    AlbumIndex("Fleetwood Mac", 1979, "Tusk")
).toDF

indexDocuments.saveToEs("demoindex/albumindex")

// Alternatively, disable node discovery so the connector only talks to the
// configured node (useful for single-node or containerized setups):
SparkSession.builder()
      .appName("my-app")
      .config("spark.es.nodes", "localhost")
      .config("spark.es.port", "9200")
      .config("spark.es.nodes.discovery", false)
      .getOrCreate()

Integrating Spark with Elasticsearch

PYSPARK_SUBMIT_ARGS --packages org.elasticsearch:elasticsearch-hadoop:7.5.1 pyspark-shell

from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

conf = SparkConf()
conf.setMaster("local").setAppName("ES Test")
conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "elasticsearch")  # name of my docker container, you might keep localhost
conf.set("es.port", "9200")

sc = SparkContext(conf=conf)
spark = SparkSession(sc)

colnames = [('col_' + str(i+1)) for i in range(11)]
df1 = sc.parallelize([
    [it for it in range(11)],
    [it for it in range(1, 12)]]
).toDF(colnames)

(
  df1
  .write
  .format('es')
  .option(
    'es.resource', '%s/%s' % ('<resource_name>', '<table_name>'))
  .save()
)

from elasticsearch import Elasticsearch
esclient = Elasticsearch(['elasticsearch:9200'])

response = esclient.search(
    index='<resource_name>*',
    body={
        "query": {
            "match": {
                "col1": 1
            }
        },
        "aggs": {
            "test_agg": {
                "terms": {
                    "field": "col1",
                    "size": 10
                }
            }
        }
    }
)
                      

How to understand spark api for elasticsearch

JavaEsSpark.saveToEs(javaRDD, "spark/docs");
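
For context, that call is the last line of a small driver: javaRDD is a JavaRDD of Map documents, and the string is the target "index/type" resource. Here is a minimal sketch mirroring the Java writing example earlier on this page; the app name, master, host, and sample document are assumptions:

import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

SparkConf conf = new SparkConf().setAppName("es-write").setMaster("local[*]");
conf.set("es.nodes", "localhost"); // assumed Elasticsearch host
JavaSparkContext jsc = new JavaSparkContext(conf);

// Each Map in the RDD is indexed as one JSON document under "spark/docs".
Map<String, ?> doc = ImmutableMap.of("artist", "Radiohead", "albums", 9);
JavaRDD<Map<String, ?>> javaRDD = jsc.parallelize(ImmutableList.<Map<String, ?>>of(doc));
JavaEsSpark.saveToEs(javaRDD, "spark/docs");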
                      

Community Discussions

Trending Discussions on elasticsearch-hadoop
  • Spark 3.0 scala.None$ is not a valid external type for schema of string
  • Invalid timestamp when reading Elasticsearch records with Spark
  • Py4JJavaError: An error occurred while calling o45.load. : java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport
  • elasticsearch-hadoop spark connector unable to connect/write using out-of-box ES server setup, & default library settings
  • pyspark - structured streaming into elastic search
  • Integrating Spark with Elasticsearch
  • How do I access SparkContext in Dataproc?
  • Scripted_upsert with Elasticsearch-hadoop impossible?
  • How to understand spark api for elasticsearch
  • Spark Elasticsearch basic tuning

QUESTION

Spark 3.0 scala.None$ is not a valid external type for schema of string

Asked 2021-Apr-30 at 05:45

While using the elasticsearch-hadoop library to read an Elasticsearch index that has an empty attribute, the following exception is thrown:

Caused by: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string

There is an open defect on GitHub for this issue, with steps to reproduce it: https://github.com/elastic/elasticsearch-hadoop/issues/1635

Spark: 3.1.1
Elasticsearch-Hadoop: elasticsearch-spark-30_2.12-7.12.0
Elasticsearch: 2.3.4

ANSWER

Answered 2021-Apr-30 at 05:45

It worked after setting the elasticsearch-hadoop property es.field.read.empty.as.null to no:

.option("es.field.read.empty.as.null", "no")

From the Elasticsearch documentation:
es.field.read.empty.as.null (default yes): whether elasticsearch-hadoop will treat empty fields as null.
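
As a sketch of where that option goes when reading through the connector's Spark SQL data source (the session setup and index name below are assumptions, not part of the answer):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .appName("es-read")
    .master("local[*]") // assumed local run
    .getOrCreate();

// "es" is the data source alias registered by the elasticsearch-spark module.
Dataset<Row> df = spark.read()
    .format("es")
    .option("es.field.read.empty.as.null", "no") // the fix from the answer
    .load("radio/artists");                      // hypothetical index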

Source: https://stackoverflow.com/questions/67328780

Community Discussions and Code Snippets contain sources that include Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install elasticsearch-hadoop

You can download it from GitHub or Maven.
You can use elasticsearch-hadoop like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the elasticsearch-hadoop component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
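
For example, a Maven dependency declaration using the coordinates shown on this page would look like the following; the version is assumed to match the v8.1.3 release tag listed above, and you should adjust it to your Elasticsearch version:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>8.1.3</version>
</dependency>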

Support

Running against Hadoop 1.x is deprecated in 5.5 and will no longer be tested against in 6.0. ES-Hadoop is developed for and tested against Hadoop 2.x and YARN. More information is available in the project documentation.
