
storm-yarn | storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN

by yahoo | Java | Version: Current | License: Non-SPDX

kandi X-RAY | storm-yarn Summary

storm-yarn is a Java library typically used in Big Data and Hadoop applications. storm-yarn has no bugs, it has no vulnerabilities, it has a build file available, and it has low support. However, storm-yarn has a Non-SPDX license. You can download it from GitHub.

Support

  • storm-yarn has a low active ecosystem.
  • It has 418 star(s) with 163 fork(s). There are 106 watchers for this library.
  • It had no major release in the last 12 months.
  • There are 33 open issues and 7 have been closed. On average, issues are closed in 48 days. There are 10 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of storm-yarn is current.

Quality

  • storm-yarn has no bugs reported.

Security

  • storm-yarn has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

  • storm-yarn has a Non-SPDX License.
  • A Non-SPDX license can be an open-source license that is simply not SPDX-compliant, or a license that is not open source at all; you need to review it closely before use.

Reuse

  • storm-yarn releases are not available. You will need to build from source code and install.
  • Build file is available. You can build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.
Top functions reviewed by kandi - BETA

kandi has reviewed storm-yarn and discovered the below as its top functions. This is intended to give you an instant insight into the functionality storm-yarn implements, and to help you decide if it suits your requirements.

  • Launches an application.
  • Launch a supervisor on a container.
  • Helper method to start the worker thread.
  • Command line entry point.
  • Main run method.
  • Creates the log4j2 xml file.
  • Download storm.yaml.
  • Stops the nimbus process.
  • Returns a string representation of the build.
  • Main entry point.


                      storm-yarn Key Features

                      Andy Feng ([@anfeng](https://github.com/anfeng))

                      Robert Evans ([@revans2](https://github.com/revans2))

                      Derek Dagit ([@d2r](https://github.com/d2r))

                      Nathan Roberts ([@ynroberts](https://github.com/ynroberts))

                      Xin Wang ([@vesense](https://github.com/vesense))

                      We have updated the version of Apache Storm from 0.9.0 to 1.0.1.

We have added a StormClusterChecker class to monitor the Storm cluster. It can adjust the number of supervisors based on system resource usage.

We have added a removeSupervisors() function as part of resource monitoring; its functionality is the opposite of addSupervisors().

                      We have updated the logging framework from logback to log4j2.

Make sure [Hadoop YARN](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html) has been properly launched.
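
As a quick sanity check, assuming the Hadoop command-line tools are on your PATH, you can list the NodeManagers registered with the ResourceManager:

    # Each worker node should be reported in the RUNNING state.
    yarn node -list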

The storm-on-yarn implementation does not include running ZooKeeper on YARN. Make sure the ZooKeeper service is independently launched beforehand.
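
For example, with a standalone ZooKeeper installation (command names follow the standard ZooKeeper distribution layout; adjust paths to your setup):

    # Start ZooKeeper and confirm it is serving requests.
    bin/zkServer.sh start
    bin/zkServer.sh status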

Download the source code of storm-on-yarn, e.g., execute the command git clone <link> to get the source code.

Edit pom.xml in the storm-on-yarn root directory to set the Hadoop version. ![pom.xml](https://github.com/wendyshusband/storm-yarn/blob/storm-1.0.1/image/editpom.png)
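
A minimal sketch of the kind of edit involved (the property name and value in your checkout may differ; look for the Hadoop entries in the <properties> section of pom.xml):

    <!-- pom.xml (excerpt): build against the Hadoop version running on your cluster -->
    <properties>
        <hadoop.version>2.7.2</hadoop.version>
    </properties>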

To package the project, execute the following command in the storm-on-yarn root directory:

    mvn package

You will see execution messages like the following:

    17:57:27.810 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn launch ./conf/storm.yaml --stormZip lib/storm.zip --appname storm-on-yarn-test --output target/appId.txt
    17:57:59.681 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn getStormConfig ./conf/storm.yaml --appId application_1372121842369_0001 --output ./lib/storm/storm.yaml
    17:58:04.382 [main] INFO com.yahoo.storm.yarn.TestIntegration - ./lib/storm/bin/storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.ExclamationTopology exclamation-topology
    17:58:04.382 [main] INFO com.yahoo.storm.yarn.TestIntegration - ./lib/storm/bin/storm kill exclamation-topology
    17:58:07.798 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn stopNimbus ./conf/storm.yaml --appId application_1372121842369_0001
    17:58:10.131 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn startNimbus ./conf/storm.yaml --appId application_1372121842369_0001
    17:58:12.460 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn stopUI ./conf/storm.yaml --appId application_1372121842369_0001
    17:58:15.045 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn startUI ./conf/storm.yaml --appId application_1372121842369_0001
    17:58:17.390 [main] INFO com.yahoo.storm.yarn.TestIntegration - bin/storm-yarn shutdown ./conf/storm.yaml --appId application_1372121842369_0001

If you want to skip the tests, add -DskipTests:

    mvn package -DskipTests

Copy the packaged storm-on-yarn project to the Storm Client machine, download [storm-1.0.1](http://www.apache.org/dyn/closer.lua/storm/apache-storm-1.0.1/apache-storm-1.0.1.tar.gz), and put the decompressed storm-1.0.1 directory in the same directory as the storm-on-yarn project, as shown below: ![stormHome](https://github.com/wendyshusband/storm-yarn/blob/storm-1.0.1/image/stormhome.png) At this point you have put storm-on-yarn and Storm in the right place on the Storm Client machine. You do not need to start the Storm cluster yourself; that will be done by storm-on-yarn later on.
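
A minimal sketch of the unpacking step (the tarball name follows the download link above; directory names are illustrative):

    # Unpack the Storm 1.0.1 release next to the storm-on-yarn checkout.
    # The archive extracts to apache-storm-1.0.1/; this guide refers to it as storm-1.0.1.
    tar -xzf apache-storm-1.0.1.tar.gz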

When executing storm-on-yarn commands, executables such as storm-yarn and storm will be called frequently, so the bin directories containing them must be on your PATH environment variable. You are therefore advised to add storm-1.0.1/bin and $(storm-on-yarn root directory)/bin to PATH, like this: ![environment](https://github.com/wendyshusband/storm-yarn/blob/storm-1.0.1/image/environment.png)
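
A minimal sketch, assuming both directories sit under your home directory (adjust the paths to wherever you placed them):

    # Put the storm and storm-yarn command-line tools on PATH.
    export PATH="$HOME/storm-1.0.1/bin:$HOME/storm-yarn/bin:$PATH"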

Storm-on-yarn replicates a copy of the Storm code to all the nodes of the YARN cluster using HDFS. However, the location from which to fetch this copy is hard-coded into the storm-on-yarn client, so you have to prepare the copy in HDFS manually. The storm.zip file (the copy of the Storm code) is expected in HDFS under the path "/lib/storm/[storm version]/storm.zip". The following commands illustrate how to upload storm.zip from the local directory to "/lib/storm/1.0.1" in HDFS:

    hadoop fs -mkdir /lib
    hadoop fs -mkdir /lib/storm
    hadoop fs -mkdir /lib/storm/1.0.1
    zip -r storm.zip storm-1.0.1
    hadoop fs -put storm.zip /lib/storm/1.0.1/

storm-yarn has a number of new options for configuring the Storm ApplicationMaster (AM), e.g. master.initial-num-supervisors, the initial number of supervisors to launch with Storm, and master.container.size-mb, the size of the containers to request from the YARN ResourceManager (RM).
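
These options are set in the storm.yaml passed to storm-yarn commands (./conf/storm.yaml in the examples here). A minimal sketch with illustrative values, not recommended defaults:

    # conf/storm.yaml (excerpt): storm-yarn ApplicationMaster settings
    master.initial-num-supervisors: 2    # supervisors to launch at startup
    master.container.size-mb: 1024       # container size requested from the YARN RM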

The "storm-yarn launch" command returns an Application ID, which uniquely identifies the newly launched Storm master and is used for accessing it afterwards. To obtain a storm.yaml from the newly launched Storm master, you can run:

    storm-yarn getStormConfig <storm-yarn-config> --appId <Application-ID> --output <storm.yaml>

storm.yaml will be retrieved from the Storm master.
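
Putting the pieces together, a typical session mirrors the integration-test output shown above (the application ID and jar names below are illustrative, taken from that output):

    # Launch the Storm master (Nimbus and UI) on YARN and capture its application ID.
    storm-yarn launch ./conf/storm.yaml --stormZip lib/storm.zip --appname storm-on-yarn-test --output target/appId.txt

    # Fetch the storm.yaml generated by the newly launched Storm master.
    storm-yarn getStormConfig ./conf/storm.yaml --appId application_1372121842369_0001 --output ./lib/storm/storm.yaml

    # Submit a topology to that cluster with the regular Storm client.
    storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.ExclamationTopology exclamation-topology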

For a full list of storm-yarn commands and options, run:

    storm-yarn help

Storm-on-yarn is now configured to use Netty for communication between spouts and bolts. Netty is pure JVM based and thus OS independent. If you are running Storm with ZeroMQ instead of Netty, you need to augment the standard storm.zip file with the needed .so files. This can be done with the (not ideally named) create-tarball.sh script:

    create-tarball.sh storm.zip

Ideally, the storm.zip file is a world-readable file installed by ops, so that there is only ever one copy in the distributed cache.

                      Community Discussions

                      Trending Discussions on Big Data
                      • How to group unassociated content
                      • Using Spark window with more than one partition when there is no obvious partitioning column
                      • What is the best way to store +3 millions records in Firestore?
                      • spark-shell throws java.lang.reflect.InvocationTargetException on running
                      • For function over multiple rows (i+1)?
                      • Filling up shuffle buffer (this may take a while)
                      • Designing Twitter Search - How to sort large datasets?
                      • Unnest Query optimisation for singular record
                      • handling million of rows for lookup operation using python
                      • split function does not return any observations with large dataset

                      QUESTION

                      How to group unassociated content

                      Asked 2022-Apr-15 at 12:43

                      I have a hive table that records user behavior

                      like this

                      userid behavior timestamp url
                      1 view 1650022601 url1
                      1 click 1650022602 url2
                      1 click 1650022614 url3
                      1 view 1650022617 url4
                      1 click 1650022622 url5
                      1 view 1650022626 url7
                      2 view 1650022628 url8
                      2 view 1650022631 url9

                      About 400GB is added to the table every day.

I want to order by timestamp ascending, then each 'view' starts a group that runs until the next 'view'. In the table above, the first 3 lines belong to the same group. Then I subtract the timestamps, e.g. 1650022614 - 1650022601, as the view time.

                      How to do this?

I tried the lag and lead functions, or Scala like this

    // `record` is an RDD[String]; `partition` is a driver-side var,
    // which is not updated reliably once the RDD is processed on executors.
    var partition = 0
    val pairRDD: RDD[(Int, String)] = record.map(x => {
        if (StringUtil.isDateString(x.split("\\s+")(0))) {
            partition = partition + 1
            (partition, x)
        } else {
            (partition, x)
        }
    })
                      

or Java like this

    // Coalesce to a single partition so rows are processed in order, then bump
    // an accumulator on every "pageview" row to number the groups.
    LongAccumulator part = spark.sparkContext().longAccumulator("part");

    JavaPairRDD<Long, Row> pairRDD = spark.sql(sql).coalesce(1).javaRDD().mapToPair((PairFunction<Row, Long, Row>) row -> {
        if ("pageview".equals(row.getAs("event"))) {
            part.add(1L);
        }
        return new Tuple2<>(part.value(), row);
    });
                      

but when the dataset is very large, this code is just too slow.

                      save me plz

                      ANSWER

                      Answered 2022-Apr-15 at 12:43

If you use a dataframe, you can build the partition by using a window that sums a column whose value is 1 when you change partition and 0 when you don't change partition.

You can transform an RDD to a dataframe with the sparkSession.createDataFrame() method, as explained in this answer.

Back to your problem. In your case, you change partition every time the column behavior is equal to "view". So we can start with this condition:

                      import org.apache.spark.sql.functions.col
                      
                      val df1 = df.withColumn("is_view", (col("behavior") === "view").cast("integer"))
                      

                      You get the following dataframe:

                      +------+--------+----------+----+-------+
                      |userid|behavior|timestamp |url |is_view|
                      +------+--------+----------+----+-------+
                      |1     |view    |1650022601|url1|1      |
                      |1     |click   |1650022602|url2|0      |
                      |1     |click   |1650022614|url3|0      |
                      |1     |view    |1650022617|url4|1      |
                      |1     |click   |1650022622|url5|0      |
                      |1     |view    |1650022626|url7|1      |
                      |2     |view    |1650022628|url8|1      |
                      |2     |view    |1650022631|url9|1      |
                      +------+--------+----------+----+-------+
                      

                      Then you use a window ordered by timestamp to sum over the is_view column:

                      import org.apache.spark.sql.expressions.Window
                      import org.apache.spark.sql.functions.sum
                      
                      val df2 = df1.withColumn("partition", sum("is_view").over(Window.partitionBy("userid").orderBy("timestamp")))
                      

This gets you the following dataframe:

                      +------+--------+----------+----+-------+---------+
                      |userid|behavior|timestamp |url |is_view|partition|
                      +------+--------+----------+----+-------+---------+
                      |1     |view    |1650022601|url1|1      |1        |
                      |1     |click   |1650022602|url2|0      |1        |
                      |1     |click   |1650022614|url3|0      |1        |
                      |1     |view    |1650022617|url4|1      |2        |
                      |1     |click   |1650022622|url5|0      |2        |
                      |1     |view    |1650022626|url7|1      |3        |
                      |2     |view    |1650022628|url8|1      |1        |
                      |2     |view    |1650022631|url9|1      |2        |
                      +------+--------+----------+----+-------+---------+
                      

                      Then, you just have to aggregate per userid and partition:

                      import org.apache.spark.sql.functions.{max, min}
                      
                      val result = df2.groupBy("userid", "partition")
                        .agg((max("timestamp") - min("timestamp")).as("duration"))
                      

                      And you get the following results:

                      +------+---------+--------+
                      |userid|partition|duration|
                      +------+---------+--------+
                      |1     |1        |13      |
                      |1     |2        |5       |
                      |1     |3        |0       |
                      |2     |1        |0       |
                      |2     |2        |0       |
                      +------+---------+--------+
                      

The complete Scala code:

                      import org.apache.spark.sql.expressions.Window
                      import org.apache.spark.sql.functions.{col, max, min, sum}
                      
                      val result = df
                        .withColumn("is_view", (col("behavior") === "view").cast("integer"))
                        .withColumn("partition", sum("is_view").over(Window.partitionBy("userid").orderBy("timestamp")))
                        .groupBy("userid", "partition")
                        .agg((max("timestamp") - min("timestamp")).as("duration"))
                      

                      Source https://stackoverflow.com/questions/71883786

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

                      Vulnerabilities

                      No vulnerabilities reported

                      Install storm-yarn

                      You can download it from GitHub.
You can use storm-yarn like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the storm-yarn component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
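
Because no pre-built releases are published, one workable approach is to build the jar locally and install it into your local Maven repository; a minimal sketch (the flags match the build step described earlier):

    # Build storm-yarn from source and install the resulting artifact into ~/.m2,
    # so that other Maven or Gradle builds on this machine can resolve it.
    mvn package -DskipTests
    mvn install -DskipTests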

                      Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask questions on the community page Stack Overflow.
