oozie | Oozie - workflow engine for Hadoop
kandi X-RAY | oozie Summary
Oozie - workflow engine for Hadoop
Top functions reviewed by kandi - BETA
- Performs a map command and writes it to the output collector
- Fail the launcher
- Prints the contents of the current directory
- Sets the main configuration file
- Executes the workflows
- Creates a WorkflowBean from an array of objects
- Launch Pig action
- Gets the job ids from the specified log file
- Runs Pig action
- Extract values from the configuration
- Initialize the log service
- Lite execute method
- Execute the action
- Submit a job
- Creates workflow
- Execute workflow
- Executes the workflow
- Performs job related actions
- Submit a coordinator
- Create a workflow
- Initializes the JPA service
- Executes workflow
- Start an action
- Starts a signal command
- Performs a POST operation
- Submit a Coordinator
oozie Key Features
oozie Examples and Code Snippets
Community Discussions
Trending Discussions on oozie
QUESTION
I am using Snowflake JDBC to execute multi-statement scripts on Snowflake. My application is started by an Oozie job on a Hadoop cluster (in migration phase). The requirement here is: when the Oozie job is killed, thereby killing the running application instance, the query that was submitted via JDBC should be cancelled by Snowflake.
I have added ABORT_DETACHED_QUERY=true to the JDBC connection URL, which looks like jdbc:snowflake://.snowflakecomputing.com/?warehouse=&db=&schema=&ABORT_DETACHED_QUERY=true.
Even after 25 minutes, the script execution was not cancelled by Snowflake. I tried to find the underlying problem: I queried the SESSIONS view using the session id, but the session was not there. I also tried to check for active connections, but could not find a way to do it.
So I have two questions:
- Is this the right way to configure the ABORT_DETACHED_QUERY parameter?
- How do you check for active JDBC connections on Snowflake, given that SHOW CONNECTIONS didn't return any connection for my application?
Also, I am using commons-dbcp BasicDataSource as the datasource manager and commons-dbutils to submit the query via the int QueryRunner.execute(String) method.
ANSWER
Answered 2022-Mar-23 at 08:18
This is a session parameter, not a connection string parameter; therefore the proper way to set it is by using an ALTER command:
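The command itself was not captured above. As a minimal Java sketch (the URL, credentials, and class name are placeholders, not the asker's actual code), the parameter can be set right after opening the connection:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AbortDetachedQueryExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: substitute your account, warehouse, db and schema.
        String url = "jdbc:snowflake://<account>.snowflakecomputing.com/";
        try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>");
             Statement stmt = conn.createStatement()) {
            // ABORT_DETACHED_QUERY is a session parameter, so it is set with
            // ALTER SESSION rather than in the connection string.
            stmt.execute("ALTER SESSION SET ABORT_DETACHED_QUERY = TRUE");
            // Queries submitted on this session should now be aborted by
            // Snowflake once it detects the client has disconnected.
        }
    }
}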
QUESTION
I am trying to create a condition in my Oozie workflow where an action should be executed only on Mondays (at the end of the workflow).
So far I have added a decision node to the workflow and passed the current date as a parameter from the coordinator; now I need to test the day of the week.
coordinator.xml
...
ANSWER
Answered 2022-Mar-11 at 09:47
I found a solution by using wf:actionData in a decision node:
workflow.sh
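The script contents are not shown above. As a hedged sketch of the overall shape (action, node, and key names here are hypothetical): workflow.sh emits the day of the week in key=value form, a shell action captures that output, and the decision node tests it.

#!/bin/bash
# workflow.sh: emit the ISO day of week (1 = Monday ... 7 = Sunday)
echo "day=$(date +%u)"

<!-- job-tracker/name-node omitted for brevity; assumes a global block -->
<action name="check-day">
    <shell xmlns="uri:oozie:shell-action:0.3">
        <exec>workflow.sh</exec>
        <file>workflow.sh</file>
        <capture-output/>
    </shell>
    <ok to="day-decision"/>
    <error to="fail"/>
</action>
<decision name="day-decision">
    <switch>
        <case to="monday-action">${wf:actionData('check-day')['day'] eq '1'}</case>
        <default to="end"/>
    </switch>
</decision>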
QUESTION
I have a bash script that runs a Sqoop command and, after it, three Impala commands. I want to run it, but only when the previous execution has finished. Is this possible with a cron job or with Oozie?
...
ANSWER
Answered 2022-Jan-20 at 17:46
I assume you are in a Linux environment, so you should be able to use the run-one command (Ubuntu run-one) in conjunction with your bash script in a crontab.
e.g.
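The crontab entry itself is not shown above; a sketch of what it might look like (the path and schedule are placeholders):

# m h dom mon dow  command
*/15 * * * *  run-one /home/etl/sqoop_then_impala.sh
# run-one keeps a lock per command line, so a new invocation exits
# immediately while the previous run is still executing.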
QUESTION
I need to parse a CSV file to XML and write it to HDFS. I managed to do the first part successfully, but I get errors when writing. Here's the code.
...
ANSWER
Answered 2021-Dec-02 at 08:24
In the end I couldn't find what was wrong with my dependencies, so I rewrote the whole thing in Spark using the following dependencies.
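The answer's actual dependency list was not captured here. For illustration only, a typical Maven setup for reading CSV and writing XML from Spark pairs spark-sql with the spark-xml connector; the versions below are assumptions, not the author's list:

<!-- Illustrative coordinates; not the author's actual dependency list -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.12</artifactId>
    <version>0.14.0</version>
</dependency>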
QUESTION
I submitted an Oozie workflow with a shell action; it calls spark-submit to run a Spring Boot application packaged as a jar file. It runs on YARN in client mode.
However, I found that all the Spring logs end up inside the Oozie MapReduce job in YARN, not in the Spark job itself. I don't understand why.
...
ANSWER
Answered 2021-Sep-10 at 17:56
The Oozie shell action is nothing but a map-only job. By default, your Spark job prints all logs to the console (i.e. wherever it is run from). Given that the Spark job is submitted from within the Oozie action, the logs are collated and visible within the shell action's logs.
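Since the driver output lands in the launcher's logs, one way to retrieve it is the YARN CLI pointed at the launcher application rather than the Spark application (the application id below is a placeholder):

# Fetch the logs of the Oozie launcher (the map-only job running the shell action)
yarn logs -applicationId application_1631234567890_0042
# In client mode the Spark driver runs inside the shell action, so its stdout
# appears here; only the executors log under the Spark application's own id.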
QUESTION
I'm trying to figure out how to use Spock for REST tests. The tutorials and examples I've found all use RESTClient.
However, I'm getting stuck because RESTClient cannot be resolved. The examples use import groovyx.net.http.RESTClient. According to the examples, RESTClient should be included in org.codehaus.groovy.modules.http-builder, which I found in the MVN repository at https://mvnrepository.com/artifact/org.codehaus.groovy/http-builder/0.4.1; 0.4.1 seems to be the most recent version available there.
This is my build.gradle:
ANSWER
Answered 2021-Aug-16 at 03:17
I think you want this dependency, which is available on Maven Central: https://search.maven.org/artifact/org.codehaus.groovy.modules.http-builder/http-builder/0.7.1/jar
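In build.gradle that would be something like the snippet below (the test scope is an assumption; use testCompile on older Gradle versions):

dependencies {
    // Note the group id: org.codehaus.groovy.modules.http-builder,
    // not org.codehaus.groovy as the 0.4.1 artifact suggests.
    testImplementation 'org.codehaus.groovy.modules.http-builder:http-builder:0.7.1'
}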
QUESTION
I use CDH 6.3.2.
I submit an Oozie job with Spark, but I get an error.
...
ANSWER
Answered 2021-Jun-18 at 07:32
My workflow.xml:
QUESTION
There is a Cloudera cluster that includes Hue. I need a scheduled task that sends an HQL request to Hive, and I'm trying to build the Oozie task with the workflow editor integrated into Hue.
My HQL request file (request.hql):
...
ANSWER
Answered 2021-May-27 at 15:45
If the attached execution plan shows the whole content of the workflow.xml, then you need to add start, end, and kill nodes to it. The hive action also requires a parameter with the path to the Hive settings file (usually stored at /etc/hive/conf/hive-site.xml).
Variables for the script are usually kept in a job.properties file, so parameters like jobTracker and nameNode normally live there. You can also define your own parameters in the block at the beginning of the workflow.xml.
In the end it should look something like this:
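A hedged sketch of such a workflow.xml (node names and the workflow name are chosen for illustration; jobTracker and nameNode are expected to come from job.properties):

<workflow-app name="hive-request" xmlns="uri:oozie:workflow:0.5">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Path to the Hive settings file -->
            <job-xml>/etc/hive/conf/hive-site.xml</job-xml>
            <script>request.hql</script>
        </hive>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>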
QUESTION
I'm having an issue passing a variable in a decision node. The parameter is declared under the global config.
...
ANSWER
Answered 2021-May-19 at 12:52
You should not declare parameters for the workflow.xml in the global configuration block. In your case you can define them inline; moreover, you have to change '+' to concat().
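For illustration, a decision node predicate built inline with concat() rather than '+' (the variable and node names are hypothetical):

<decision name="route">
    <switch>
        <!-- concat() joins strings in Oozie EL; '+' would attempt numeric addition -->
        <case to="archive">${fs:exists(concat(basePath, '/input/done.flag'))}</case>
        <default to="end"/>
    </switch>
</decision>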
QUESTION
I built Apache Oozie 5.2.1 from source on my macOS machine and am currently having trouble running it. The ClassNotFoundException indicates a missing class, org.apache.hadoop.conf.Configuration, but it is available in both libext/ and the Hadoop file system.
I followed the first approach given here to copy the Hadoop libraries into the Oozie binary distro: https://oozie.apache.org/docs/5.2.1/DG_QuickStart.html
I downloaded the Hadoop 2.6.0 distro and copied all the jars to libext before running Oozie, in addition to the other configs, etc. specified in the following blog:
https://www.trytechstuff.com/how-to-setup-apache-hadoop-2-6-0-version-single-node-on-ubuntu-mac/
This is how I installed Hadoop on macOS; Hadoop 2.6.0 is working fine: http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html
This looks like a pretty basic issue, but I could not find out why the jar/class in libext is not loaded.
- OS: MacOS 10.14.6 (Mojave)
- JAVA: 1.8.0_191
- Hadoop: 2.6.0 (running in the Mac)
ANSWER
Answered 2021-May-09 at 23:25
I was able to sort out the above issue and a few other ClassNotFoundExceptions by copying the following jar files from libext to lib. Both folders are in oozie_install/oozie-5.2.1.
- libext/hadoop-common-2.6.0.jar
- libext/commons-configuration-1.6.jar
- libext/hadoop-mapreduce-client-core-2.6.0.jar
- libext/hadoop-hdfs-2.6.0.jar
I am not sure how many more jars will need to be moved from libext to lib as I try to run an example workflow/job in Oozie, but this fix brought up the Oozie web console at http://localhost:11000/oozie/.
I am also not sure why Oozie doesn't load the libraries in the libext/ folder on its own.
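For reference, the copies above amount to something like this, run from the oozie_install/oozie-5.2.1 directory:

cd oozie_install/oozie-5.2.1
cp libext/hadoop-common-2.6.0.jar \
   libext/commons-configuration-1.6.jar \
   libext/hadoop-mapreduce-client-core-2.6.0.jar \
   libext/hadoop-hdfs-2.6.0.jar \
   lib/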
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Install oozie
You can use oozie like any standard Java library: include the jar files in your classpath, and you can run and debug the oozie component from any IDE as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org; for Gradle installation, please refer to gradle.org.