mrjob | Run MapReduce jobs on Hadoop or Amazon Web Services

 by Yelp · Python · Version: 0.7.4 · License: Non-SPDX

kandi X-RAY | mrjob Summary

mrjob is a Python library typically used in Big Data and Hadoop applications. mrjob has no reported bugs or vulnerabilities, a build file is available, and it has high support. However, mrjob has a Non-SPDX license. You can install it with 'pip install mrjob' or download it from GitHub or PyPI.

Run MapReduce jobs on Hadoop or Amazon Web Services
Support
    Quality
      Security
        License
          Reuse

            kandi-support Support

              mrjob has a highly active ecosystem.
              It has 2546 star(s) with 592 fork(s). There are 109 watchers for this library.
              It had no major release in the last 12 months.
              There are 201 open issues and 1091 have been closed. On average, issues are closed in 208 days. There are 2 open pull requests and 0 closed pull requests.
              It has a negative sentiment in the developer community.
              The latest version of mrjob is 0.7.4.

            kandi-Quality Quality

              mrjob has 0 bugs and 0 code smells.

            kandi-Security Security

              mrjob has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              mrjob code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              mrjob has a Non-SPDX License.
              A Non-SPDX license may be an open-source license that is not SPDX-compliant, or a non-open-source license; you need to review it closely before use.

            kandi-Reuse Reuse

              mrjob releases are not available. You will need to build from source code and install.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              mrjob saves you 23909 person hours of effort in developing the same functionality from scratch.
              It has 46706 lines of code, 4504 functions and 231 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed mrjob and discovered the below as its top functions. This is intended to give you an instant insight into mrjob implemented functionality, and help decide if they suit your requirements.
            • Print the summary of the cluster
            • Strip microseconds
            • Calculate a percentage value
            • Return boto3 - compatible datetime
            • Return statistics for each cluster
            • Convert a cluster to a dictionary
            • Summarize a cluster
            • Return usage data for cluster
            • Configure command line arguments
            • Execute a single step on Spark
            • Creates an argument parser
            • Yield cluster clusters
            • Resolve EMR job options
            • Sorts lines with sort
            • Runs ssh on all connected nodes
            • Terminate the EMR jobs
            • Return the arguments for the Spark job script
            • Invoke the task function
            • Run a single partition
            • Generate the linkback node
            • Find jobs that are long running
            • Count the number of ngrams for each document
            • Score documents by ngram
            • List files in a directory
            • Score multiple documents
            • Parse a document

            mrjob Key Features

            No Key Features are available at this moment for mrjob.

            mrjob Examples and Code Snippets

            mrjob starter kit,Running the code
             Python · Lines of Code: 15 · License: Permissive (MIT)
            "h1" 520487
            "h2" 1444041
            "h3" 1958891
            "h4" 1149127
            "h5" 368755
            "h6" 245941
            "h7" 1043
            "h8" 29
            "h10" 3
            "h11" 5
            "h12" 3
            "h13" 4
            "h14" 19
            "h15" 5
            "h21" 1
              
            mrjob starter kit,Setup
             Python · Lines of Code: 4 · License: Permissive (MIT)
            pip install -r requirements.txt
            
            virtualenv --no-site-packages env/
            source env/bin/activate
            pip install -r requirements.txt
              
            mrjob starter kit,Running the code,Running locally
             Python · Lines of Code: 4 · License: Permissive (MIT)
            ./get-data.sh
            
            python tag_counter.py --conf-path mrjob.conf --no-output --output-dir out input/test-1.warc
            # or 'local' simulates more features of Hadoop such as counters
            python tag_counter.py -r local --conf-path mrjob.conf --no-output --output-dir   
             Counting relative frequency in pairs and stripes MapReduce
             Python · Lines of Code: 66 · License: Strong Copyleft (CC BY-SA 4.0)
            import re
            from collections import defaultdict
            from itertools import combinations
            
            from mrjob.job import MRJob
            from mrjob.step import MRStep
            
            WORD_RE = re.compile(r"[\w']+")
            
            
            class MRRelativeFreq(MRJob):
                denoms = defaultdict(int)
            
                
            Calculating Average with Combiner in Mapreduce
             Python · Lines of Code: 35 · License: Strong Copyleft (CC BY-SA 4.0)
            def reducer(self, key, values):
                    totalprice, totalqty = 0,0
                    for value in values:
                        totalprice += (value[0])
                        totalqty += value[1]
                    average = round(totalprice/totalqty,2)
                    yield key, average
            How to use multistep mrjob with json file
             Python · Lines of Code: 3 · License: Strong Copyleft (CC BY-SA 4.0)
            def mapper(self, _, line):
                review = json.loads(line)
            
            def mapper(self, _, line):
                stop_words = set(["to", "a", "an", "the", "for", "in", "on", "of", "at", "over", "with", "after", "and", "from", "new", "us", "by", "as", "man", "up", "says", "in", "out", "is", "be", "are", "not", "pm", "am"
            """The classic MapReduce job: count the frequency of words.
            """
            from mrjob.job import MRJob
            import re
            
            WORD_RE = re.compile(r"[\w']+")
            
            
             class MRWordFreqCount(MRJob):

                 def mapper(self, _, line):
                     for word in WORD_RE.findall(line):
                         yield word.lower(), 1

                 def combiner(self, word, counts):
                     yield word, sum(counts)

                 def reducer(self, word, counts):
                     yield word, sum(counts)
            python mrjob: ignore unrecognized arguments
             Python · Lines of Code: 24 · License: Strong Copyleft (CC BY-SA 4.0)
            from datetime import datetime
            import json
            import argparse
            
            parser = argparse.ArgumentParser()
            parser.add_argument("-t", "--time", help = "Output file")
            args, unknown = parser.parse_known_args()
            
            class Calculate(MRJob):
                def configure_ar
            How to count the number of times a word sequence appears in a file, using MapReduce in Python?
             Python · Lines of Code: 11 · License: Strong Copyleft (CC BY-SA 4.0)
            class MR3Nums(MRJob):
                
                def mapper(self, _, line):
                    sequence_length = 3
                    words = line.strip().split()
                    for i in range(len(words) - sequence_length + 1):
                         yield " ".join(words[i:(i+sequence_length)]), 1

            Community Discussions

            QUESTION

             Counting relative frequency in pairs and stripes MapReduce
            Asked 2021-Dec-19 at 21:44

             I am new to Python and I want to use the MRJob package to count the relative frequency of word pairs. I wrote the code below but it doesn't produce correct output. Can you please help me find my mistakes? f(A|B) = count(A, B) / count(B) = count(A, B) / Σ_A′ count(A′, B)

            ...

            ANSWER

            Answered 2021-Dec-19 at 21:44

            You will need an intermediate data structure, in this case a defaultdict to count the total of times the word appears.
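The idea can be sketched in plain Python (outside mrjob, so the function name and the normalization choice here are illustrative, not the asker's exact code): co-occurrence counts go into one defaultdict, per-word totals into another intermediate defaultdict, and each pair count is divided by the total for its first word.

```python
import re
from collections import defaultdict
from itertools import combinations

WORD_RE = re.compile(r"[\w']+")

def relative_pair_freq(lines):
    """Count co-occurring word pairs, then normalize each pair count
    by the total number of pairs that start with the same word."""
    pair_counts = defaultdict(int)
    first_word_totals = defaultdict(int)  # the intermediate structure
    for line in lines:
        words = WORD_RE.findall(line.lower())
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
            first_word_totals[a] += 1
    return {pair: n / first_word_totals[pair[0]]
            for pair, n in pair_counts.items()}
```

In an MRJob with chained MRSteps, the same division would typically happen in a later step, once a first step has produced both the pair counts and the per-word totals.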

            Source https://stackoverflow.com/questions/70411677

            QUESTION

            Calculating Average with Combiner in Mapreduce
            Asked 2021-Nov-26 at 13:13

            I have a .csv source file in the form of:

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,30.95,1,MATT,MORAL,CUREPIPE

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,19.95,1, MATT,MORAL, CUREPIPE

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,89.95,1,LELA,SMI,HASSEE

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,54.50,1,LELA,SMI,HASSEE

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,19.95,2,TOM, SON,FLACQ

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,19.95,1,DYDY,ARD,PLOUIS

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,22.00,1,DYDY,ARD, PLOUIS

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,19.95,1,DYDY,ARD, PLOUIS

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,22.00,2,TAY,ANA,VACOAS

            Xxx,yyy,zzz,uuuu,iii,www,qqq,aaa,rrr,35.00,3,TAY,ANA,VACOAS

            I would like to calculate the average cost (price*qty/total qty) for each person using a combiner in MapReduce with the following result:

            MATT MORAL 25.45

            LELA SMI 72.225

            TOM SON 19.95

            DYDY ARD 20.36

            TAY ANA 29.8

             So I came up with the following code, which is not working (it gives me double the average). I feel like I need to add an IF/ELSE statement in the reducer to process the output of the combiner (unique keys) differently from the output of the mapper (duplicated keys):

            ...

            ANSWER

            Answered 2021-Nov-26 at 13:13

             You shouldn't be weighting the totalprice in the reducer, as you have already done that in the combiner.
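A plain-Python sketch of the fix (these functions are illustrative stand-ins for the mrjob combiner/reducer pair): the combiner emits partial (total price, total quantity) sums, and the reducer adds the partials and divides exactly once.

```python
def combiner_sums(values):
    """Partial aggregation: sum price*qty and qty, but do NOT divide
    here -- an average of averages is wrong when group sizes differ."""
    total_price = sum(price * qty for price, qty in values)
    total_qty = sum(qty for _, qty in values)
    return total_price, total_qty

def reducer_average(partials):
    """Add the partial sums from all combiners, then divide once."""
    total_price = sum(p for p, _ in partials)
    total_qty = sum(q for _, q in partials)
    return round(total_price / total_qty, 2)
```

With the two MATT rows from the question, (30.95, 1) and (19.95, 1), this yields 25.45, matching the expected output.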

            Source https://stackoverflow.com/questions/70124190

            QUESTION

            How to use multistep mrjob with json file
            Asked 2021-Nov-18 at 15:05

             I'm trying to use Hadoop to get some statistics from a JSON file, like the average number of stars for a category or the language with the most reviews. To do this I am using mrjob, and I found this code:

            ...

            ANSWER

            Answered 2021-Nov-18 at 15:05

             For me it was enough just to use json.loads, like:
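A minimal sketch of that mapper (the field names 'category' and 'stars' are placeholders; a real review dataset will have its own keys):

```python
import json

def mapper(line):
    """Decode one JSON object per input line, then yield fields from it."""
    review = json.loads(line)
    yield review["category"], review["stars"]
```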

            Source https://stackoverflow.com/questions/69947187

            QUESTION

             My code is outputting a tuple of values and I would like it to be in individual pairs; I need help understanding how to modify it
            Asked 2021-Nov-16 at 22:04
            def mapper(self, _, line):
                stop_words = set(["to", "a", "an", "the", "for", "in", "on", "of", "at", "over", "with", "after", "and", "from", "new", "us", "by", "as", "man", "up", "says", "in", "out", "is", "be", "are", "not", "pm", "am", "off", "more", "less", "no", "how"])
                (date,words) = line.strip().split(",")
            
                word_list = words.split()
                clean_words = [word for word in word_list if word not in stop_words]
                clean_words.sort()
            
            
                yield (date[0:4],clean_words)
            
            ...

            ANSWER

            Answered 2021-Nov-16 at 22:04

            Use a loop to yield each word separately.
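Concretely, the final yield in the mapper above becomes a loop, emitting one (year, word) pair per word (a plain-Python sketch of the changed method):

```python
def mapper(date, clean_words):
    """Yield each cleaned word as its own (year, word) pair instead of
    yielding the whole word list as one value."""
    for word in clean_words:
        yield date[0:4], word
```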

            Source https://stackoverflow.com/questions/69996550

            QUESTION

             Write a job that counts the frequencies of word first letters in a file, so if there are three words starting with "c" the answer would be "c 3"
            Asked 2021-Oct-31 at 08:40

             I have the below code and get the word count, but I don't understand how to get the frequency of each word's first letter. If there are three words starting with C in the file, I would expect the output to be "C 3". I don't need to distinguish case, so 'a' and 'A' will be counted the same.

            ...

            ANSWER

            Answered 2021-Oct-31 at 08:40
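The answer can be sketched as a mapper that emits the lower-cased first letter of each word with a count of 1, and a reducer that sums those counts (plain-Python stand-ins for the mrjob methods; the grouping Hadoop performs between them is assumed):

```python
import re

WORD_RE = re.compile(r"[\w']+")

def mapper(line):
    """Emit (first letter, 1) for each word, lower-cased so 'a' == 'A'."""
    for word in WORD_RE.findall(line):
        yield word[0].lower(), 1

def reducer(letter, counts):
    """Sum the 1s grouped under each letter."""
    yield letter, sum(counts)
```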

            QUESTION

            Python mrjob - Finding 10 longest words, but mrjob returns duplicate words
            Asked 2021-Oct-28 at 15:20

            I am using Python mrjob to find the 10 longest words from a text file. I have obtained a result, but the result contains duplicate words. How do I obtain only unique words (ie. remove duplicate words)?

            ...

            ANSWER

            Answered 2021-Oct-28 at 11:09

            Update reducer_find_longest_words to get only the unique elements. Note the use of list(set()).
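A sketch of what that reducer could look like (the method name matches the question, but the exact body of the original is an assumption):

```python
def reducer_find_longest_words(_, words):
    """Deduplicate with set() first, so a repeated word cannot occupy
    several of the top-10 slots, then sort by length descending."""
    for word in sorted(set(words), key=len, reverse=True)[:10]:
        yield None, word
```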

            Source https://stackoverflow.com/questions/69752739

            QUESTION

            python mrjob: ignore unrecognized arguments
            Asked 2021-May-16 at 02:02

            Normally, if I want to define a command-line option for mrjob, I have to do like this:

            ...

            ANSWER

            Answered 2021-May-16 at 02:02

            I found a workaround solution but I hope there will be a better way of doing this.

            I have to define the argument again inside the mrjob class so it can recognize it:
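The first half of the workaround relies on argparse.parse_known_args(), which returns the recognized options plus a list of leftovers instead of exiting on unknown flags (a runnable sketch; --runner here stands in for whatever option mrjob itself consumes):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--time", help="Output file")

# parse_known_args() tolerates options it does not recognize and
# hands them back as a list instead of raising an error.
args, unknown = parser.parse_known_args(["-t", "out.json", "--runner", "local"])
```

Inside the MRJob subclass, the answer then re-declares the same option (typically in configure_args() via add_passthru_arg) so the job recognizes it as well.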

            Source https://stackoverflow.com/questions/67552561

            QUESTION

            How to count the number of times a word sequence appears in a file, using MapReduce in Python?
            Asked 2021-Apr-10 at 16:17

            Consider a file containing words separated by spaces; write a MapReduce program in Python, which counts the number of times each 3-word sequence appears in the file.

            For example, consider the following file:

            ...

            ANSWER

            Answered 2021-Apr-10 at 16:17

            The mapper is applied on each line, and should count each 3-word sequence, i.e. yield the 3-word sequence along with a count of 1.

            The reducer is called with key and values, where key is a 3-word sequence and values is a list of counts (which would be a list of 1s). The reducer can simply return a tuple of the 3-word sequence and the total number of occurrences, the latter obtained via sum.
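The two steps described above can be sketched in plain Python (stand-ins for the mrjob methods; the grouping between them is assumed):

```python
def mapper(line, n=3):
    """Slide an n-word window over the line, yielding (sequence, 1)."""
    words = line.strip().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1

def reducer(key, counts):
    """Total the 1s grouped under each n-word sequence."""
    return key, sum(counts)
```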

            Source https://stackoverflow.com/questions/67036479

            QUESTION

            how to write a custom protocol for multiple line input into mrJobs
            Asked 2021-Mar-25 at 02:48

            I'm trying to use mrJobs with a csv file. The problem is the csv file has input spanned over multiple lines.

            Searching through the mrJob documentation, I think I need to write a custom protocol to handle the input.

            I tried to write my own protocol below, multiLineCsvInputProtocol, but I am already getting an error: TypeError: a bytes-like object is required, not 'str'

             Not going to lie, I think I am in over my head here.

             Basically each new row of data in the multi-line CSV file starts with a datestring. I want to read input line by line, split each line on the commas, store the values in a list, and whenever a new line starts with a datestring, yield the entire list to the first mapper.

            (That or find some other better way to read multi-line csv input)

             Can anyone help me get past this error?

            ...

            ANSWER

            Answered 2021-Mar-25 at 02:48

             According to the mrjob documentation, the line parameter of the read function is a bytestring; you are most likely getting that error because you are splitting by ',', which is a str:

            Writing custom protocols

            A protocol is an object with methods read(self, line) and write(self, key, value). The read() method takes a bytestring and returns a 2-tuple of decoded objects, and write() takes the key and value and returns bytes to be passed back to Hadoop Streaming or as output.

            Possible solutions:

            1. You can try splitting by b',', which is a bytestring
            2. You can decode the line before the splitting, like this: line.decode().split(',', 1) (it's probably a good idea to specify the encoding)
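A sketch of solution 2 as a protocol class (the class name echoes the question; the key choice, None here, and the UTF-8 encoding are assumptions):

```python
class MultiLineCsvInputProtocol(object):
    """mrjob-style protocol: read() receives a bytestring and must return
    a (key, value) 2-tuple; write() must return bytes."""

    def read(self, line):
        # Decode the bytestring first, then split on str commas.
        return None, line.decode("utf-8").split(",")

    def write(self, key, value):
        return ",".join(value).encode("utf-8")
```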

            Source https://stackoverflow.com/questions/66635012

            QUESTION

            mapreduce job failes on hadoop cluster with subprocess failed with code 1
            Asked 2021-Mar-24 at 19:53

            I have a Hadoop 3.2.2 Cluster with 1 namenode/resourceManager and 3 datanodes/NodeManagers.

            this is my yarn-site config

            ...

            ANSWER

            Answered 2021-Mar-24 at 19:53

            I forgot to install mr_job on all nodes...

             Running this on all nodes fixed the problem: pip3 install MRJob

            Source https://stackoverflow.com/questions/66366850

             Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install mrjob

            You can install using 'pip install mrjob' or download it from GitHub, PyPI.
             You can use mrjob like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
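A typical session following those recommendations might look like this (the environment name env is illustrative):

```shell
# Create and activate an isolated virtual environment,
# bring the packaging tools up to date, then install mrjob from PyPI.
python3 -m venv env
source env/bin/activate
pip install --upgrade pip setuptools wheel
pip install mrjob
```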

            Support

             For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.