indexr by shunfei
An open-source columnar data format designed for fast, real-time analytics on big data.
Java 443 Updated: 3 y ago License: Permissive (Apache-2.0)

LearningSpark by spirom
Scala examples for learning to use Spark
Scala 425 Updated: 4 y ago License: Permissive (MIT)

storm-yarn by yahoo
Storm-yarn enables Storm clusters to be deployed onto machines managed by Hadoop YARN.
Java 418 Updated: 3 y ago License: Proprietary (Proprietary)

hadoop-ansible by analytically
Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.
Shell 415 Updated: 4 y ago License: Permissive (Apache-2.0)

Apache Tez

hyperspace by microsoft
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Scala 408 Updated: 2 y ago License: Permissive (Apache-2.0)

Unicorn for node.js

kafka-connect-ui by lensesio
Web tool for Kafka Connect
JavaScript 400 Updated: 3 y ago License: Proprietary (Proprietary)

BigData-In-Practice by whirlys
Big data practice projects: Hadoop, Spark, Kafka, HBase, Flink, and more.
Java 398 Updated: 1 y ago License: Permissive (Apache-2.0)

jclouds by jclouds
Read-only mirror of ASF Git Repo for jclouds
Java 383 Updated: 4 y ago License: No License (No License)

oozie by YahooArchive
Oozie - workflow engine for Hadoop
Java 376 Updated: 4 y ago License: Permissive (Apache-2.0)

kubernetes-HDFS by apache-spark-on-k8s
Repository holding configuration files for running an HDFS cluster in Kubernetes
Shell 372 Updated: 2 y ago License: Permissive (Apache-2.0)

scalable-coffee-shop by sdaschner
A scalable, event-driven and event-sourced Java EE application
Java 371 Updated: 3 y ago License: Permissive (Apache-2.0)

connectors by delta-io
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Java 367 Updated: 2 y ago License: Permissive (Apache-2.0)

ckman by housepower
A tool for managing and monitoring ClickHouse databases.
Go 366 Updated: 1 y ago License: Permissive (Apache-2.0)

Miscellaneous by LucaCanali
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Jupyter Notebook 354 Updated: 1 y ago License: Permissive (Apache-2.0)

trendingtopics by datawrangling
Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2
Ruby 352 Updated: 4 y ago License: No License (No License)

tdigest by CamDavidsonPilon
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.
Python 350 Updated: 2 y ago License: Permissive (MIT)

ytk-learn by yuantiku
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Java 348 Updated: 3 y ago License: Permissive (MIT)

hadoop-common-2.2.0-bin by srccodes
hadoop-common-2.2.0/bin
Shell 347 Updated: 3 y ago License: No License (No License)

attic-apex-core by apache
Mirror of Apache Apex core
Java 344 Updated: 4 y ago License: Permissive (Apache-2.0)

HashtagCashtag by shafiab
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
Scala 341 Updated: 2 y ago License: No License (No License)

cloudbreak by hortonworks
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Java 340 Updated: 1 y ago License: Permissive (Apache-2.0)

Gather-Deployment by huseinzol05
Gathers scalable Tensorflow and Python infrastructure deployment, Husein Go-To for development, 100% Docker.
Jupyter Notebook 337 Updated: 3 y ago License: Permissive (MIT)

cascading by cwensel
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
Java 337 Updated: 1 y ago License: Proprietary (Proprietary)

golang-distributed-filesystem by ligfx
HDFS-alike in Go. Written in 2014 to learn the language and get a job.
Go 336 Updated: 4 y ago License: No License (No License)

spindle by adobe-research
Next-generation web analytics processing with Scala, Spark, and Parquet.
JavaScript 335 Updated: 4 y ago License: Permissive (Apache-2.0)

amadeus by constellation-rs
Harmonious distributed data analysis in Rust.
Rust 332 Updated: 3 y ago License: Permissive (Apache-2.0)

elasticluster by elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Python 329 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

jgsrty.github.io by jgsrty
English learning. Project preview: https://jgsrty.github.io Access from within China: https://rtyxmd.gitee.io
Shell 323 Updated: 3 y ago License: Permissive (MIT)

Prophecis by WeBankFinTech
Prophecis is a one-stop cloud-native machine learning platform.
Go 317 Updated: 3 y ago License: Permissive (Apache-2.0)

easyhadoop by xianglei
Apache Hadoop management system
PHP 313 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

cloudflow by lightbend
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Scala 312 Updated: 3 y ago License: Permissive (Apache-2.0)

mist by Hydrospheredata
Serverless proxy for Spark cluster
Scala 310 Updated: 4 y ago License: Permissive (Apache-2.0)

morpheus by opencypher
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Scala 307 Updated: 3 y ago License: Permissive (Apache-2.0)

gohadoop by hortonworks
Go 304 Updated: 4 y ago License: Permissive (Apache-2.0)

grunt-preprocess by jsoverson
Preprocess files based off environment configuration
JavaScript 300 Updated: 4 y ago License: Proprietary (Proprietary)

spark-hbase-connector by nerdammer
Connect Spark to HBase for reading and writing data with ease
Scala 299 Updated: 1 y ago License: Permissive (Apache-2.0)

shouldideploy by baires
Source of shouldideploy.today
TypeScript 297 Updated: 2 y ago License: Permissive (WTFPL)

Dryad by MicrosoftResearch
This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN.
C# 296 Updated: 4 y ago License: Permissive (Apache-2.0)

Scala-Algorithms by garyaiki
Scala translations of Robert Sedgewick's Java Algorithms
Scala 296 Updated: 1 y ago License: No License (No License)

goldenorb by jzachr
GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework
Java 293 Updated: 4 y ago License: Permissive (Apache-2.0)

incubator-hivemall by apache
Mirror of Apache Hivemall (incubating)
Java 292 Updated: 3 y ago License: Permissive (Apache-2.0)

behemoth by DigitalPebble
Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop.
Java 286 Updated: 4 y ago License: Proprietary (Proprietary)

cdh-twitter-example by cloudera
Example application for analyzing Twitter data using CDH - Flume, Oozie, Hive
Java 284 Updated: 4 y ago License: No License (No License)

hadoop-mini-clusters by sakserv
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Java 284 Updated: 1 y ago License: Permissive (Apache-2.0)

News_recommend by luochana
A Spark-based news recommendation system, comprising a web crawler project, a website, and a Spark recommendation engine.
Scala 282 Updated: 2 y ago License: No License (No License)

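demo_11.11_storm-spark-hadoop by liguozhong
An experimental example combining Hadoop, Storm, and Spark. It simulates Taobao's Double 11 (Singles' Day) sale: from order details it aggregates total sales volume and per-province sales rankings, followed by SQL analysis, data analysis, and data mining. Rough workflow: stage 1 (real-time reports with Storm), stage 2 (offline reports), stage 3 (ad hoc and multidimensional queries over large-scale orders), stage 4 (data mining and graph computation).
Java 279 Updated: 3 y ago License: No License (No License)
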
alchemy by binglind
A web system built for Flink. It supports defining UDFs in the web page and submitting SQL and JAR jobs, supports managing sources, sinks, and jobs, and can manage Flink clusters on OpenShift.
Java 279 Updated: 3 y ago License: Permissive (Apache-2.0)

hbase-rdd by hbase-rdd
Spark RDD to read, write and delete from HBase
Scala 279 Updated: 4 y ago License: Permissive (Apache-2.0)