Apache Tez
Unicorn for node.js
indexr by shunfei
An open-source columnar data format designed for fast, real-time analytics on big data.
Java
443
Updated: 3 y ago
License: Permissive (Apache-2.0)
LearningSpark by spirom
Scala examples for learning to use Spark
Scala
425
Updated: 4 y ago
License: Permissive (MIT)
storm-yarn by yahoo
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.
Java
418
Updated: 4 y ago
License: Proprietary (Proprietary)
hadoop-ansible by analytically
Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.
Shell
415
Updated: 4 y ago
License: Permissive (Apache-2.0)
hyperspace by microsoft
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Scala
408
Updated: 2 y ago
License: Permissive (Apache-2.0)
kafka-connect-ui by lensesio
Web tool for Kafka Connect
JavaScript
400
Updated: 4 y ago
License: Proprietary (Proprietary)
BigData-In-Practice by whirlys
Big data practice projects: Hadoop, Spark, Kafka, HBase, Flink, ...
Java
398
Updated: 2 y ago
License: Permissive (Apache-2.0)
jclouds by jclouds
Read-only mirror of ASF Git Repo for jclouds
Java
383
Updated: 4 y ago
License: No License (No License)
oozie by YahooArchive
Oozie - workflow engine for Hadoop
Java
376
Updated: 4 y ago
License: Permissive (Apache-2.0)
kubernetes-HDFS by apache-spark-on-k8s
Repository holding configuration files for running an HDFS cluster in Kubernetes
Shell
372
Updated: 2 y ago
License: Permissive (Apache-2.0)
scalable-coffee-shop by sdaschner
A scalable, event-driven and event-sourced Java EE application
Java
371
Updated: 4 y ago
License: Permissive (Apache-2.0)
connectors by delta-io
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Java
367
Updated: 2 y ago
License: Permissive (Apache-2.0)
ckman by housepower
A tool for managing and monitoring ClickHouse databases.
Go
366
Updated: 2 y ago
License: Permissive (Apache-2.0)
Miscellaneous by LucaCanali
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Jupyter Notebook
354
Updated: 2 y ago
License: Permissive (Apache-2.0)
trendingtopics by datawrangling
Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2
Ruby
352
Updated: 4 y ago
License: No License (No License)
tdigest by CamDavidsonPilon
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.
Python
350
Updated: 2 y ago
License: Permissive (MIT)
ytk-learn by yuantiku
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Java
348
Updated: 3 y ago
License: Permissive (MIT)
hadoop-common-2.2.0-bin by srccodes
hadoop-common-2.2.0/bin
Shell
347
Updated: 4 y ago
License: No License (No License)
attic-apex-core by apache
Mirror of Apache Apex core
Java
344
Updated: 4 y ago
License: Permissive (Apache-2.0)
HashtagCashtag by shafiab
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
Scala
341
Updated: 2 y ago
License: No License (No License)
cloudbreak by hortonworks
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Java
340
Updated: 2 y ago
License: Permissive (Apache-2.0)
Gather-Deployment by huseinzol05
Gathers scalable TensorFlow and Python infrastructure deployments; Husein's go-to for development, 100% Docker.
Jupyter Notebook
337
Updated: 4 y ago
License: Permissive (MIT)
cascading by cwensel
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Java
337
Updated: 2 y ago
License: Proprietary (Proprietary)
golang-distributed-filesystem by ligfx
HDFS-alike in Go. Written in 2014 to learn the language and get a job.
Go
336
Updated: 4 y ago
License: No License (No License)
spindle by adobe-research
Next-generation web analytics processing with Scala, Spark, and Parquet.
JavaScript
335
Updated: 4 y ago
License: Permissive (Apache-2.0)
amadeus by constellation-rs
Harmonious distributed data analysis in Rust.
Rust
332
Updated: 4 y ago
License: Permissive (Apache-2.0)
elasticluster by elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Python
329
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
jgsrty.github.io by jgsrty
:sunny: English learning :feet: Project preview: https://jgsrty.github.io Access from within China: https://rtyxmd.gitee.io
Shell
323
Updated: 3 y ago
License: Permissive (MIT)
Prophecis by WeBankFinTech
Prophecis is a one-stop cloud native machine learning platform.
Go
317
Updated: 3 y ago
License: Permissive (Apache-2.0)
easyhadoop by xianglei
Apache Hadoop management system
PHP
313
Updated: 2 y ago
License: Strong Copyleft (GPL-3.0)
cloudflow by lightbend
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Scala
312
Updated: 3 y ago
License: Permissive (Apache-2.0)
mist by Hydrospheredata
Serverless proxy for Spark cluster
Scala
310
Updated: 4 y ago
License: Permissive (Apache-2.0)
morpheus by opencypher
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Scala
307
Updated: 4 y ago
License: Permissive (Apache-2.0)
gohadoop by hortonworks
Go
304
Updated: 4 y ago
License: Permissive (Apache-2.0)
grunt-preprocess by jsoverson
Preprocess files based off environment configuration
JavaScript
300
Updated: 4 y ago
License: Proprietary (Proprietary)
spark-hbase-connector by nerdammer
Connect Spark to HBase for reading and writing data with ease
Scala
299
Updated: 2 y ago
License: Permissive (Apache-2.0)
shouldideploy by baires
Source of shouldideploy.today
TypeScript
297
Updated: 2 y ago
License: Permissive (WTFPL)
Dryad by MicrosoftResearch
This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN.
C#
296
Updated: 4 y ago
License: Permissive (Apache-2.0)
Scala-Algorithms by garyaiki
Scala translations of Robert Sedgewick's Java Algorithms
Scala
296
Updated: 2 y ago
License: No License (No License)
goldenorb by jzachr
GoldenOrb is an open-source implementation of Pregel, Google's graph processing framework
Java
293
Updated: 5 y ago
License: Permissive (Apache-2.0)
incubator-hivemall by apache
Mirror of Apache Hivemall (incubating)
Java
292
Updated: 4 y ago
License: Permissive (Apache-2.0)
behemoth by DigitalPebble
Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.
Java
286
Updated: 4 y ago
License: Proprietary (Proprietary)
cdh-twitter-example by cloudera
Example application for analyzing Twitter data using CDH - Flume, Oozie, Hive
Java
284
Updated: 4 y ago
License: No License (No License)
hadoop-mini-clusters by sakserv
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Java
284
Updated: 2 y ago
License: Permissive (Apache-2.0)
News_recommend by luochana
A Spark-based news recommendation system, including a crawler project, a web site, and the Spark recommendation system.
Scala
282
Updated: 2 y ago
License: No License (No License)
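demo_11.11_storm-spark-hadoop by liguozhong
An example experiment combining Hadoop, Storm, and Spark that simulates Taobao's Double 11 shopping festival: from detailed order information it computes total sales volume and sales rankings by province, followed by SQL analysis, data analysis, data mining, and so on. Rough workflow: Phase 1 (real-time reports with Storm); Phase 2 (offline reports); Phase 3 (ad hoc and multi-dimensional queries over large-scale order data); Phase 4 (data mining and graph computation).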
Java
279
Updated: 4 y ago
License: No License (No License)
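alchemy by binglind
A web system built for Flink. Supports defining UDFs in the web page and submitting SQL and JAR jobs; supports managing sources, sinks, and jobs; can manage Flink clusters on OpenShift.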
Java
279
Updated: 3 y ago
License: Permissive (Apache-2.0)
hbase-rdd by hbase-rdd
Spark RDD to read, write and delete from HBase
Scala
279
Updated: 4 y ago
License: Permissive (Apache-2.0)