SZT-bigdata by geekyouth
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Scala 1871 Updated: 1 y ago License: Proprietary (Proprietary)
incubator-gobblin by apache
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Java 1819 Updated: 4 y ago License: Permissive (Apache-2.0)
seatunnel by InterestingLab
Production Ready Data Integration Product, documentation:
Java 1819 Updated: 3 y ago License: Permissive (Apache-2.0)
drill by apache
Apache Drill is a distributed MPP query layer for self describing data
Java 1801 Updated: 2 y ago License: Permissive (Apache-2.0)
oryx by OryxProject
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Java 1798 Updated: 2 y ago License: Permissive (Apache-2.0)
bookkeeper by apache
Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
Java 1748 Updated: 1 y ago License: Permissive (Apache-2.0)
X
AI on Hadoop
hadoop-cluster-docker by kiwenlau
Run Hadoop Cluster within Docker Containers
Shell 1724 Updated: 2 y ago License: Permissive (Apache-2.0)
k
Mirror of Apache Kudu
PyHive by dropbox
Python interface to Hive and Presto. 🐝
Python 1609 Updated: 2 y ago License: Proprietary (Proprietary)
waterdrop by InterestingLab
A production-grade product for computing on massive data, documentation:
Java 1601 Updated: 3 y ago License: Permissive (Apache-2.0)
presto by prestosql
Home of the community managed version of Presto, the distributed SQL query engine for big data, under the auspices of the Presto Software Foundation.
Java 1595 Updated: 4 y ago License: Permissive (Apache-2.0)
a
Apache Atlas
mongo-hadoop by mongodb
MongoDB Connector for Hadoop
Java 1521 Updated: 2 y ago License: No License (No License)
winutils by cdarlint
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows
Shell 1481 Updated: 1 y ago License: No License (No License)
metacat by Netflix
Java 1444 Updated: 2 y ago License: Permissive (Apache-2.0)
Function App by Microsoft
Write any function in minutes – whether to run a simple job that cleans up a database or build a more complex architecture. Creating functions is easier than ever before, whatever your chosen OS, platform, or development method.
cloud_api 1382 Updated: Current License: Proprietary (Proprietary)
p
Apache Parquet
HiBench by Intel-bigdata
HiBench is a big data benchmark suite.
Java 1351 Updated: 2 y ago License: Proprietary (Proprietary)
incubator-kyuubi by apache
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Scala 1343 Updated: 2 y ago License: Permissive (Apache-2.0)
TBase by Tencent
TBase is an enterprise-level distributed HTAP database. It provides highly consistent distributed database services and high-performance data warehouse services from a single database cluster, forming an integrated enterprise-level solution.
C 1321 Updated: 2 y ago License: Proprietary (Proprietary)
dr-elephant by linkedin
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Java 1302 Updated: 2 y ago License: Permissive (Apache-2.0)
incubator-sedona by apache
A cluster computing framework for processing large-scale geospatial data
Java 1302 Updated: 2 y ago License: Permissive (Apache-2.0)
CaffeOnSpark by yahoo
Distributed deep learning on Hadoop and Spark clusters.
Jupyter Notebook 1265 Updated: 2 y ago License: Permissive (Apache-2.0)
clusterdata by alibaba
Cluster data collected from production clusters in Alibaba for cluster management research
Jupyter Notebook 1256 Updated: 1 y ago License: No License (No License)
killrweather by killrweather
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Scala 1185 Updated: 2 y ago License: Permissive (Apache-2.0)
spark-doc-zh by apachecn
Chinese translation of the official Apache Spark documentation
JavaScript 1184 Updated: 2 y ago License: Proprietary (Proprietary)
ringpop-node by uber-node
Scalable, fault-tolerant application-layer sharding for Node.js applications
JavaScript 1177 Updated: 2 y ago License: Permissive (MIT)
Dockerfiles by HariSekhon
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Shell 1147 Updated: 1 y ago License: Permissive (MIT)
datacollector by streamsets
StreamSets Data Collector - Continuous big data and cloud platform ingest infrastructure
Java 1145 Updated: 4 y ago License: Permissive (Apache-2.0)
Nagios-Plugins by HariSekhon
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Python 1101 Updated: 1 y ago License: Proprietary (Proprietary)
hazelcast-jet by hazelcast
Distributed Stream and Batch Processing
Java 1054 Updated: 2 y ago License: Proprietary (Proprietary)
dumbo by klbostee
Python module that allows one to easily write and run Hadoop programs.
Python 1044 Updated: 2 y ago License: No License (No License)
kylo by Teradata
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Java 1041 Updated: 2 y ago License: Permissive (Apache-2.0)
streamx by streamxhub
Make stream processing easier! A Flink & Spark development scaffold. The original intention of StreamX is to make the development of Flink easier. StreamX focuses on the management of development phases and tasks. Our ultimate goal is to build a one-stop big data solution integrating stream processing, batch processing, data warehouse and data lake.
Java 1031 Updated: 3 y ago License: Permissive (Apache-2.0)
data-algorithms-book by mahmoudparsian
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Java 996 Updated: 3 y ago License: Proprietary (Proprietary)
a
Apache Accumulo
i
Apache Impala
spark-scala-tutorial by deanwampler
A free tutorial for Apache Spark.
Jupyter Notebook 966 Updated: 2 y ago License: Proprietary (Proprietary)
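Coding-Now by josonle
Study notes, plus e-books and video resources I have gone through and a collection of blogs, websites, and tools I consider good. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.
Python 951 Updated: 2 y ago License: Strong Copyleft (GPL-2.0)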
adam by bigdatagenomics
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Scala 943 Updated: 2 y ago License: Permissive (Apache-2.0)
BigDataGuide by Dr11ft
Big data learning: learn big data from scratch, with study videos and interview materials for each stage of learning
Java 935 Updated: 3 y ago License: No License (No License)
s
Mirror of Apache Sqoop
Addax by wgzhao
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Java 912 Updated: 1 y ago License: Permissive (Apache-2.0)
spark-redis by RedisLabs
A connector for Spark that allows reading from and writing to a Redis cluster
Scala 908 Updated: 2 y ago License: Permissive (BSD-3-Clause)
sparklyr by sparklyr
R interface for Apache Spark
R 906 Updated: 1 y ago License: Permissive (Apache-2.0)
veles by Samsung
Distributed machine learning platform
C++ 893 Updated: 3 y ago License: Proprietary (Proprietary)
hadoop_study by realguoshuai
Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)
Java 853 Updated: 2 y ago License: No License (No License)
UserActionAnalyzePlatform by oeljeklaus-you
Big data platform for e-commerce user behavior analysis
Java 847 Updated: 1 y ago License: Permissive (Apache-2.0)
Hello-Java-Sec by j3ers3
☕️ Java Security: secure coding and code auditing
Java 841 Updated: 2 y ago License: No License (No License)