Apache Parquet
SIMD Vector Classes for C++
winutils by cdarlint
winutils.exe, hadoop.dll, and hdfs.dll binaries for Hadoop on Windows
Shell 1481 Updated: 2 y ago License: No License (No License)

awesome-spark by awesome-spark
A curated list of awesome Apache Spark packages and resources.
Shell 1467 Updated: 2 y ago License: Permissive (CC0-1.0)

sedona by apache
A cluster computing framework for processing large-scale geospatial data
Java 1457 Updated: 2 y ago License: Permissive (Apache-2.0)

mleap by combust
MLeap: Deploy ML Pipelines to Production
Scala 1449 Updated: 2 y ago License: Permissive (Apache-2.0)

metacat by Netflix
Java 1444 Updated: 2 y ago License: Permissive (Apache-2.0)

kfr by kfrlib
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
C++ 1425 Updated: 2 y ago License: Strong Copyleft (GPL-2.0)

spark by mesos
Lightning-fast cluster computing in Java, Scala and Python.
Scala 1423 Updated: 2 y ago License: No License (No License)

spark-testing-base by holdenk
Base classes to use when writing tests with Spark
Scala 1414 Updated: 2 y ago License: Permissive (Apache-2.0)

kolo-lang by byzer-org
Kolo (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
JavaScript 1400 Updated: 3 y ago License: Permissive (Apache-2.0)

optimus by hi-primus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Python 1383 Updated: 2 y ago License: Permissive (Apache-2.0)

Function App by Microsoft
Write any function in minutes, whether to run a simple job that cleans up a database or build a more complex architecture. Creating functions is easier than ever before, whatever your chosen OS, platform, or development method.
cloud_api 1382 Updated: Current License: Proprietary (Proprietary)

bitsail by bytedance
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Java 1363 Updated: 2 y ago License: Permissive (Apache-2.0)

carbondata by apache
High performance data store solution
Scala 1359 Updated: 2 y ago License: Permissive (Apache-2.0)

HiBench by Intel-bigdata
HiBench is a big data benchmark suite.
Java 1351 Updated: 2 y ago License: Proprietary (Proprietary)

incubator-kyuubi by apache
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Scala 1343 Updated: 2 y ago License: Permissive (Apache-2.0)

Ne10 by projectNe10
An open optimized software library project for the ARM® Architecture
C 1340 Updated: 2 y ago License: Proprietary (Proprietary)

python-driver by datastax
DataStax Python Driver for Apache Cassandra
Python 1335 Updated: 2 y ago License: Permissive (Apache-2.0)

TBase by Tencent
TBase is an enterprise-level distributed HTAP database. A single database cluster provides highly consistent distributed database services and high-performance data warehouse services, forming an integrated enterprise-level solution.
C 1321 Updated: 2 y ago License: Proprietary (Proprietary)

fluid by fluid-cloudnative
Fluid, elastic data abstraction and acceleration for BigData/AI applications in the cloud. (Project under CNCF)
Go 1320 Updated: 2 y ago License: Permissive (Apache-2.0)

LakeSoul by lakesoul-io
LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.
Scala 1303 Updated: 2 y ago License: Permissive (Apache-2.0)

dr-elephant by linkedin
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Java 1302 Updated: 2 y ago License: Permissive (Apache-2.0)

incubator-sedona by apache
A cluster computing framework for processing large-scale geospatial data
Java 1302 Updated: 2 y ago License: Permissive (Apache-2.0)

geomesa by locationtech
GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.
Scala 1302 Updated: 2 y ago License: Permissive (Apache-2.0)

LakeSoul by meta-soul
LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.
Scala 1298 Updated: 2 y ago License: Permissive (Apache-2.0)

CaffeOnSpark by yahoo
Distributed deep learning on Hadoop and Spark clusters.
Jupyter Notebook 1265 Updated: 2 y ago License: Permissive (Apache-2.0)

clusterdata by alibaba
Cluster data collected from production clusters at Alibaba for cluster management research
Jupyter Notebook 1256 Updated: 2 y ago License: No License (No License)

alldata by alldatacenter
🔥🔥 BigData 💥 AllData big data platform: builds an open-source community big data platform through secondary development of big data ecosystem components, together with big data collection, storage, computation, and development. Contact the author: https://docs.qq.com/doc/DVFVMYUp6cFhSRVJs
Java 1236 Updated: 2 y ago License: Permissive (Apache-2.0)

AthenaX by uber-archive
SQL-based streaming analytics platform at scale
Java 1219 Updated: 2 y ago License: Permissive (Apache-2.0)

uwu by Daniel-Liu-c0deb0t
fastest text uwuifier in the west
Rust 1215 Updated: 2 y ago License: Permissive (MIT)

sparkmagic by jupyter-incubator
Jupyter magics and kernels for working with remote Spark clusters
Python 1213 Updated: 2 y ago License: Proprietary (Proprietary)

ccm by riptano
A script to easily create and destroy an Apache Cassandra cluster on localhost
Python 1202 Updated: 2 y ago License: Permissive (Apache-2.0)

pyspark-example-project by AlexIoannides
Example project implementing best practices for PySpark ETL jobs and applications.
Python 1195 Updated: 2 y ago License: No License (No License)

nodejs-driver by datastax
DataStax Node.js Driver for Apache Cassandra
JavaScript 1192 Updated: 2 y ago License: Permissive (Apache-2.0)

dremio-oss by dremio
Dremio - the missing link in modern data
Java 1190 Updated: 2 y ago License: Permissive (Apache-2.0)

killrweather by killrweather
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
Scala 1185 Updated: 2 y ago License: Permissive (Apache-2.0)

spark-doc-zh by apachecn
Chinese edition of the official Apache Spark documentation
JavaScript 1184 Updated: 2 y ago License: Proprietary (Proprietary)

risingwave by singularity-data
RisingWave: the next-generation streaming database in the cloud.
Rust 1183 Updated: 3 y ago License: Permissive (Apache-2.0)

ringpop-node by uber-node
Scalable, fault-tolerant application-layer sharding for Node.js applications
JavaScript 1177 Updated: 2 y ago License: Permissive (MIT)

spark-timeseries by sryza
A library for time series analysis on Apache Spark
Scala 1175 Updated: 2 y ago License: Permissive (Apache-2.0)

inlong by apache
Apache InLong - a one-stop integration framework for massive data
Java 1174 Updated: 2 y ago License: Permissive (Apache-2.0)

enoki by mitsuba-renderer
Enoki: structured vectorization and differentiation on modern processor architectures
C++ 1172 Updated: 2 y ago License: Proprietary (Proprietary)

AthenaX by uber
SQL-based streaming analytics platform at scale
Java 1147 Updated: 4 y ago License: Permissive (Apache-2.0)

Dockerfiles by HariSekhon
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak
Shell 1147 Updated: 2 y ago License: Permissive (MIT)

datacollector by streamsets
StreamSets Data Collector - Continuous big data and cloud platform ingest infrastructure
Java 1145 Updated: 4 y ago License: Permissive (Apache-2.0)

sparkit-learn by lensacom
PySpark + Scikit-learn = Sparkit-learn
Python 1135 Updated: 2 y ago License: Permissive (Apache-2.0)

elephant-bird by twitter
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Java 1132 Updated: 2 y ago License: Permissive (Apache-2.0)

Taier by DTStack
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Java 1129 Updated: 2 y ago License: Permissive (Apache-2.0)

glam-rs by bitshifter
A simple and fast linear algebra library for games and graphics
Rust 1116 Updated: 2 y ago License: Permissive (Apache-2.0)