Apache Parquet
Mirror of Apache Kudu
Numpy exercises.
Apache Atlas
The Universal Storage Engine
SIMD for humans
incubator-devlake by apache
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Go
2089
Updated: 2 y ago
License: Permissive (Apache-2.0)

embree by embree
Embree ray tracing kernels repository.
C++
2024
Updated: 2 y ago
License: Permissive (Apache-2.0)

BigDataGuide by MoRan1607
Big data learning: learn big data from scratch, with study videos and interview materials for each stage of big data learning
Java
2023
Updated: 2 y ago
License: No License (No License)

Quicksql by Qihoo360
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Java
2005
Updated: 2 y ago
License: Permissive (MIT)

quill by getquill
Compile-time Language Integrated Queries for Scala
Scala
1992
Updated: 3 y ago
License: Permissive (Apache-2.0)

streampark by streamxhub
Make stream processing easier! easy-to-use stream processing application development framework and one-stop stream processing operation platform
Java
1971
Updated: 3 y ago
License: Permissive (Apache-2.0)

node-rdkafka by Blizzard
Node.js bindings for librdkafka
JavaScript
1969
Updated: 2 y ago
License: Permissive (MIT)

spark-deep-learning by databricks
Deep Learning Pipelines for Apache Spark
Python
1968
Updated: 2 y ago
License: Permissive (Apache-2.0)

EasyML by ICT-BDA
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Java
1958
Updated: 2 y ago
License: Permissive (Apache-2.0)

esProc by SPLWare
esProc SPL is a scripting language for data processing, with well-designed rich library functions and powerful syntax, which can be executed in a Java program through JDBC interface and computing independently.
Java
1951
Updated: 2 y ago
License: Permissive (Apache-2.0)

docker-hadoop by big-data-europe
Apache Hadoop docker image
Shell
1940
Updated: 2 y ago
License: No License (No License)

tensorflow-build-archived by lakshayg
TensorFlow binaries supporting AVX, FMA, SSE
Shell
1938
Updated: 2 y ago
License: No License (No License)

ambari by apache
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Java
1925
Updated: 2 y ago
License: Permissive (Apache-2.0)

flinkStreamSQL by DTStack
Built on open-source Flink, this extends its real-time SQL; it mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax
Java
1921
Updated: 2 y ago
License: Permissive (Apache-2.0)

spark by dotnet
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
C#
1905
Updated: 2 y ago
License: Permissive (MIT)

elasticsearch-hadoop by elastic
:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
Java
1902
Updated: 2 y ago
License: Permissive (Apache-2.0)

spark-cassandra-connector by datastax
DataStax Spark Cassandra Connector
Scala
1902
Updated: 2 y ago
License: Permissive (Apache-2.0)

SZT-bigdata by geekyouth
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Scala
1871
Updated: 2 y ago
License: Proprietary (Proprietary)

simde by simd-everywhere
Implementations of SIMD instruction sets for systems which don't natively support them.
C
1827
Updated: 2 y ago
License: Permissive (MIT)

incubator-gobblin by apache
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Java
1819
Updated: 4 y ago
License: Permissive (Apache-2.0)

seatunnel by InterestingLab
Production Ready Data Integration Product, documentation:
Java
1819
Updated: 3 y ago
License: Permissive (Apache-2.0)

drill by apache
Apache Drill is a distributed MPP query layer for self describing data
Java
1801
Updated: 2 y ago
License: Permissive (Apache-2.0)

oryx by OryxProject
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Java
1798
Updated: 2 y ago
License: Permissive (Apache-2.0)

xbyak by herumi
a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header
C++
1785
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

jdbi by jdbi
jdbi is designed to provide convenient tabular data access in Java; including templated SQL, parameterized and strongly typed queries, and Streams integration
Java
1782
Updated: 2 y ago
License: Permissive (Apache-2.0)

bookkeeper by apache
Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
Java
1748
Updated: 2 y ago
License: Permissive (Apache-2.0)

xsimd by xtensor-stack
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
C++
1747
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

byzer-lang by byzer-org
Byzer (former MLSQL): A low-code open-source programming language for data pipeline, analytics and AI.
Scala
1731
Updated: 2 y ago
License: Permissive (Apache-2.0)

cglm by recp
📽 Highly Optimized Graphics Math (glm) for C
C
1711
Updated: 2 y ago
License: Permissive (MIT)

Gaffer by gchq
A large-scale entity and relation database supporting aggregation of properties
Java
1700
Updated: 2 y ago
License: Permissive (Apache-2.0)

home by apachecn
ApacheCN open-source organization: announcements, introduction, members, activities, and ways to get in touch
CSS
1694
Updated: 2 y ago
License: Proprietary (Proprietary)

elassandra by strapdata
Elassandra = Elasticsearch + Apache Cassandra
Java
1667
Updated: 2 y ago
License: Permissive (Apache-2.0)

kyuubi by apache
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Scala
1631
Updated: 2 y ago
License: Permissive (Apache-2.0)

fugue by fugue-project
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Python
1622
Updated: 2 y ago
License: Permissive (Apache-2.0)

PyHive by dropbox
Python interface to Hive and Presto. 🐝
Python
1609
Updated: 2 y ago
License: Proprietary (Proprietary)

waterdrop by InterestingLab
A production-grade product for massive-scale data computation, documentation:
Java
1601
Updated: 4 y ago
License: Permissive (Apache-2.0)

presto by prestosql
Home of the community managed version of Presto, the distributed SQL query engine for big data, under the auspices of the Presto Software Foundation.
Java
1595
Updated: 4 y ago
License: Permissive (Apache-2.0)

elephas by maxpumperla
Distributed Deep learning with Keras & Spark
Python
1560
Updated: 2 y ago
License: Permissive (MIT)

mongo-hadoop by mongodb
MongoDB Connector for Hadoop
Java
1521
Updated: 2 y ago
License: No License (No License)

spark-py-notebooks by jadianes
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Jupyter Notebook
1521
Updated: 2 y ago
License: Proprietary (Proprietary)

ytsaurus by ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
C++
1520
Updated: 2 y ago
License: Proprietary (Proprietary)

almond by almond-sh
A Scala kernel for Jupyter
Scala
1516
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

memgraph by memgraph
Open-source graph database, built for real-time streaming data, compatible with Neo4j.
C++
1494
Updated: 2 y ago
License: Proprietary (Proprietary)

aas by sryza
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Scala
1485
Updated: 2 y ago
License: Proprietary (Proprietary)