Apache Parquet
Mirror of Apache Kudu
Numpy exercises.
Apache Atlas
The Universal Storage Engine
SIMD for humans
incubator-devlake by apache
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Go | 2089 | Updated: 2 y ago | License: Permissive (Apache-2.0)
embree by embree
Embree ray tracing kernels repository.
C++ | 2024 | Updated: 2 y ago | License: Permissive (Apache-2.0)
BigDataGuide by MoRan1607
Big data learning: learn big data from scratch, including study videos for every stage of big data learning and interview materials.
Java | 2023 | Updated: 2 y ago | License: No License
Quicksql by Qihoo360
A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Java | 2005 | Updated: 2 y ago | License: Permissive (MIT)
quill by getquill
Compile-time Language Integrated Queries for Scala
Scala | 1992 | Updated: 3 y ago | License: Permissive (Apache-2.0)
streampark by streamxhub
Make stream processing easier! An easy-to-use stream processing application development framework and one-stop stream processing operation platform.
Java | 1971 | Updated: 2 y ago | License: Permissive (Apache-2.0)
node-rdkafka by Blizzard
Node.js bindings for librdkafka
JavaScript | 1969 | Updated: 2 y ago | License: Permissive (MIT)
spark-deep-learning by databricks
Deep Learning Pipelines for Apache Spark
Python | 1968 | Updated: 2 y ago | License: Permissive (Apache-2.0)
EasyML by ICT-BDA
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real-world tasks.
Java | 1958 | Updated: 2 y ago | License: Permissive (Apache-2.0)
esProc by SPLWare
esProc SPL is a scripting language for data processing, with a rich, well-designed library of functions and powerful syntax. It can be executed in a Java program through the JDBC interface or run independently.
Java | 1951 | Updated: 2 y ago | License: Permissive (Apache-2.0)
docker-hadoop by big-data-europe
Apache Hadoop Docker image
Shell | 1940 | Updated: 2 y ago | License: No License
tensorflow-build-archived by lakshayg
TensorFlow binaries supporting AVX, FMA, SSE
Shell | 1938 | Updated: 2 y ago | License: No License
ambari by apache
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
Java | 1925 | Updated: 2 y ago | License: Permissive (Apache-2.0)
flinkStreamSQL by DTStack
Based on open-source Flink, this project extends its real-time SQL; it mainly implements joins between streams and dimension tables, and supports all the syntax of native Flink SQL.
Java | 1921 | Updated: 2 y ago | License: Permissive (Apache-2.0)
spark by dotnet
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
C# | 1905 | Updated: 2 y ago | License: Permissive (MIT)
elasticsearch-hadoop by elastic
🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop
Java | 1902 | Updated: 2 y ago | License: Permissive (Apache-2.0)
spark-cassandra-connector by datastax
DataStax Spark Cassandra Connector
Scala | 1902 | Updated: 2 y ago | License: Permissive (Apache-2.0)
SZT-bigdata by geekyouth
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Scala | 1871 | Updated: 2 y ago | License: Proprietary
simde by simd-everywhere
Implementations of SIMD instruction sets for systems which don't natively support them.
C | 1827 | Updated: 2 y ago | License: Permissive (MIT)
incubator-gobblin by apache
A distributed data integration framework that simplifies common aspects of big data integration, such as data ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems.
Java | 1819 | Updated: 4 y ago | License: Permissive (Apache-2.0)
seatunnel by InterestingLab
Production-ready data integration product; documentation:
Java | 1819 | Updated: 3 y ago | License: Permissive (Apache-2.0)
drill by apache
Apache Drill is a distributed MPP query layer for self-describing data
Java | 1801 | Updated: 2 y ago | License: Permissive (Apache-2.0)
oryx by OryxProject
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
Java | 1798 | Updated: 2 y ago | License: Permissive (Apache-2.0)
xbyak by herumi
A JIT assembler for x86 (IA-32) / x64 (AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512, provided as a C++ header
C++ | 1785 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)
jdbi by jdbi
Jdbi is designed to provide convenient tabular data access in Java, including templated SQL, parameterized and strongly typed queries, and Streams integration
Java | 1782 | Updated: 2 y ago | License: Permissive (Apache-2.0)
bookkeeper by apache
Apache BookKeeper: a scalable, fault-tolerant, and low-latency storage service optimized for append-only workloads
Java | 1748 | Updated: 2 y ago | License: Permissive (Apache-2.0)
xsimd by xtensor-stack
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
C++ | 1747 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)
byzer-lang by byzer-org
Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics, and AI.
Scala | 1731 | Updated: 2 y ago | License: Permissive (Apache-2.0)
cglm by recp
📽 Highly Optimized Graphics Math (glm) for C
C | 1711 | Updated: 2 y ago | License: Permissive (MIT)
Gaffer by gchq
A large-scale entity and relation database supporting aggregation of properties
Java | 1700 | Updated: 2 y ago | License: Permissive (Apache-2.0)
home by apachecn
ApacheCN open-source organization: announcements, introduction, members, events, and contact channels
CSS | 1694 | Updated: 2 y ago | License: Proprietary
elassandra by strapdata
Elassandra = Elasticsearch + Apache Cassandra
Java | 1667 | Updated: 2 y ago | License: Permissive (Apache-2.0)
kyuubi by apache
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Scala | 1631 | Updated: 2 y ago | License: Permissive (Apache-2.0)
fugue by fugue-project
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask, and Ray without any rewrites.
Python | 1622 | Updated: 2 y ago | License: Permissive (Apache-2.0)
PyHive by dropbox
Python interface to Hive and Presto. 🐝
Python | 1609 | Updated: 2 y ago | License: Proprietary
waterdrop by InterestingLab
A massive-data computing product for production environments; documentation:
Java | 1601 | Updated: 4 y ago | License: Permissive (Apache-2.0)
presto by prestosql
Home of the community managed version of Presto, the distributed SQL query engine for big data, under the auspices of the Presto Software Foundation.
Java | 1595 | Updated: 4 y ago | License: Permissive (Apache-2.0)
elephas by maxpumperla
Distributed Deep learning with Keras & Spark
Python | 1560 | Updated: 2 y ago | License: Permissive (MIT)
mongo-hadoop by mongodb
MongoDB Connector for Hadoop
Java | 1521 | Updated: 2 y ago | License: No License
spark-py-notebooks by jadianes
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Jupyter Notebook | 1521 | Updated: 2 y ago | License: Proprietary
ytsaurus by ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
C++ | 1520 | Updated: 2 y ago | License: Proprietary
almond by almond-sh
A Scala kernel for Jupyter
Scala | 1516 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)
memgraph by memgraph
Open-source graph database, built for real-time streaming data, compatible with Neo4j.
C++ | 1494 | Updated: 2 y ago | License: Proprietary
aas by sryza
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Scala | 1485 | Updated: 2 y ago | License: Proprietary