Quantcast File System
💎🔥 Big data study notes
Mirror of Apache Giraph
Today I Learned
hadoopecosystemtable.github.io by hadoopecosystemtable
This page is a summary to keep track of Hadoop-related projects and relevant projects around the Big Data scene, focused on the open source, free software environment.
HTML 667 Updated: 4 y ago License: Permissive (Apache-2.0)
ultraviolet by fu5ha
A wide linear algebra crate for games and graphics.
Rust 665 Updated: 2 y ago License: No License (No License)
splink by moj-analytical-services
Fast, accurate and scalable probabilistic data linkage using your choice of SQL backend
Python 664 Updated: 2 y ago License: Permissive (MIT)
opendal by datafuselabs
OpenDAL: Access data freely, painlessly, and efficiently
Rust 664 Updated: 2 y ago License: Permissive (Apache-2.0)
flink-training by apache
Apache Flink Training Exercises
Java 663 Updated: 2 y ago License: Permissive (Apache-2.0)
DataAnalysisInAction by xiaomiwujiecao
(Finished) Geek Time Data Analysis Practical 45 Lectures: detailed notes including Markdown, images, mind maps, code, and data; can be read directly, with code for testing.
Python 663 Updated: 2 y ago License: Proprietary (Proprietary)
klein by jeremyong
P(R*_{3, 0, 1}) specialized SIMD Geometric Algebra Library
C++ 660 Updated: 2 y ago License: Permissive (MIT)
ozone by apache
Scalable, redundant, and distributed object store for Apache Hadoop
Java 658 Updated: 2 y ago License: Permissive (Apache-2.0)
DevOps-Python-tools by HariSekhon
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Python 657 Updated: 2 y ago License: Permissive (MIT)
blinkdb by sameeragarwal
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Scala 648 Updated: 2 y ago License: Permissive (Apache-2.0)
nessie by projectnessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Java 644 Updated: 2 y ago License: Permissive (Apache-2.0)
SparkR-pkg by amplab-extras
R frontend for Spark
R 643 Updated: 4 y ago License: Permissive (Apache-2.0)
TonY by linkedin
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Java 640 Updated: 4 y ago License: Proprietary (Proprietary)
eat_pyspark_in_10_days by lyhue1991
pyspark 🍒🥭 is delicious, just eat it! 😋😋
Python 636 Updated: 2 y ago License: No License (No License)
reference-apps by databricks
Spark reference applications
Scala 633 Updated: 4 y ago License: Proprietary (Proprietary)
tis by qlangtech
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
Java 632 Updated: 2 y ago License: Permissive (Apache-2.0)
lazydata by rstojnic
Lazydata: Scalable data dependencies for Python projects
Python 629 Updated: 4 y ago License: Permissive (Apache-2.0)
firebolt by digitalocean
Golang framework for streaming ETL, observability data pipeline, and event processing apps
Go 629 Updated: 2 y ago License: Proprietary (Proprietary)
datafusion by andygrove
DataFusion has now been donated to the Apache Arrow project
Rust 627 Updated: 2 y ago License: Permissive (Apache-2.0)
dist-keras by cerndb
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Python 615 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
eve by jfalcou
Expressive Vector Engine - SIMD in C++ Goes Brrrr
C++ 615 Updated: 2 y ago License: Permissive (BSL-1.0)
spring-hadoop by spring-projects
Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring.
Java 612 Updated: 4 y ago License: No License (No License)
mlcraft by mlcraft-io
Low-code metrics store, modern open-source alternative to Looker
JavaScript 612 Updated: 2 y ago License: Proprietary (Proprietary)
SparkNet by amplab
Distributed Neural Networks for Spark
Scala 608 Updated: 4 y ago License: Permissive (MIT)
orc by apache
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
HTML 607 Updated: 2 y ago License: Permissive (Apache-2.0)
SimdJsonSharp by EgorBo
C# bindings for lemire/simdjson (and full C# port)
C# 605 Updated: 2 y ago License: Permissive (Apache-2.0)
Data_Engineering_Simplified by JagadeeshwaranM
Python 605 Updated: 2 y ago License: No License (No License)
yanagishima by yanagishima
Web UI for Trino, Hive and SparkSQL
Java 602 Updated: 2 y ago License: Permissive (Apache-2.0)
libxsmm by hfp
Library for specialized dense and sparse matrix operations, and deep learning primitives.
C 597 Updated: 3 y ago License: Permissive (BSD-3-Clause)
SIMD-Visualiser by piotte13
A tool to graphically visualize SIMD code
JavaScript 595 Updated: 2 y ago License: Permissive (BSD-3-Clause)
NumPy by KeithGalli
Jupyter Notebook & Data Associated with my Tutorial video on the Python NumPy Library
Jupyter Notebook 594 Updated: 2 y ago License: No License (No License)
spark-redshift by databricks
Redshift data source for Apache Spark
Scala 593 Updated: 2 y ago License: Permissive (Apache-2.0)
delta-sharing by delta-io
An open protocol for secure data sharing
Scala 592 Updated: 2 y ago License: Permissive (Apache-2.0)
blueflood by rax-maas
A distributed system designed to ingest and process time series data
Java 592 Updated: 3 y ago License: Permissive (Apache-2.0)
BigData by monsonlee
BigData Project: big data projects, from basic to advanced
Java 588 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)
datafu by linkedin
Hadoop library for large-scale data processing, now an Apache Incubator project
Java 588 Updated: 5 y ago License: No License (No License)
datafu by LinkedInAttic
Hadoop library for large-scale data processing, now an Apache Incubator project
Java 588 Updated: 4 y ago License: No License (No License)
ultraviolet by termhn
A wide linear algebra crate for games and graphics.
Rust 580 Updated: 3 y ago License: No License (No License)
disk.frame by DiskFrame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
R 580 Updated: 3 y ago License: Proprietary (Proprietary)
JustEnoughScalaForSpark by deanwampler
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Jupyter Notebook 579 Updated: 3 y ago License: Permissive (Apache-2.0)
Fastor by romeric
A lightweight high performance tensor algebra framework for modern C++
C++ 578 Updated: 2 y ago License: Permissive (MIT)
pysimdjson by TkTech
Python bindings for the simdjson project.
Python 577 Updated: 2 y ago License: Proprietary (Proprietary)
csharp-driver by datastax
DataStax C# Driver for Apache Cassandra
C# 576 Updated: 2 y ago License: Permissive (Apache-2.0)
disk.frame by xiaodaigh
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
R 574 Updated: 3 y ago License: Proprietary (Proprietary)
datafaker by gangly
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into varied data sources. Test data generation tool.
Python 573 Updated: 2 y ago License: No License (No License)
arrayvec by bluss
A vector with a fixed capacity. (Rust)
Rust 572 Updated: 2 y ago License: Permissive (Apache-2.0)