thrill by thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
C++ | 567 | Updated: 2 y ago | License: Proprietary (Proprietary)

kyuubi by NetEase
Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
Scala | 566 | Updated: 4 y ago | License: Permissive (Apache-2.0)

SparkLearning by xubo245
Learning Apache Spark, including code and data. Most parts can run locally.
Scala | 563 | Updated: 4 y ago | License: No License (No License)

Kylin by KylinOLAP
This code base is retained for historical interest only; please visit the Apache Incubator repo for the latest one.
Java | 561 | Updated: 3 y ago | License: Permissive (Apache-2.0)

sparkMeasure by LucaCanali
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Scala | 561 | Updated: 2 y ago | License: Permissive (Apache-2.0)

CoreDataStack by bignerdranch
The Big Nerd Ranch Core Data Stack
Swift | 561 | Updated: 4 y ago | License: Permissive (MIT)

streaming-benchmarks by yahoo
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
Jupyter Notebook | 560 | Updated: 3 y ago | License: Permissive (Apache-2.0)

aws-glue-libs by awslabs
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Python | 555 | Updated: 2 y ago | License: Proprietary (Proprietary)

bigdata-ecosystem by zenkay
BigData Ecosystem Dataset
HTML | 554 | Updated: 4 y ago | License: Proprietary (Proprietary)

C++ SIMD Noise Library

spark-rapids by NVIDIA
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Scala | 543 | Updated: 2 y ago | License: Permissive (Apache-2.0)

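bdp-platform by wlhbdp
Big data ecosystem solution and data platform: a big data solution built on big data, data platform, microservices, machine learning, e-commerce, automated operations, DevOps, and container deployment platforms, covering data platform collection, storage, computation, development, and application building.
Java | 541 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)
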
datacap by EdurtIO
DataCap is integrated software for data transformation, integration, and visualization. It supports a variety of data sources, file types, big data databases, relational databases, NoSQL databases, etc. The software can manage multiple data sources and perform various operations and conversions on the data under each source ...
Java | 541 | Updated: 2 y ago | License: Permissive (Apache-2.0)

elephantdb by nathanmarz
Distributed database specialized in exporting key/value data from Hadoop
Java | 540 | Updated: 4 y ago | License: Permissive (BSD-3-Clause)

metorikku by YotpoLtd
A simplified, lightweight ETL Framework based on Apache Spark
Scala | 539 | Updated: 2 y ago | License: Permissive (MIT)

spark-avro by databricks
Avro Data Source for Apache Spark
Scala | 538 | Updated: 4 y ago | License: Permissive (Apache-2.0)

hetu-core by openlookeng
Java | 537 | Updated: 2 y ago | License: No License (No License)

gluten by oap-project
Scala | 536 | Updated: 2 y ago | License: Permissive (Apache-2.0)

stdarch by rust-lang
Rust's standard library vendor-specific APIs and run-time feature detection
HTML | 535 | Updated: 2 y ago | License: Permissive (Apache-2.0)

HPCC-Platform by hpcc-systems
HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processing and analytics.
C++ | 534 | Updated: 2 y ago | License: Proprietary (Proprietary)

zerocopy by google
Rust | 534 | Updated: 2 y ago | License: Permissive (BSD-2-Clause)

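bdp-dataplatform by wlhbdp
Big data ecosystem solution and data platform: a big data solution built on big data, data platform, microservices, machine learning, e-commerce, automated operations, DevOps, and container deployment platforms, covering data platform collection, storage, computation, development, and application building.
Java | 533 | Updated: 4 y ago | License: Strong Copyleft (GPL-3.0)
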
hadoop-lzo by twitter
Refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20
Shell | 533 | Updated: 3 y ago | License: Strong Copyleft (GPL-3.0)

shc by hortonworks-spark
The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Scala | 531 | Updated: 4 y ago | License: Permissive (Apache-2.0)

sql-training by ververica
Java | 529 | Updated: 2 y ago | License: No License (No License)

ecmascript_simd by tc39
SIMD numeric type for EcmaScript
JavaScript | 526 | Updated: 4 y ago | License: Proprietary (Proprietary)

fast_rsync by dropbox
An optimized implementation of librsync in pure Rust.
Rust | 526 | Updated: 2 y ago | License: Permissive (Apache-2.0)

spark-sql-perf by databricks
Scala | 525 | Updated: 2 y ago | License: Permissive (Apache-2.0)

sparta by Stratio
Real Time Analytics and Data Pipelines based on Spark Streaming
Scala | 525 | Updated: 2 y ago | License: Permissive (Apache-2.0)

eland by elastic
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Python | 519 | Updated: 2 y ago | License: Permissive (Apache-2.0)

sparklens by qubole
Qubole Sparklens tool for performance tuning Apache Spark
Scala | 517 | Updated: 2 y ago | License: Permissive (Apache-2.0)

SGX-hardware by ayeks
This is a list of hardware which supports Intel SGX - Software Guard Extensions.
C | 517 | Updated: 2 y ago | License: Permissive (MIT)

A general-purpose distributed crawler framework based on scrapy-redis

archived_madlib by madlib
MADlib has moved to Apache MADlib (incubating). Please send pull requests to the Apache repository.
C | 512 | Updated: 4 y ago | License: Permissive (Apache-2.0)

magellan by harsha2010
Geo Spatial Data Analytics on Spark
Scala | 508 | Updated: 4 y ago | License: Permissive (Apache-2.0)

pycassa by pycassa
Python Thrift driver for Apache Cassandra
Python | 507 | Updated: 4 y ago | License: Proprietary (Proprietary)

spline by AbsaOSS
Data Lineage Tracking And Visualization Solution
Scala | 503 | Updated: 2 y ago | License: Permissive (Apache-2.0)

minos by XiaoMi
Minos is more than a Hadoop deployment system.
Python | 502 | Updated: 4 y ago | License: Permissive (Apache-2.0)

packed_simd by rust-lang
Portable Packed SIMD Vectors for Rust standard library
Rust | 501 | Updated: 2 y ago | License: Permissive (Apache-2.0)

fetch by 47degrees
Simple & Efficient data access for Scala and Scala.js
Scala | 490 | Updated: 2 y ago | License: Permissive (Apache-2.0)

fetch by xebia-functional
Simple & Efficient data access for Scala and Scala.js
Scala | 490 | Updated: 2 y ago | License: Permissive (Apache-2.0)

Chronicle-Core by OpenHFT
Low level access to native memory, JVM and OS.
Java | 488 | Updated: 2 y ago | License: Permissive (Apache-2.0)

ClickHouse-Native-JDBC by housepower
ClickHouse Native Protocol JDBC implementation
Java | 486 | Updated: 2 y ago | License: Permissive (Apache-2.0)

piflow by cas-bigdatalab
πflow is a big data flow engine with Spark support
Scala | 486 | Updated: 2 y ago | License: Permissive (Apache-2.0)

FastMemcpy by skywind3000
Over 50% speed-up on average vs. traditional memcpy in gcc 4.9 or vc2012
C | 486 | Updated: 2 y ago | License: Permissive (MIT)

scoobi by NICTA
A Scala productivity framework for Hadoop.
Scala | 485 | Updated: 3 y ago | License: No License (No License)

spring-hadoop-samples by spring-attic
Spring Hadoop Samples
Java | 484 | Updated: 3 y ago | License: Permissive (Apache-2.0)

RustFFT by ejmahler
RustFFT is a high-performance FFT library written in pure Rust.
Rust | 482 | Updated: 2 y ago | License: Permissive (Apache-2.0)

enterprise_gateway by jupyter
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Python | 481 | Updated: 3 y ago | License: Proprietary (Proprietary)

findspark by minrk
Python | 479 | Updated: 2 y ago | License: Permissive (BSD-3-Clause)