riak-js by mostlyserious
Riak client for JavaScript
JavaScript 479 Updated: 4 y ago License: No License (No License)
spark-scala-examples by spark-examples
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in the Scala language
Scala 477 Updated: 2 y ago License: No License (No License)
spring-hadoop-samples by spring-projects
Spring Hadoop Samples
Java 476 Updated: 4 y ago License: No License (No License)
pandapy by firmai
PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
Python 476 Updated: 4 y ago License: No License (No License)
datawave by NationalSecurityAgency
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Java 476 Updated: 2 y ago License: Permissive (Apache-2.0)
legacy-jclouds by jclouds
Java 475 Updated: 4 y ago License: No License (No License)
keystone by amplab
Simplifying robust end-to-end machine learning on Apache Spark.
Scala 472 Updated: 4 y ago License: Permissive (Apache-2.0)
simdutf8 by rusticstuff
SIMD-accelerated UTF-8 validation for Rust.
Rust 469 Updated: 2 y ago License: Proprietary (Proprietary)
high-performance-spark-examples by high-performance-spark
Examples for High Performance Spark
Scala 467 Updated: 2 y ago License: Proprietary (Proprietary)
Data-Engineering-Projects by alanchn31
Personal Data Engineering Projects
Jupyter Notebook 467 Updated: 2 y ago License: No License (No License)
arctic by NetEase
Arctic is a streaming lake warehouse service open sourced by NetEase
Java 467 Updated: 2 y ago License: Permissive (Apache-2.0)
Data Factory by Microsoft
Hybrid data integration service that simplifies ETL at scale
cloud_api 465 Updated: Current License: Proprietary (Proprietary)
SparkStreaming by ljcan
Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Java 461 Updated: 2 y ago License: No License (No License)
featran by spotify
A Scala feature transformation library for data science and machine learning
Scala 460 Updated: 2 y ago License: Permissive (Apache-2.0)
streamDM by huawei-noah
Stream Data Mining Library for Spark Streaming
Scala 460 Updated: 4 y ago License: Permissive (Apache-2.0)
quinn by MrPowers
pyspark methods to enhance developer productivity 📣 👯 🎉
Python 455 Updated: 2 y ago License: No License (No License)
conjure-up by conjure-up
Deploying complex solutions, magically.
Python 455 Updated: 4 y ago License: Permissive (MIT)
kafka-connect-hdfs by confluentinc
Kafka Connect HDFS connector
Java 452 Updated: 2 y ago License: Proprietary (Proprietary)
presto-ethereum by xiaoyao1991
Presto Ethereum Connector -- SQL on Ethereum
Java 450 Updated: 4 y ago License: Permissive (Apache-2.0)
marmaray by uber
Generic Data Ingestion & Dispersal Library for Hadoop
Java 449 Updated: 3 y ago License: Proprietary (Proprietary)
cassandra-reaper by thelastpickle
Automated Repair Awesomeness for Apache Cassandra
Java 448 Updated: 2 y ago License: Permissive (Apache-2.0)
spark-syntax by ericxiao251
This is a repo documenting the best practices in PySpark.
Jupyter Notebook 447 Updated: 2 y ago License: No License (No License)
flink-table-store by apache
An Apache Flink subproject to provide storage for dynamic tables.
Java 445 Updated: 2 y ago License: Permissive (Apache-2.0)
indexr by shunfei
An open-source columnar data format designed for fast, real-time analytics on big data.
Java 443 Updated: 3 y ago License: Permissive (Apache-2.0)
spark-server by particle-iot
UNMAINTAINED - An API compatible open source server for interacting with devices speaking the spark-protocol
JavaScript 443 Updated: 4 y ago License: Strong Copyleft (AGPL-3.0)
fishnet by niklasf
Distributed Stockfish analysis for lichess.org
Rust 440 Updated: 3 y ago License: Proprietary (Proprietary)
moonbox by edp963
Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
JavaScript 438 Updated: 4 y ago License: Proprietary (Proprietary)
gandiva by dremio
Vectorized processing for Apache Arrow
C++ 438 Updated: 4 y ago License: No License (No License)
volk by gnuradio
The Vector Optimized Library of Kernels
C++ 436 Updated: 2 y ago License: Weak Copyleft (LGPL-3.0)
sparks.js by zz85
A lightweight 3D particle engine in JavaScript, compatible with THREE.js and TWEEN.js
JavaScript 432 Updated: 4 y ago License: No License (No License)
HadoopInternals by ercoppa
Diagrams describing Apache Hadoop internals (2.3.0 or later).
HTML 429 Updated: 4 y ago License: No License (No License)
spark-solr by lucidworks
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Scala 426 Updated: 3 y ago License: Permissive (Apache-2.0)
LearningSpark by spirom
Scala examples for learning to use Spark
Scala 425 Updated: 4 y ago License: Permissive (MIT)
data-engineering-practice by danielbeach
Data Engineering Practice Problems
Python 425 Updated: 3 y ago License: No License (No License)
spark-corenlp by databricks
Stanford CoreNLP wrapper for Apache Spark
Scala 423 Updated: 4 y ago License: Strong Copyleft (GPL-3.0)
incubator-celeborn by apache
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Java 421 Updated: 2 y ago License: Permissive (Apache-2.0)
prisma-docs-generator by pantharshit00
Prisma generator for automatically generating documentation reference from the Prisma schema.
TypeScript 419 Updated: 2 y ago License: Permissive (MIT)
storm-yarn by yahoo
Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.
Java 418 Updated: 3 y ago License: Proprietary (Proprietary)
Agile_Data_Code_2 by rjurney
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Jupyter Notebook 417 Updated: 3 y ago License: Permissive (MIT)
php-driver by datastax
[MAINTENANCE ONLY] DataStax PHP Driver for Apache Cassandra
C 415 Updated: 4 y ago License: Proprietary (Proprietary)
hadoop-ansible by analytically
Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.
Shell 415 Updated: 4 y ago License: Permissive (Apache-2.0)
cassandra-unit by jsevellec
Utility tool to load data into Cassandra to help you write good isolated JUnit tests for your application
Java 414 Updated: 4 y ago License: Weak Copyleft (LGPL-3.0)
streampipes by apache
Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
Java 412 Updated: 2 y ago License: Permissive (Apache-2.0)
Mirror of Apache Eagle
iceberg by Netflix
Iceberg is a table format for large, slow-moving tabular data
Java 409 Updated: 3 y ago License: Permissive (Apache-2.0)
Mirror of Apache Eagle
Apache Tez
nanoDLA by wuxx
24MHz sampling rate Logic Analyzer based on fx2lafw
C 408 Updated: 2 y ago License: No License (No License)
hyperspace by microsoft
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Scala 408 Updated: 2 y ago License: Permissive (Apache-2.0)
stroom by gchq
Stroom is a highly scalable data storage, processing and analysis platform.
Java 406 Updated: 2 y ago License: Permissive (Apache-2.0)