Apache Parquet
Unicorn for node.js
MIPP by aff3ct
MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.
C++ 402 Updated: 2 y ago License: Permissive (MIT)
kafka-connect-ui by lensesio
Web tool for Kafka Connect
JavaScript 400 Updated: 4 y ago License: Proprietary (Proprietary)
BigData-In-Practice by whirlys
Big data practice projects: Hadoop, Spark, Kafka, HBase, Flink, ...
Java 398 Updated: 2 y ago License: Permissive (Apache-2.0)
zat by SuperCowPowers
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Jupyter Notebook 397 Updated: 2 y ago License: Permissive (MIT)
sylph by harbby
Stream computing platform for big data
Java 396 Updated: 3 y ago License: Permissive (Apache-2.0)
spark-training by databricks
Apache Spark training material
Scala 394 Updated: 4 y ago License: No License (No License)
aliyun-odps-python-sdk by aliyun
ODPS Python SDK and data analysis framework
Python 390 Updated: 2 y ago License: Permissive (Apache-2.0)
lambda-refarch-mapreduce by awslabs
This repo presents a reference architecture for running serverless MapReduce jobs, implemented using AWS Lambda and Amazon S3.
JavaScript 388 Updated: 4 y ago License: Proprietary (Proprietary)
flink-playgrounds by apache
Apache Flink Playgrounds
Java 385 Updated: 2 y ago License: Permissive (Apache-2.0)
cpp-driver by datastax
DataStax C/C++ Driver for Apache Cassandra
C++ 384 Updated: 2 y ago License: Permissive (Apache-2.0)
jclouds by jclouds
Read-only mirror of ASF Git Repo for jclouds
Java 383 Updated: 4 y ago License: No License (No License)
fastbase64 by lemire
SIMD-accelerated base64 codecs
C 383 Updated: 2 y ago License: Proprietary (Proprietary)
StockInference-Spark by Pivotal-Open-Source-Hub
Stock inference engine using Spring XD, Apache Geode / GemFire and Spark MLlib.
Java 381 Updated: 2 y ago License: Permissive (Apache-2.0)
chispa by MrPowers
PySpark test helper methods with beautiful error messages
Python 379 Updated: 2 y ago License: Permissive (MIT)
bacalhau by bacalhau-project
Compute over Data framework for public, transparent, and optionally verifiable computation
Go 378 Updated: 2 y ago License: Permissive (Apache-2.0)
oozie by YahooArchive
Oozie - workflow engine for Hadoop
Java 376 Updated: 4 y ago License: Permissive (Apache-2.0)
neo4j-mazerunner by neo4j-contrib
Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.
Java 376 Updated: 4 y ago License: Permissive (Apache-2.0)
kuiper by emqx
Lightweight IoT edge analytics software
Go 375 Updated: 4 y ago License: Permissive (Apache-2.0)
spark-ec2 by amplab
Scripts used to set up a Spark cluster on EC2
Python 374 Updated: 4 y ago License: Permissive (Apache-2.0)
kotlin-spark-api by Kotlin
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Kotlin 374 Updated: 2 y ago License: Permissive (Apache-2.0)
IQL by teeyog
An ad hoc query service based on the Spark SQL engine.
JavaScript 373 Updated: 3 y ago License: Permissive (MIT)
kubernetes-HDFS by apache-spark-on-k8s
Repository holding configuration files for running an HDFS cluster in Kubernetes
Shell 372 Updated: 2 y ago License: Permissive (Apache-2.0)
Kaggle_Titanic by HanXiaoyang
The data and IPython notebook of my attempt to solve the Kaggle Titanic problem
Jupyter Notebook 371 Updated: 3 y ago License: No License (No License)
terichdb by krareT
TerichDB, an open source data store based on the Terark engine
C++ 368 Updated: 4 y ago License: Strong Copyleft (AGPL-3.0)
connectors by delta-io
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Java 367 Updated: 2 y ago License: Permissive (Apache-2.0)
spark-excel by crealytics
A Spark plugin for reading and writing Excel files
Scala 366 Updated: 2 y ago License: Permissive (Apache-2.0)
ckman by housepower
A tool used to manage and monitor ClickHouse databases
Go 366 Updated: 2 y ago License: Permissive (Apache-2.0)
spark-perf by databricks
Performance tests for Apache Spark
Scala 366 Updated: 4 y ago License: Permissive (Apache-2.0)
parquet-dotnet by elastacloud
🏐 Apache Parquet for modern .NET
C# 361 Updated: 2 y ago License: Permissive (MIT)
OAP by Intel-bigdata
Optimized Analytics Package for Spark* Platform
Scala 359 Updated: 4 y ago License: Permissive (Apache-2.0)
mooc-setup by spark-mooc
Setup information for the BerkeleyX Spark Intro MOOC, and lab assignments for the course
Python 356 Updated: 4 y ago License: No License (No License)
ECommerceRecommendSystem by ittqqzz
Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark
Java 355 Updated: 2 y ago License: No License (No License)
prisma-multi-tenant by Errorname
🧭 Use Prisma as a multi-tenant provider for your application
TypeScript 354 Updated: 2 y ago License: Permissive (MIT)
Miscellaneous by LucaCanali
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
Jupyter Notebook 354 Updated: 2 y ago License: Permissive (Apache-2.0)
spring-data-cassandra by spring-projects
Provides support to increase developer productivity in Java when using Apache Cassandra. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
Java 352 Updated: 2 y ago License: Permissive (Apache-2.0)
trendingtopics by datawrangling
Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2
Ruby 352 Updated: 4 y ago License: No License (No License)
hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference by kaiwaehner
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Jupyter Notebook 352 Updated: 2 y ago License: Permissive (Apache-2.0)
tdigest by CamDavidsonPilon
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Python 350 Updated: 2 y ago License: Permissive (MIT)
examples-scala by streaming-with-flink
Stream Processing with Apache Flink - Scala Examples
Scala 349 Updated: 2 y ago License: Permissive (Apache-2.0)
blaze by blaze-init
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Rust 349 Updated: 2 y ago License: Permissive (Apache-2.0)
ytk-learn by yuantiku
Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Java 348 Updated: 3 y ago License: Permissive (MIT)
eclairjs-node by EclairJS
Node.js API for Apache Spark with Remote Client
JavaScript 348 Updated: 4 y ago License: Permissive (Apache-2.0)
hadoop-common-2.2.0-bin by srccodes
hadoop-common-2.2.0/bin
Shell 347 Updated: 4 y ago License: No License (No License)
graphx by amplab
Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
Scala 345 Updated: 3 y ago License: Permissive (Apache-2.0)
attic-apex-core by apache
Mirror of Apache Apex core
Java 344 Updated: 4 y ago License: Permissive (Apache-2.0)
pgcat by kingluo
Enhanced PostgreSQL logical replication
Go 341 Updated: 4 y ago License: Permissive (Apache-2.0)
HashtagCashtag by shafiab
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark and Spark Streaming for batch and real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
Scala 341 Updated: 2 y ago License: No License (No License)
cloudbreak by hortonworks
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Java 340 Updated: 2 y ago License: Permissive (Apache-2.0)