Usage of the various Hadoop components, continuously updated
Apache Metron
Apache HAWQ

scala-logging by lightbend-labs
Convenient and performant logging library for Scala wrapping SLF4J.
Scala
890
Updated: 2 y ago
License: Permissive (Apache-2.0)

spark-nlp-workshop by JohnSnowLabs
Public runnable examples of using John Snow Labs' NLP for Apache Spark.
Jupyter Notebook
888
Updated: 2 y ago
License: Permissive (Apache-2.0)

tiflash by pingcap
The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
C++
887
Updated: 2 y ago
License: Permissive (Apache-2.0)

SparkCTR by wzhe06
CTR prediction model based on spark(LR, GBDT, DNN)
Scala
872
Updated: 2 y ago
License: Permissive (Apache-2.0)

pgsync by toluaina
Postgres to Elasticsearch/OpenSearch sync
Python
860
Updated: 2 y ago
License: Weak Copyleft (LGPL-3.0)

tispark by pingcap
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Scala
856
Updated: 2 y ago
License: Permissive (Apache-2.0)
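
hadoop_study by realguoshuai
Regularly updated documentation for big data components commonly used in the Hadoop ecosystem. Focus, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple demos in Scala, common utility classes, and desensitized train code. Continuously updated!)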
Java
853
Updated: 2 y ago
License: No License (No License)

frameless by typelevel
Expressive types for Spark.
Scala
851
Updated: 2 y ago
License: Permissive (Apache-2.0)

LightLDA by microsoft
Scalable, fast, and lightweight system for large-scale topic modeling
C++
849
Updated: 2 y ago
License: Permissive (MIT)

UserActionAnalyzePlatform by oeljeklaus-you
Big data platform for e-commerce user behavior analysis
Java
847
Updated: 2 y ago
License: Permissive (Apache-2.0)

sha256-simd by minio
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Go
837
Updated: 2 y ago
License: Permissive (Apache-2.0)

dmlc-core by dmlc
A common bricks library for building scalable and portable distributed machine learning.
C++
835
Updated: 2 y ago
License: Permissive (Apache-2.0)

brooklin by linkedin
An extensible distributed system for reliable nearline data streaming at scale
Java
833
Updated: 2 y ago
License: Permissive (BSD-2-Clause)

scala-logging by lightbend
Convenient and performant logging library for Scala wrapping SLF4J.
Scala
821
Updated: 4 y ago
License: Permissive (Apache-2.0)

pyspark-style-guide by palantir
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Python
813
Updated: 2 y ago
License: Permissive (MIT)

pik by google
A new lossy/lossless image format for photos and the internet
C++
810
Updated: 2 y ago
License: Permissive (MIT)

elasticsearch-spark-recommender by IBM
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Jupyter Notebook
808
Updated: 2 y ago
License: Permissive (Apache-2.0)

storm-crawler by DigitalPebble
A scalable, mature and versatile web crawler based on Apache Storm
HTML
803
Updated: 2 y ago
License: Permissive (Apache-2.0)

apisix-dashboard by apache
Dashboard for Apache APISIX
Go
802
Updated: 2 y ago
License: Permissive (Apache-2.0)

arrow-ballista by apache
Apache Arrow Ballista Distributed Query Engine
Rust
801
Updated: 2 y ago
License: Permissive (Apache-2.0)

asm by segmentio
Go library providing algorithms optimized to leverage the characteristics of modern CPUs
Go
784
Updated: 2 y ago
License: Permissive (MIT)

tensorframes by databricks
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
Scala
761
Updated: 3 y ago
License: Permissive (Apache-2.0)
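
wlhbdp by authorwlh
💥🔥 Data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.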
Java
760
Updated: 3 y ago
License: Strong Copyleft (GPL-3.0)

SQL-Data-Analysis-and-Visualization-Projects by ptyadana
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Jupyter Notebook
758
Updated: 2 y ago
License: Permissive (MIT)

ranger by apache
Apache Ranger - To enable, monitor and manage comprehensive data security across the Hadoop platform and beyond
Java
756
Updated: 2 y ago
License: Permissive (Apache-2.0)

base64 by aklomp
Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
C
751
Updated: 2 y ago
License: Permissive (BSD-2-Clause)

zingg by zinggAI
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Java
739
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

incubator-livy by apache
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Scala
735
Updated: 2 y ago
License: Permissive (Apache-2.0)

notebooker by man-group
Productionise & schedule your Jupyter Notebooks as easily as you wrote them.
Python
731
Updated: 2 y ago
License: Strong Copyleft (AGPL-3.0)

libxsmm by libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
C
729
Updated: 2 y ago
License: Permissive (BSD-3-Clause)

bahir-flink by apache
Mirror of Apache Bahir Flink
Java
727
Updated: 2 y ago
License: Permissive (Apache-2.0)

private-join-and-compute by google
C++
720
Updated: 2 y ago
License: Permissive (Apache-2.0)

Hive-JSON-Serde by rcongiu
Read - Write JSON SerDe for Apache Hive.
Java
717
Updated: 2 y ago
License: Proprietary (Proprietary)

thermostat by particle-iot
A place for all things related to ye olde Spark Thermostat Hackathon
Ruby
717
Updated: 4 y ago
License: Permissive (MIT)

spark-daria by MrPowers
Essential Spark extensions and helper methods ✨😲
Scala
713
Updated: 2 y ago
License: Permissive (MIT)

incubator-toree by apache
Mirror of Apache Toree (Incubating)
Scala
712
Updated: 2 y ago
License: Permissive (Apache-2.0)

cdap by cdapio
An open source framework for building data analytic applications.
Java
706
Updated: 2 y ago
License: Proprietary (Proprietary)

impyla by cloudera
Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Python
702
Updated: 2 y ago
License: Permissive (Apache-2.0)

FastNoise2 by Auburn
Modular node graph based noise generation library using SIMD, C++17 and templates
C++
702
Updated: 2 y ago
License: Permissive (MIT)

HeavenMemoirs by SherlockQi
AR photo album (Photo Album For AR)
Swift
680
Updated: 2 y ago
License: No License (No License)

cassandra by cassandra-rb
A Ruby client for the Cassandra distributed database
Ruby
677
Updated: 4 y ago
License: Permissive (Apache-2.0)

cp-all-in-one by confluentinc
docker-compose.yml files for cp-all-in-one, cp-all-in-one-community, cp-all-in-one-cloud, Apache Kafka Confluent Platform
Python
674
Updated: 2 y ago
License: No License (No License)

portable-simd by rust-lang
The testing ground for the future of portable SIMD in Rust
Rust
673
Updated: 2 y ago
License: Permissive (Apache-2.0)

TonY by tony-framework
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Java
672
Updated: 3 y ago
License: Proprietary (Proprietary)

openshift-docs by openshift
OpenShift 3 and 4 product and community documentation
HTML
670
Updated: 2 y ago
License: Permissive (Apache-2.0)

mongo-spark by mongodb
The MongoDB Spark Connector
Java
669
Updated: 2 y ago
License: Permissive (Apache-2.0)