spark-bigquery-connector by GoogleCloudDataproc
Java 298 Version: Current License: Permissive (Apache-2.0)
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

versatile-data-kit by vmware
Build, run and manage your data pipelines with Python or SQL on any cloud
Python 340 Updated: 2 y ago License: Permissive (Apache-2.0)

cdrs by AlexPikalov
Cassandra DB native client written in Rust language. Find 1.x versions on https://github.com/AlexPikalov/cdrs/tree/v.1.x Looking for an async version? - Check WIP https://github.com/AlexPikalov/cdrs-async
Rust 338 Updated: 2 y ago License: Permissive (Apache-2.0)

Gather-Deployment by huseinzol05
Gathers scalable Tensorflow and Python infrastructure deployment, Husein Go-To for development, 100% Docker.
Jupyter Notebook 337 Updated: 3 y ago License: Permissive (MIT)

sparklingpandas by sparklingpandas
Sparkling Pandas
Python 337 Updated: 4 y ago License: Permissive (Apache-2.0)

cascading by cwensel
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Java 337 Updated: 2 y ago License: Proprietary (Proprietary)

golang-distributed-filesystem by ligfx
HDFS-alike in Go. Written in 2014 to learn the language and get a job.
Go 336 Updated: 4 y ago License: No License (No License)

django-cassandra-engine by r4fek
Django Cassandra Engine - the Cassandra backend for Django
Python 335 Updated: 2 y ago License: Permissive (BSD-2-Clause)

spindle by adobe-research
Next-generation web analytics processing with Scala, Spark, and Parquet.
JavaScript 335 Updated: 4 y ago License: Permissive (Apache-2.0)

koober by jamesward
Scala 333 Updated: 4 y ago License: No License (No License)

amadeus by constellation-rs
Harmonious distributed data analysis in Rust.
Rust 332 Updated: 4 y ago License: Permissive (Apache-2.0)

sig-cloud-instance-build by CentOS
CentOS Cloud Instance SIG: Metadata to build & release instances
Shell 332 Updated: 4 y ago License: No License (No License)

PySpark-Boilerplate by ekampf
A boilerplate for writing PySpark Jobs
Python 331 Updated: 4 y ago License: No License (No License)

elasticluster by elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Python 329 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

learning-spark-examples by holdenk
Examples for learning spark
Java 328 Updated: 4 y ago License: Permissive (MIT)

legacy.httparchive.org by HTTPArchive
<<THIS REPOSITORY IS DEPRECATED>> The HTTP Archive provides information about website performance such as # of HTTP requests, use of gzip, and amount of JavaScript. This information is recorded over time revealing trends in how the Internet is performing. Built using Open Source software, the code and data are available to everyone allowing researchers large and small to work from a common base.
PHP 328 Updated: 3 y ago License: Proprietary (Proprietary)

xichuan_note by Raray-chuan
xichuan's study notes, covering Java, Spring, other commonly used Java frameworks, and big data components 📚
Java 328 Updated: 2 y ago License: No License (No License)

QueryDet-PyTorch by ChenhongyiYang
[CVPR 2022 Oral] QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
Python 327 Updated: 2 y ago License: Permissive (MIT)

amazon-kinesis-scaling-utils by awslabs
The Kinesis Scaling Utility is designed to give you the ability to scale Amazon Kinesis Streams in the same way that you scale EC2 Auto Scaling groups – up or down by a count or as a percentage of the total fleet. You can also simply scale to an exact number of Shards. There is no requirement for you to manage the allocation of the keyspace to Shards when using this API, as it is done automatically.
Java 325 Updated: 3 y ago License: Permissive (Apache-2.0)

streamvbyte by lemire
Fast integer compression in C using the StreamVByte codec
C 324 Updated: 2 y ago License: Permissive (Apache-2.0)

Spark-SQL-on-HBase by Huawei-Spark
Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces
Scala 320 Updated: 4 y ago License: Permissive (Apache-2.0)

sparklint by groupon
A tool for monitoring and tuning Spark jobs for efficiency.
Scala 319 Updated: 4 y ago License: Permissive (Apache-2.0)

rxhive by sksamuel
Reactive Hive Toolkit for Streaming Platforms
Kotlin 318 Updated: 4 y ago License: Permissive (Apache-2.0)

aegisthus by Netflix
A Bulk Data Pipeline out of Cassandra
Java 316 Updated: 4 y ago License: Permissive (Apache-2.0)

httpcomponents-core by apache
Mirror of Apache HttpCore
Java 315 Updated: 2 y ago License: Permissive (Apache-2.0)

sparktorch by dmmiller612
Train and run Pytorch models on Apache Spark.
Python 313 Updated: 2 y ago License: Permissive (MIT)

easyhadoop by xianglei
Apache hadoop management system
PHP 313 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

hive by openhive-network
Fast. Scalable. Powerful. The Blockchain for Web3
C++ 313 Updated: 2 y ago License: Proprietary (Proprietary)

crunch by cloudera
Crunch is an Apache TLP now, and lives at http://crunch.apache.org/
Java 312 Updated: 4 y ago License: Permissive (Apache-2.0)

cloudflow by lightbend
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Scala 312 Updated: 3 y ago License: Permissive (Apache-2.0)

Decision by Stratio
Powered by Spark Streaming & Siddhi
Java 311 Updated: 4 y ago License: Permissive (Apache-2.0)

spark-installer by laravel
PHP 311 Updated: 4 y ago License: Permissive (MIT)

dask-sql by dask-contrib
Distributed SQL Engine in Python using Dask
Python 311 Updated: 2 y ago License: Permissive (MIT)

mist by Hydrospheredata
Serverless proxy for Spark cluster
Scala 310 Updated: 4 y ago License: Permissive (Apache-2.0)

morpheus by opencypher
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Scala 307 Updated: 4 y ago License: Permissive (Apache-2.0)

incubator-streampipes by apache
Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
Java 306 Updated: 3 y ago License: Permissive (Apache-2.0)

Spark-MongoDB by Stratio
Spark library for easy MongoDB access
Scala 306 Updated: 4 y ago License: Permissive (Apache-2.0)

SimdLinq by Cysharp
Drop-in replacement of LINQ aggregation operations extremely faster with SIMD.
C# 306 Updated: 2 y ago License: Permissive (MIT)

TensorFlow on Spark

std-simd by VcDevel
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
C++ 304 Updated: 4 y ago License: Proprietary (Proprietary)

gohadoop by hortonworks
Go 304 Updated: 4 y ago License: Permissive (Apache-2.0)

openai-gemm by openai
Open single and half precision gemm implementations
C 303 Updated: 4 y ago License: Permissive (MIT)

dbt-spark by dbt-labs
dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
Python 302 Updated: 2 y ago License: Permissive (Apache-2.0)

bromite-buildtools by uazo
my build machine for bromite development
Shell 302 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

Big data collection and extraction platform

sparkflow by lifeomic
Easy to use library to bring Tensorflow on Apache Spark
Python 299 Updated: 2 y ago License: Permissive (MIT)

spark-hbase-connector by nerdammer
Connect Spark to HBase for reading and writing data with ease
Scala 299 Updated: 2 y ago License: Permissive (Apache-2.0)

yagna by golemfactory
An open platform and marketplace for distributed computations
Rust 299 Updated: 2 y ago License: Strong Copyleft (GPL-3.0)

wasm-jit-prototype by WebAssembly
Standalone VM using LLVM JIT
C++ 299 Updated: 4 y ago License: Proprietary (Proprietary)

spark-bigquery-connector by GoogleCloudDataproc
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Java 298 Updated: 2 y ago License: Permissive (Apache-2.0)

OcclusionCulling by GameTechDev
Demonstrates a software (CPU) based approach to occlusion culling using multi-threading and SIMD instructions to improve performance.
C++ 298 Updated: 4 y ago License: Permissive (Apache-2.0)