shards | Configurable, scalable and automatic sharding library | Bot library
kandi X-RAY | shards Summary
Configurable, scalable and automatic sharding library for discordgo.
Top functions reviewed by kandi - BETA
shards Key Features
shards Examples and Code Snippets
def get_unsharded_shape(self, shapes):
    """Returns the shape of an unsharded Tensor given a list of shards.

    When given a list of shapes of shards, returns the shape of the
    unsharded Tensor that would generate the shards. Sets defaults for ...
def auto_shard_dataset(dataset, num_shards, index, num_replicas_in_sync=None):
    """Shard the input pipeline by sharding the underlying list of files.

    Args:
      dataset: A `tf.data.Dataset` instance, typically the result of a bunch of
        dataset ...
def set_number_of_shards(self, number_of_shards):
    """Sets the number of shards for the current policy.

    If the policy has been frozen then number_of_shards must match the
    existing setting.

    Args:
      number_of_shards: The number of ...
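As a small, self-contained illustration of the sharding idea these snippets revolve around (a sketch using the public tf.data API, not code from the shards library itself), each worker can take a disjoint slice of an input pipeline:

# Sketch: sharding an input pipeline across workers with the public tf.data API.
# The worker count and dataset contents are made up for illustration.
import tensorflow as tf

NUM_WORKERS = 3  # hypothetical number of workers

def make_pipeline(worker_index):
    # Each worker keeps every NUM_WORKERS-th element, so the shards are disjoint.
    ds = tf.data.Dataset.range(12)
    ds = ds.shard(num_shards=NUM_WORKERS, index=worker_index)
    return ds.batch(2)

for worker_index in range(NUM_WORKERS):
    print(worker_index, [batch.numpy().tolist() for batch in make_pipeline(worker_index)])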
Community Discussions
Trending Discussions on shards
QUESTION
I'm using a template that relies on an old version of React and React Router; I want to redirect the user to the login page if they are not logged in, but the template uses a different routing method in app.js.
"react-router-dom": "^4.3.1",
"react": "^16.6.3",
"react-dom": "^16.6.3",
ANSWER
Answered 2022-Mar-22 at 22:27 You should take the isLogin value from the Redux store and declare a condition inside the '/' route, like this:
QUESTION
We have an Apache Flink application which processes events:
- The application uses event time characteristics
- The application shards (keyBy) events based on the sessionId field
- The application has windowing with a 1-minute tumbling window
- The windowing is specified by a reduce and a process function, so for each session we will have 1 computed record
- The application emits the data into a Postgres sink
Application:
- It is hosted in AWS via Kinesis Data Analytics (KDA)
- It is running in 5 different regions
- The exact same code is running in each region
Database:
- It is hosted in AWS via RDS (currently it is a PostgreSQL)
- It is located in one region (with a read replica in a different region)
Because we are using event time characteristics with a 1-minute tumbling window, all regions' sinks emit their records at nearly the same time.
What we want to achieve is to add an artificial delay between the window and sink operators to postpone the sink emission.
Flink App   Offset   Window 1   Sink 1st run   Window 2   Sink 2nd run
#1          0        60         60             120        120
#2          12       60         72             120        132
#3          24       60         84             120        144
#4          36       60         96             120        156
#5          48       60         108            120        168

Not-working work-around: we thought we could add some sleep to the evictor's evictBefore, like this:
ANSWER
Answered 2022-Mar-07 at 16:03 You could use TumblingEventTimeWindows.of(Time size, Time offset, WindowStagger windowStagger) with WindowStagger.RANDOM.
See https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/windowing/assigners/WindowStagger.html for documentation.
QUESTION
What is the best approach for primary keys on Citus?
UUID: no lock required, as opposed to Identity/Serial, but expensive in storage and eventually in queries, and it causes fragmentation.
Sequence/Identity: causes a bottleneck while creating an entity, but is less expensive in storage, queries will be faster, and there is no fragmentation.
If we want the project to be scale-ready, would it be better to work with UUIDs?
According to this post: https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/
for sharded setups it ultimately recommends working with UUIDs.
How well does that perform on Citus?
I'll give a schema example:
...ANSWER
Answered 2021-Dec-30 at 02:30 Disclaimer: I am the software engineer working on the Citus engine who opened a PR for supporting UDFs in column defaults.
In the post you shared in the question, the gen_random_uuid() UDF is used as a column default. This is not yet supported on Citus, so I suggest you use bigserial.
I also want to note that some of the statements in the blog are not correct for Citus. For example:
So if you use sharding, where you distribute your data across several databases, you cannot use a sequence.
In Citus, you can create distributed sequence objects where each worker node will use a disjoint subset of the allowed values of the sequence. The application sees a single sequence on the coordinator node, while the shards have their own sequences with different start values.
(You could use sequences defined with an INCREMENT greater than 1 and different START values, but that might lead to problems when you add additional shards.)
Citus shifts the start of a bigserial by node_group_id * 2^48, so you are limited to a maximum of 2^18 shards, which is practically unlimited. You will run into bigger issues when you have petabytes of data in a single table or a quarter of a million shards; the limitation here will not really affect your cluster.
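As a rough sketch of that behaviour (table, column, and connection details are made up for illustration, not taken from the original post), keeping a bigserial key on a Citus distributed table can look like this when run through psycopg2:

# Sketch only: a bigserial key on a Citus distributed table.
# Connection settings and names below are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=app user=app")
conn.autocommit = True
cur = conn.cursor()

# On Citus the primary key must include the distribution column,
# so the bigserial value is combined with the tenant id.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        tenant_id bigint NOT NULL,
        id        bigserial,
        payload   jsonb,
        PRIMARY KEY (tenant_id, id)
    );
""")

# create_distributed_table() shards the table by tenant_id; the sequence
# behind id keeps working because each worker draws from a disjoint range.
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")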
PS: My work on the UDF column defaults is on hold now, as some recent code changes will make the solution cleaner and easier to maintain. We have not yet shipped the changes in a released version. https://github.com/citusdata/citus/pull/5149
QUESTION
I am using Elasticsearch version 5.6.13 and need some expert configuration advice. I have 3 nodes on the same system (node1, node2, node3), where node1 is the master and the other 2 are data nodes. I have around 40 indexes; I created all of them with the default 5 primary shards, and some of them have 2 replicas. The issue I am facing right now: my (scraping) data is growing day by day, and one of my indexes holds 400GB of data; 3 other indexes are also heavily loaded. For the last few days, while inserting data, my Elasticsearch hangs and then the service is killed, which affects my processing. I have tried several things. I am sharing the system specs and current ES configuration + logs. Please suggest a solution.
The System Specs: RAM: 160 GB, CPU: AMD EPYC 7702P 64-Core Processor, Drive: 2 TB SSD (The drive in which the ES installed still have 500 GB left)
ES configuration, JVM options: -Xms26g, -Xmx26g (I just tried this but am not sure what the perfect heap size is for my scenario). I only edited these lines and left the rest of the file at its defaults; I edited the jvm.options file on all three nodes.
ES LOGS
[2021-09-22T12:05:17,983][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][170] overhead, spent [7.1s] collecting in the last [7.2s]
[2021-09-22T12:05:21,868][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][171] overhead, spent [3.7s] collecting in the last [1.9s]
[2021-09-22T12:05:51,190][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][172] overhead, spent [27.7s] collecting in the last [23.3s]
[2021-09-22T12:06:54,629][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][173] overhead, spent [57.5s] collecting in the last [1.1m]
[2021-09-22T12:06:56,536][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][174] overhead, spent [1.9s] collecting in the last [1.9s]
[2021-09-22T12:07:02,176][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][175] overhead, spent [5.4s] collecting in the last [5.6s]
[2021-09-22T12:06:56,546][ERROR][o.e.i.e.Engine ] [cluster_name] [index_name][3] merge failed java.lang.OutOfMemoryError: Java heap space
[2021-09-22T12:06:56,548][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [cluster_name] fatal error in thread [elasticsearch[cluster_name][bulk][T#25]], exiting java.lang.OutOfMemoryError: Java heap space
Some more logs
[2021-09-22T12:10:06,526][INFO ][o.e.n.Node ] [cluster_name] initializing ...
[2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] using [1] data paths, mounts [[(D:)]], net usable_space [563.3gb], net total_space [1.7tb], spins? [unknown], types [NTFS]
[2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] heap size [1.9gb], compressed ordinary object pointers [true]
[2021-09-22T12:10:07,239][INFO ][o.e.n.Node ] [cluster_name] node name [sashanode1], node ID [2p-ux-OXRKGuxmN0efvF9Q]
[2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] version[5.6.13], pid[57096], build[4d5320b/2018-10-30T19:05:08.237Z], OS[Windows Server 2019/10.0/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_261/25.261-b12]
[2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Delasticsearch, -Des.path.home=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1, -Des.default.path.logs=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\logs, -Des.default.path.data=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\data, -Des.default.path.conf=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\config, exit, -Xms2048m, -Xmx2048m, -Xss1024k]
Also, in my ES folder there are many files with random names (e.g. java_pid197036.hprof). Further details can be shared; please suggest any further configuration changes. Thanks.
The output for _cluster/stats?pretty&human is
{ "_nodes": { "total": 3, "successful": 3, "failed": 0 }, "cluster_name": "cluster_name", "timestamp": 1632375228033, "status": "red", "indices": { "count": 42, "shards": { "total": 508, "primaries": 217, "replication": 1.3410138248847927, "index": { "shards": { "min": 2, "max": 60, "avg": 12.095238095238095 }, "primaries": { "min": 1, "max": 20, "avg": 5.166666666666667 }, "replication": { "min": 1.0, "max": 2.0, "avg": 1.2857142857142858 } } }, "docs": { "count": 107283077, "deleted": 1047418 }, "store": { "size": "530.2gb", "size_in_bytes": 569385384976, "throttle_time": "0s", "throttle_time_in_millis": 0 }, "fielddata": { "memory_size": "0b", "memory_size_in_bytes": 0, "evictions": 0 }, "query_cache": { "memory_size": "0b", "memory_size_in_bytes": 0, "total_count": 0, "hit_count": 0, "miss_count": 0, "cache_size": 0, "cache_count": 0, "evictions": 0 }, "completion": { "size": "0b", "size_in_bytes": 0 }, "segments": { "count": 3781, "memory": "2gb", "memory_in_bytes": 2174286255, "terms_memory": "1.7gb", "terms_memory_in_bytes": 1863786029, "stored_fields_memory": "105.6mb", "stored_fields_memory_in_bytes": 110789048, "term_vectors_memory": "0b", "term_vectors_memory_in_bytes": 0, "norms_memory": "31.9mb", "norms_memory_in_bytes": 33527808, "points_memory": "13.1mb", "points_memory_in_bytes": 13742470, "doc_values_memory": "145.3mb", "doc_values_memory_in_bytes": 152440900, "index_writer_memory": "0b", "index_writer_memory_in_bytes": 0, "version_map_memory": "0b", "version_map_memory_in_bytes": 0, "fixed_bit_set": "0b", "fixed_bit_set_memory_in_bytes": 0, "max_unsafe_auto_id_timestamp": 1632340789677, "file_sizes": { } } }, "nodes": { "count": { "total": 3, "data": 3, "coordinating_only": 0, "master": 1, "ingest": 3 }, "versions": [ "5.6.13" ], "os": { "available_processors": 192, "allocated_processors": 96, "names": [ { "name": "Windows Server 2019", "count": 3 } ], "mem": { "total": "478.4gb", "total_in_bytes": 513717497856, "free": "119.7gb", "free_in_bytes": 128535437312, "used": "358.7gb", "used_in_bytes": 385182060544, "free_percent": 25, "used_percent": 75 } }, "process": { "cpu": { "percent": 5 }, "open_file_descriptors": { "min": -1, "max": -1, "avg": 0 } }, "jvm": { "max_uptime": "1.9d", "max_uptime_in_millis": 167165106, "versions": [ { "version": "1.8.0_261", "vm_name": "Java HotSpot(TM) 64-Bit Server VM", "vm_version": "25.261-b12", "vm_vendor": "Oracle Corporation", "count": 3 } ], "mem": { "heap_used": "5gb", "heap_used_in_bytes": 5460944144, "heap_max": "5.8gb", "heap_max_in_bytes": 6227755008 }, "threads": 835 }, "fs": { "total": "1.7tb", "total_in_bytes": 1920365228032, "free": "499.1gb", "free_in_bytes": 535939969024, "available": "499.1gb", "available_in_bytes": 535939969024 }, "plugins": [ ], "network_types": { "transport_types": { "netty4": 3 }, "http_types": { "netty4": 3 } } } }
The jvm.options file.
...ANSWER
Answered 2021-Oct-08 at 06:38 My issue is solved; it was a heap size issue. I am running ES as a service, and the service's default heap size of 2 GB was being used, so my setting was not taking effect. I reinstalled the service with the updated jvm.options file with a heap size of 10 GB and then ran my cluster; the heap size went from 2 GB to 10 GB and my problem was solved. Thanks for the suggestions.
To check your heap size, use this command.
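The exact command is not shown above; one common way to confirm the heap each node is actually running with (a sketch against the standard _cat/nodes API, assuming a default local install on port 9200):

# Sketch: print per-node heap usage from the _cat/nodes API.
# Host and port are assumptions (default local install).
import requests

resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,heap.current,heap.percent,heap.max"},
)
print(resp.text)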
QUESTION
Has anybody seen these messages before in a MongoDB 4.0.16 sharded cluster, in mongos, during balancing:
...ANSWER
Answered 2021-Dec-02 at 15:52
- This message is expected behaviour during balancing, when there is a read request for documents that have already been migrated to another shard.
- It means that mongos is not able to establish a remote cursor on the old shard, since the config is reported stale and the data has moved to the new shard.
- No fix is necessary; this is an informative message only.
QUESTION
Is it possible to crop/resize images per batch?
I'm using the TensorFlow Dataset API as below:
...ANSWER
Answered 2021-Dec-01 at 14:51 Generally, you can try something like this:
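The answer's snippet is not included above; a minimal sketch of the general idea (batch first, then map a function over the whole batch; the shapes and target size below are made-up assumptions):

# Sketch: resize images per batch with the tf.data API.
# Input shapes and the 64x64 target size are illustrative assumptions.
import tensorflow as tf

images = tf.random.uniform((32, 128, 128, 3))                  # fake images
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int32)   # fake labels
ds = tf.data.Dataset.from_tensor_slices((images, labels))

def resize_batch(image_batch, label_batch):
    # tf.image.resize accepts a whole [batch, height, width, channels] tensor.
    return tf.image.resize(image_batch, [64, 64]), label_batch

ds = ds.batch(8).map(resize_batch)

for image_batch, label_batch in ds.take(1):
    print(image_batch.shape)   # (8, 64, 64, 3)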
QUESTION
My company is developing a SaaS service to store events and provide analytics through dashboards.
Since we won't have deletes or updates, the idea is to create a columnar-based OLAP architecture to benefit from the compression and latency advantages it provides, and PostgreSQL Citus is one platform we intend to evaluate.
The overall architecture is pretty standard: an API will receive the events and then store them on Kafka in JSON format. Then, those events will be sent to PostgreSQL. Some fields will be "jsonb" data type.
By reading the docs, the best practice is to distribute tables by tenant ID.
I just wanted to double-check a few things and would greatly appreciate someone's input:
- Does the architecture described above make sense? Is there anything we should change or pay attention to?
- Are there limitations in the number of nodes or shards that can be scaled out for this columnar approach?
- Are GIN indexes supported? (I believe they are, since they are not listed under 'Limitations'.)
Thanks!
...ANSWER
Answered 2021-Nov-22 at 07:46 I've used Citus for a multi-tenant service, and distributing tables by the tenant ID worked pretty well.
The overall architecture that you described makes sense but I'm not sure if someone from outside or at least without some details can give you much more to work with. Sending events through Kafka and processing them to be stored somewhere is a pretty standard way of working with events these days so nothing crazy is happening there.
There seem to be no limitations on scaling out in terms of the number of nodes, but what you should keep in mind is how you set your shard count from the start: rebalancing will lock your tables and can take a while to finish, so you want to keep shards as small and easy to process as you can. Take a look here for more details: https://docs.citusdata.com/en/v10.2/faq/faq.html#how-do-i-choose-the-shard-count-when-i-hash-partition-my-data
GIN indexes are supported; they are used in the Citus examples: https://docs.citusdata.com/en/v10.2/use_cases/multi_tenant.html?highlight=GIN#when-data-differs-across-tenants
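As a hedged sketch of those two points (all names and sizes are placeholders, not from the original thread), picking the shard count up front, distributing by tenant, and adding a GIN index on the jsonb column could look like:

# Sketch only: shard count, tenant distribution, and a GIN index on jsonb.
# Connection settings, table and column names are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=analytics user=app")
conn.autocommit = True
cur = conn.cursor()

# Decide the shard count before distributing; changing it later means rebalancing.
cur.execute("SET citus.shard_count = 64;")

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        tenant_id  bigint      NOT NULL,
        event_time timestamptz NOT NULL,
        payload    jsonb
    );
""")
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

# GIN index for querying inside the jsonb payload.
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING GIN (payload);")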
Also, note that you won't have failover support in the community version. You have to go with the enterprise version which supports failover and also allows you to rebalance the tables without locking the entire table.
I wrote a simple service to handle the failover which you can use: https://github.com/Navid2zp/citus-failover
QUESTION
I have a Kinesis cluster that's pushing data into Amazon Redshift via Lambda.
Currently my lambda code looks something like this:
...ANSWER
Answered 2021-Oct-26 at 16:15 The comment in your code gives me pause - "query = # prepare an INSERT query here". This seems to imply that you are reading the S3 data into Lambda and INSERTing this data into Redshift. If so, this is not a good pattern.
First off, Redshift expects data to be brought into the cluster through COPY (or Spectrum, or ...), not through INSERT. This will create issues in Redshift with managing the transactions and create a tremendous waste of disk space and a need for VACUUM. The INSERT approach for putting data into Redshift is an anti-pattern and shouldn't be used for even moderate sizes of data.
More generally, the concern is a data movement impedance mismatch. Kinesis is lots of independent streams of data and code generating small files. Redshift is a massive database that works on large data segments. Mismatching these tools in a way that misses their design targets will make either of them perform very poorly. You need to match the data requirements by batching up S3 files into Redshift. This means COPYing many S3 files into Redshift in a single COPY command, which can be done with manifests or by "directory" structure in S3 ("COPY everything from S3 path ..."). This process of COPYing data into Redshift can be run on a regular interval (every 2, 5, or 10 minutes). So you want your Kinesis Lambdas to organize the data in S3 (or add to a manifest) so that a "batch" of S3 files can be collected for a single COPY execution, as sketched below. This way a large number of S3 files can be brought into Redshift at once (its preferred data size), which will also greatly reduce your execute API calls.
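As a hedged sketch of that batching pattern (the bucket, table, IAM role, and connection settings are placeholders), a periodic job could write a manifest of the small S3 files and then issue a single COPY:

# Sketch: batch many small S3 files into one Redshift COPY via a manifest.
# All names, keys, and credentials below are illustrative assumptions.
import json
import boto3
import psycopg2

BUCKET = "my-kinesis-staging"                    # hypothetical bucket
KEYS = ["events/2021/10/26/part-0001.gz",
        "events/2021/10/26/part-0002.gz"]        # files collected since the last COPY

# 1) Write a manifest listing the files that belong to this batch.
manifest = {"entries": [{"url": f"s3://{BUCKET}/{key}", "mandatory": True} for key in KEYS]}
boto3.client("s3").put_object(
    Bucket=BUCKET,
    Key="manifests/batch-0001.manifest",
    Body=json.dumps(manifest).encode("utf-8"),
)

# 2) A single COPY pulls the whole batch into Redshift in parallel.
conn = psycopg2.connect("host=my-cluster.example.com port=5439 dbname=dw user=loader")
with conn, conn.cursor() as cur:
    cur.execute(f"""
        COPY analytics.events
        FROM 's3://{BUCKET}/manifests/batch-0001.manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        MANIFEST GZIP FORMAT AS JSON 'auto';
    """)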
Now, if you have a very large Kinesis pipe set up and the data is very large, there is another data movement "preference" to take into account. This only matters when you are moving a lot of data per minute. This extra preference concerns S3. S3 being an object store means that there is a significant amount of time taken up by "looking up" a requested object key: about .5 sec. So reading a thousand S3 objects will require (in total) about 500 seconds of key lookup time. Redshift makes requests to S3 in parallel, one per slice in the cluster, so some of this time happens in parallel. If the files being read are 1KB in size, the data transfer itself, after the S3 lookup is complete, will total only about 1.25 sec. Again this time is in parallel, but you can see how much time is spent on lookup vs. transfer. To get the maximum bandwidth out of S3 when reading many files, these files need to be around 1GB in size (100MB is OK in my experience). You can see that if you are to ingest millions of files per minute from Kinesis into Redshift, you will need a process to combine many small files into bigger files to avoid this hazard of S3. Since you are using Lambda as your Kinesis reader, I expect that you aren't at this data rate yet, but it is good to keep your eyes on this issue if you expect to expand to a very large scale.
Just because tools have high bandwidth doesn't mean that they can be piped together. Bandwidth comes in many styles.
QUESTION
I have a partitioned table with less than 600MB of data in the targeted partitions. When I run an EXPORT query with a subquery that returns ~1000 rows, EXPORT shards the data into 27(!) files. This is not a big query result. I assume it's happening because the optimizer sees the 600MB, but I'm not sure.
Has anyone else encountered this? I am GZIPing the results, so concatenating would involve unzipping, appending and zipping again...
...ANSWER
Answered 2021-Oct-20 at 12:50 I have run many different export scenarios in BigQuery. I think there is only one case in which a BigQuery export of a partitioned table smaller than 1 GB produces multiple files: using wildcards during the export.
BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern.
I have tested this using a public dataset, with a partitioned and a normal table:
- Partitioned table: bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_times
- Normal table: bigquery-public-data.covid19_google_mobility.mobility_report
Scenario 1:
Normal table: bigquery-public-data.covid19_google_mobility.mobility_report
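A minimal sketch of the two export styles with the google-cloud-bigquery client (the project and bucket names are placeholders): a wildcard in the destination URI lets BigQuery shard the export into multiple files, while a single URI produces one file as long as the result stays under the 1 GB limit.

# Sketch: exporting a table with and without a wildcard destination URI.
# Project and bucket names are placeholders; the source table is public.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table_ref = "bigquery-public-data.covid19_google_mobility.mobility_report"

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)

# Wildcard URI: BigQuery is free to split the export into many sharded files.
client.extract_table(
    table_ref, "gs://my-bucket/mobility-*.csv.gz", job_config=job_config
).result()

# Single URI: one output file, provided the export stays under 1 GB.
client.extract_table(
    table_ref, "gs://my-bucket/mobility.csv.gz", job_config=job_config
).result()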
QUESTION
I forked a Discord music bot from GitHub to Replit and then tried to follow the steps to run the bot.
I am using Node.js v14.
When I run the bot I receive the following error:
...ANSWER
Answered 2021-Sep-26 at 08:23 You would have to force-clean your npm cache and remove your node_modules, then reinstall them so they are compiled against the new Node version. You can use the following commands in your console/shell:
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install shards