shards | Configurable, scalable and automatic sharding library | Bot library
kandi X-RAY | shards Summary
Configurable, scalable and automatic sharding library for discordgo.
Top functions reviewed by kandi - BETA
shards Key Features
shards Examples and Code Snippets
def get_unsharded_shape(self, shapes):
    """Returns the shape of an unsharded Tensor given a list of shards.

    When given a list of shapes of shards, returns the shape of the
    unsharded Tensor that would generate the shards. Sets defaults for ...
def auto_shard_dataset(dataset, num_shards, index, num_replicas_in_sync=None):
    """Shard the input pipeline by sharding the underlying list of files.

    Args:
      dataset: A `tf.data.Dataset` instance, typically the result of a bunch of
        dataset ...
def set_number_of_shards(self, number_of_shards):
    """Sets the number of shards for the current policy.

    If the policy has been frozen then number_of_shards must match the
    existing setting.

    Args:
      number_of_shards: The number of ...
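As a small, self-contained illustration of the sharding idea these snippets revolve around (a sketch using the public tf.data API, not code from the shards library itself), each worker can take a disjoint slice of an input pipeline:

# Sketch: sharding an input pipeline across workers with the public tf.data API.
# The worker count and dataset contents are made up for illustration.
import tensorflow as tf

NUM_WORKERS = 3  # hypothetical number of workers

def make_pipeline(worker_index):
    # Each worker keeps every NUM_WORKERS-th element, so the shards are disjoint.
    ds = tf.data.Dataset.range(12)
    ds = ds.shard(num_shards=NUM_WORKERS, index=worker_index)
    return ds.batch(2)

for worker_index in range(NUM_WORKERS):
    print(worker_index, [batch.numpy().tolist() for batch in make_pipeline(worker_index)])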
Community Discussions
Trending Discussions on shards
QUESTION
I'm using a template that relies on an old version of React and React Router; I want to redirect the user to the login page if they are not logged in, but the template uses a different routing method in app.js.
"react-router-dom": "^4.3.1",
"react": "^16.6.3",
"react-dom": "^16.6.3",
ANSWER
Answered 2022-Mar-22 at 22:27 You should take the isLogin value from the Redux store and declare a condition inside the '/' route, like this:
QUESTION
We have an Apache Flink application which processes events:
- The application uses event time characteristics
- The application shards (keyBy) events based on the sessionId field
- The application has windowing with a 1-minute tumbling window
- The windowing is specified by a reduce and a process function, so for each session we will have 1 computed record
- The application emits the data into a Postgres sink
Application:
- It is hosted in AWS via Kinesis Data Analytics (KDA)
- It is running in 5 different regions
- The exact same code is running in each region
Database:
- It is hosted in AWS via RDS (currently it is a PostgreSQL)
- It is located in one region (with a read replica in a different region)
Because we are using event time characteristics with a 1-minute tumbling window, all regions' sinks emit their records at nearly the same time.
What we want to achieve is to add an artificial delay between the window and sink operators to postpone the sink emission.
Flink App   Offset   Window 1   Sink 1st run   Window 2   Sink 2nd run
#1          0        60         60             120        120
#2          12       60         72             120        132
#3          24       60         84             120        144
#4          36       60         96             120        156
#5          48       60         108            120        168

Not-working work-around: we thought we could add some sleep to the evictor's evictBefore, like this:
ANSWER
Answered 2022-Mar-07 at 16:03 You could use TumblingEventTimeWindows.of(Time size, Time offset, WindowStagger windowStagger) with WindowStagger.RANDOM.
See https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/windowing/assigners/WindowStagger.html for documentation.
QUESTION
What is the best approach for primary keys on Citus?
UUID: no lock required, as opposed to Identity/Serial, but expensive in storage and eventually in queries, and it causes fragmentation.
Sequence/Identity: causes a bottleneck while creating an entity, but is less expensive in storage, queries will be faster, and there is no fragmentation.
If we want the project to be scale-ready, would it be better to work with UUIDs?
According to this post: https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/
for sharded setups it ultimately recommends working with UUIDs.
How well does that perform on Citus?
I'll give a schema example:
...ANSWER
Answered 2021-Dec-30 at 02:30 Disclaimer: I am the software engineer working on the Citus engine who opened a PR for supporting UDFs in column defaults.
In the post you shared in the question, the gen_random_uuid() UDF is used as a column default. This is not yet supported on Citus, so I suggest you use bigserial.
I also want to note that some of the statements in the blog are not correct for Citus. For example:
So if you use sharding, where you distribute your data across several databases, you cannot use a sequence.
In Citus, you can create distributed sequence objects where each worker node will use a disjoint subset of the allowed values of the sequence. The application sees a single sequence on the coordinator node, while the shards have their own sequences with different start values.
(You could use sequences defined with an INCREMENT greater than 1 and different START values, but that might lead to problems when you add additional shards.)
Citus shifts the start of a bigserial by node_group_id * 2^48, so you are limited to a maximum of 2^18 shards, which is practically unlimited. You will run into bigger issues when you have petabytes of data in a single table or a quarter of a million shards; the limitation here will not really affect your cluster.
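As a rough sketch of that behaviour (table, column, and connection details are made up for illustration, not taken from the original post), keeping a bigserial key on a Citus distributed table can look like this when run through psycopg2:

# Sketch only: a bigserial key on a Citus distributed table.
# Connection settings and names below are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=app user=app")
conn.autocommit = True
cur = conn.cursor()

# On Citus the primary key must include the distribution column,
# so the bigserial value is combined with the tenant id.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        tenant_id bigint NOT NULL,
        id        bigserial,
        payload   jsonb,
        PRIMARY KEY (tenant_id, id)
    );
""")

# create_distributed_table() shards the table by tenant_id; the sequence
# behind id keeps working because each worker draws from a disjoint range.
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")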
PS: My work on the UDF column defaults is on hold now, as some recent code changes will make the solution cleaner and easier to maintain. We have not yet shipped the changes in a released version. https://github.com/citusdata/citus/pull/5149
QUESTION
I am using Elasticsearch version 5.6.13 and need some expert configuration advice. I have 3 nodes on the same system (node1, node2, node3), where node1 is the master and the other 2 are data nodes. I have around 40 indexes; I created all of them with the default 5 primary shards, and some of them have 2 replicas. The issue I am facing right now: my (scraping) data is growing day by day, and one of my indexes holds 400GB of data; 3 other indexes are also heavily loaded. For the last few days, while inserting data, my Elasticsearch hangs and then the service is killed, which affects my processing. I have tried several things. I am sharing the system specs and current ES configuration + logs. Please suggest a solution.
The System Specs: RAM: 160 GB, CPU: AMD EPYC 7702P 64-Core Processor, Drive: 2 TB SSD (The drive in which the ES installed still have 500 GB left)
ES configuration, JVM options: -Xms26g, -Xmx26g (I just tried this but am not sure what the perfect heap size is for my scenario). I only edited these lines and left the rest of the file at its defaults; I edited the jvm.options file on all three nodes.
ES LOGS
[2021-09-22T12:05:17,983][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][170] overhead, spent [7.1s] collecting in the last [7.2s]
[2021-09-22T12:05:21,868][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][171] overhead, spent [3.7s] collecting in the last [1.9s]
[2021-09-22T12:05:51,190][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][172] overhead, spent [27.7s] collecting in the last [23.3s]
[2021-09-22T12:06:54,629][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][173] overhead, spent [57.5s] collecting in the last [1.1m]
[2021-09-22T12:06:56,536][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][174] overhead, spent [1.9s] collecting in the last [1.9s]
[2021-09-22T12:07:02,176][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][175] overhead, spent [5.4s] collecting in the last [5.6s]
[2021-09-22T12:06:56,546][ERROR][o.e.i.e.Engine ] [cluster_name] [index_name][3] merge failed java.lang.OutOfMemoryError: Java heap space
[2021-09-22T12:06:56,548][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [cluster_name] fatal error in thread [elasticsearch[cluster_name][bulk][T#25]], exiting java.lang.OutOfMemoryError: Java heap space
Some more logs
[2021-09-22T12:10:06,526][INFO ][o.e.n.Node ] [cluster_name] initializing ...
[2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] using [1] data paths, mounts [[(D:)]], net usable_space [563.3gb], net total_space [1.7tb], spins? [unknown], types [NTFS]
[2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] heap size [1.9gb], compressed ordinary object pointers [true]
[2021-09-22T12:10:07,239][INFO ][o.e.n.Node ] [cluster_name] node name [sashanode1], node ID [2p-ux-OXRKGuxmN0efvF9Q]
[2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] version[5.6.13], pid[57096], build[4d5320b/2018-10-30T19:05:08.237Z], OS[Windows Server 2019/10.0/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_261/25.261-b12]
[2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Delasticsearch, -Des.path.home=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1, -Des.default.path.logs=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\logs, -Des.default.path.data=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\data, -Des.default.path.conf=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\config, exit, -Xms2048m, -Xmx2048m, -Xss1024k]
Also, in my ES folder there are many files with random names (e.g. java_pid197036.hprof). Further details can be shared; please suggest any further configuration changes. Thanks.
The output for _cluster/stats?pretty&human is
{ "_nodes": { "total": 3, "successful": 3, "failed": 0 }, "cluster_name": "cluster_name", "timestamp": 1632375228033, "status": "red", "indices": { "count": 42, "shards": { "total": 508, "primaries": 217, "replication": 1.3410138248847927, "index": { "shards": { "min": 2, "max": 60, "avg": 12.095238095238095 }, "primaries": { "min": 1, "max": 20, "avg": 5.166666666666667 }, "replication": { "min": 1.0, "max": 2.0, "avg": 1.2857142857142858 } } }, "docs": { "count": 107283077, "deleted": 1047418 }, "store": { "size": "530.2gb", "size_in_bytes": 569385384976, "throttle_time": "0s", "throttle_time_in_millis": 0 }, "fielddata": { "memory_size": "0b", "memory_size_in_bytes": 0, "evictions": 0 }, "query_cache": { "memory_size": "0b", "memory_size_in_bytes": 0, "total_count": 0, "hit_count": 0, "miss_count": 0, "cache_size": 0, "cache_count": 0, "evictions": 0 }, "completion": { "size": "0b", "size_in_bytes": 0 }, "segments": { "count": 3781, "memory": "2gb", "memory_in_bytes": 2174286255, "terms_memory": "1.7gb", "terms_memory_in_bytes": 1863786029, "stored_fields_memory": "105.6mb", "stored_fields_memory_in_bytes": 110789048, "term_vectors_memory": "0b", "term_vectors_memory_in_bytes": 0, "norms_memory": "31.9mb", "norms_memory_in_bytes": 33527808, "points_memory": "13.1mb", "points_memory_in_bytes": 13742470, "doc_values_memory": "145.3mb", "doc_values_memory_in_bytes": 152440900, "index_writer_memory": "0b", "index_writer_memory_in_bytes": 0, "version_map_memory": "0b", "version_map_memory_in_bytes": 0, "fixed_bit_set": "0b", "fixed_bit_set_memory_in_bytes": 0, "max_unsafe_auto_id_timestamp": 1632340789677, "file_sizes": { } } }, "nodes": { "count": { "total": 3, "data": 3, "coordinating_only": 0, "master": 1, "ingest": 3 }, "versions": [ "5.6.13" ], "os": { "available_processors": 192, "allocated_processors": 96, "names": [ { "name": "Windows Server 2019", "count": 3 } ], "mem": { "total": "478.4gb", "total_in_bytes": 513717497856, "free": "119.7gb", "free_in_bytes": 128535437312, "used": "358.7gb", "used_in_bytes": 385182060544, "free_percent": 25, "used_percent": 75 } }, "process": { "cpu": { "percent": 5 }, "open_file_descriptors": { "min": -1, "max": -1, "avg": 0 } }, "jvm": { "max_uptime": "1.9d", "max_uptime_in_millis": 167165106, "versions": [ { "version": "1.8.0_261", "vm_name": "Java HotSpot(TM) 64-Bit Server VM", "vm_version": "25.261-b12", "vm_vendor": "Oracle Corporation", "count": 3 } ], "mem": { "heap_used": "5gb", "heap_used_in_bytes": 5460944144, "heap_max": "5.8gb", "heap_max_in_bytes": 6227755008 }, "threads": 835 }, "fs": { "total": "1.7tb", "total_in_bytes": 1920365228032, "free": "499.1gb", "free_in_bytes": 535939969024, "available": "499.1gb", "available_in_bytes": 535939969024 }, "plugins": [ ], "network_types": { "transport_types": { "netty4": 3 }, "http_types": { "netty4": 3 } } } }
The jvm.options file.
...ANSWER
Answered 2021-Oct-08 at 06:38 My issue is solved; it was a heap size issue. I am running ES as a service, and the service's default heap size of 2 GB was being used, so my setting was not taking effect. I reinstalled the service with the updated jvm.options file with a heap size of 10 GB and then ran my cluster; the heap size went from 2 GB to 10 GB and my problem was solved. Thanks for the suggestions.
To check your heap size, use this command.
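The exact command is not shown above; one common way to confirm the heap each node is actually running with (a sketch against the standard _cat/nodes API, assuming a default local install on port 9200):

# Sketch: print per-node heap usage from the _cat/nodes API.
# Host and port are assumptions (default local install).
import requests

resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,heap.current,heap.percent,heap.max"},
)
print(resp.text)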
QUESTION
Has anybody seen these messages before in a MongoDB 4.0.16 sharded cluster, in mongos, during balancing:
...ANSWER
Answered 2021-Dec-02 at 15:52
- This message is expected behaviour during balancing, when there is a read request for documents that have already been migrated to another shard.
- It means that mongos is not able to establish a remote cursor on the old shard, since the config is reported stale and the data has moved to the new shard.
- No fix is necessary; this is an informative message only.
QUESTION
Is it possible to crop/resize images per batch?
I'm using the TensorFlow Dataset API as below:
...ANSWER
Answered 2021-Dec-01 at 14:51 Generally, you can try something like this:
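The answer's snippet is not included above; a minimal sketch of the general idea (batch first, then map a function over the whole batch; the shapes and target size below are made-up assumptions):

# Sketch: resize images per batch with the tf.data API.
# Input shapes and the 64x64 target size are illustrative assumptions.
import tensorflow as tf

images = tf.random.uniform((32, 128, 128, 3))                  # fake images
labels = tf.random.uniform((32,), maxval=10, dtype=tf.int32)   # fake labels
ds = tf.data.Dataset.from_tensor_slices((images, labels))

def resize_batch(image_batch, label_batch):
    # tf.image.resize accepts a whole [batch, height, width, channels] tensor.
    return tf.image.resize(image_batch, [64, 64]), label_batch

ds = ds.batch(8).map(resize_batch)

for image_batch, label_batch in ds.take(1):
    print(image_batch.shape)   # (8, 64, 64, 3)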
QUESTION
My company is developing a SaaS service to store events and provide analytics through dashboards.
Since we won't have deletes or updates, the idea is to create a columnar-based OLAP architecture to benefit from the compression and latency advantages it provides, and PostgreSQL Citus is one platform we intend to evaluate.
The overall architecture is pretty standard: an API will receive the events and then store them on Kafka in JSON format. Then, those events will be sent to PostgreSQL. Some fields will be "jsonb" data type.
By reading the docs, the best practice is to distribute tables by tenant ID.
I just wanted to double-check a few things and would greatly appreciate someone's input:
- Does the architecture described above make sense? Is there anything we should change or pay attention to?
- Are there limitations in the number of nodes or shards that can be scaled out for this columnar approach?
- Are GIN indexes supported? (I believe they are, since they are not listed under 'Limitations'.)
Thanks!
...ANSWER
Answered 2021-Nov-22 at 07:46 I've used Citus for a multi-tenant service, and distributing tables by the tenant ID worked pretty well.
The overall architecture that you described makes sense but I'm not sure if someone from outside or at least without some details can give you much more to work with. Sending events through Kafka and processing them to be stored somewhere is a pretty standard way of working with events these days so nothing crazy is happening there.
There seem to be no limitations on scaling out in terms of the number of nodes, but what you should keep in mind is how you set your shard count from the start: rebalancing will lock your tables and can take a while to finish, so you want to keep shards as small and easy to process as you can. Take a look here for more details: https://docs.citusdata.com/en/v10.2/faq/faq.html#how-do-i-choose-the-shard-count-when-i-hash-partition-my-data
GIN indexes are supported; they are used in the Citus examples: https://docs.citusdata.com/en/v10.2/use_cases/multi_tenant.html?highlight=GIN#when-data-differs-across-tenants
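As a hedged sketch of those two points (all names and sizes are placeholders, not from the original thread), picking the shard count up front, distributing by tenant, and adding a GIN index on the jsonb column could look like:

# Sketch only: shard count, tenant distribution, and a GIN index on jsonb.
# Connection settings, table and column names are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("host=coordinator dbname=analytics user=app")
conn.autocommit = True
cur = conn.cursor()

# Decide the shard count before distributing; changing it later means rebalancing.
cur.execute("SET citus.shard_count = 64;")

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        tenant_id  bigint      NOT NULL,
        event_time timestamptz NOT NULL,
        payload    jsonb
    );
""")
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

# GIN index for querying inside the jsonb payload.
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING GIN (payload);")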
Also, note that you won't have failover support in the community version. You have to go with the enterprise version which supports failover and also allows you to rebalance the tables without locking the entire table.
I wrote a simple service to handle the failover which you can use: https://github.com/Navid2zp/citus-failover
QUESTION
I have a Kinesis cluster that's pushing data into Amazon Redshift via Lambda.
Currently my lambda code looks something like this:
...ANSWER
Answered 2021-Oct-26 at 16:15 The comment in your code gives me pause - "query = # prepare an INSERT query here". This seems to imply that you are reading the S3 data into Lambda and INSERTing this data into Redshift. If so, this is not a good pattern.
First off, Redshift expects data to be brought into the cluster through COPY (or Spectrum, or ...), not through INSERT. This will create issues in Redshift with managing the transactions and create a tremendous waste of disk space and a need for VACUUM. The INSERT approach for putting data into Redshift is an anti-pattern and shouldn't be used for even moderate sizes of data.
More generally, the concern is a data movement impedance mismatch. Kinesis is lots of independent streams of data and code generating small files. Redshift is a massive database that works on large data segments. Mismatching these tools in a way that misses their design targets will make either of them perform very poorly. You need to match the data requirements by batching up S3 files into Redshift. This means COPYing many S3 files into Redshift in a single COPY command, which can be done with manifests or by "directory" structure in S3 ("COPY everything from S3 path ..."). This process of COPYing data into Redshift can be run on a regular interval (every 2, 5, or 10 minutes). So you want your Kinesis Lambdas to organize the data in S3 (or add to a manifest) so that a "batch" of S3 files can be collected for a single COPY execution, as sketched below. This way a large number of S3 files can be brought into Redshift at once (its preferred data size), which will also greatly reduce your execute API calls.
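As a hedged sketch of that batching pattern (the bucket, table, IAM role, and connection settings are placeholders), a periodic job could write a manifest of the small S3 files and then issue a single COPY:

# Sketch: batch many small S3 files into one Redshift COPY via a manifest.
# All names, keys, and credentials below are illustrative assumptions.
import json
import boto3
import psycopg2

BUCKET = "my-kinesis-staging"                    # hypothetical bucket
KEYS = ["events/2021/10/26/part-0001.gz",
        "events/2021/10/26/part-0002.gz"]        # files collected since the last COPY

# 1) Write a manifest listing the files that belong to this batch.
manifest = {"entries": [{"url": f"s3://{BUCKET}/{key}", "mandatory": True} for key in KEYS]}
boto3.client("s3").put_object(
    Bucket=BUCKET,
    Key="manifests/batch-0001.manifest",
    Body=json.dumps(manifest).encode("utf-8"),
)

# 2) A single COPY pulls the whole batch into Redshift in parallel.
conn = psycopg2.connect("host=my-cluster.example.com port=5439 dbname=dw user=loader")
with conn, conn.cursor() as cur:
    cur.execute(f"""
        COPY analytics.events
        FROM 's3://{BUCKET}/manifests/batch-0001.manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        MANIFEST GZIP FORMAT AS JSON 'auto';
    """)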
Now, if you have a very large Kinesis pipe set up and the data is very large, there is another data movement "preference" to take into account. This only matters when you are moving a lot of data per minute. This extra preference concerns S3. S3 being an object store means that there is a significant amount of time taken up by "looking up" a requested object key: about .5 sec. So reading a thousand S3 objects will require (in total) about 500 seconds of key lookup time. Redshift makes requests to S3 in parallel, one per slice in the cluster, so some of this time happens in parallel. If the files being read are 1KB in size, the data transfer itself, after the S3 lookup is complete, will total only about 1.25 sec. Again this time is in parallel, but you can see how much time is spent on lookup vs. transfer. To get the maximum bandwidth out of S3 when reading many files, these files need to be around 1GB in size (100MB is OK in my experience). You can see that if you are to ingest millions of files per minute from Kinesis into Redshift, you will need a process to combine many small files into bigger files to avoid this hazard of S3. Since you are using Lambda as your Kinesis reader, I expect that you aren't at this data rate yet, but it is good to keep your eyes on this issue if you expect to expand to a very large scale.
Just because tools have high bandwidth doesn't mean that they can be piped together. Bandwidth comes in many styles.
QUESTION
I have a partitioned table with less than 600MB of data in the targeted partitions. When I run an EXPORT query with a subquery that returns ~1000 rows, EXPORT shards the data into 27(!) files. This is not a big query result. I assume it's happening because the optimizer sees the 600MB, but I'm not sure.
Has anyone else encountered this? I am GZIPing the results, so concatenating would involve unzipping, appending and zipping again...
...ANSWER
Answered 2021-Oct-20 at 12:50 I have run many different export scenarios in BigQuery. I think there is only one case in which a BigQuery export of a partitioned table smaller than 1 GB produces multiple files: using wildcards during the export.
BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern.
I have tested this using a public dataset, with a partitioned and a normal table:
- Partitioned table: bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_times
- Normal table: bigquery-public-data.covid19_google_mobility.mobility_report
Scenario 1:
Normal table: bigquery-public-data.covid19_google_mobility.mobility_report
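A minimal sketch of the two export styles with the google-cloud-bigquery client (the project and bucket names are placeholders): a wildcard in the destination URI lets BigQuery shard the export into multiple files, while a single URI produces one file as long as the result stays under the 1 GB limit.

# Sketch: exporting a table with and without a wildcard destination URI.
# Project and bucket names are placeholders; the source table is public.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table_ref = "bigquery-public-data.covid19_google_mobility.mobility_report"

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)

# Wildcard URI: BigQuery is free to split the export into many sharded files.
client.extract_table(
    table_ref, "gs://my-bucket/mobility-*.csv.gz", job_config=job_config
).result()

# Single URI: one output file, provided the export stays under 1 GB.
client.extract_table(
    table_ref, "gs://my-bucket/mobility.csv.gz", job_config=job_config
).result()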
QUESTION
I forked a Discord music bot from GitHub to Replit and then tried to follow the steps to run the bot.
I am using Node.js v14.
When I run the bot I receive the following error:
...ANSWER
Answered 2021-Sep-26 at 08:23 You would have to force-clean your npm cache and remove your node_modules, then reinstall them so they are compiled against the new Node version. You can use the following commands in your console/shell:
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install shards