autoscaler | Automatically scale the capacity of your Spanner instances | Monitoring library
kandi X-RAY | autoscaler Summary
The Autoscaler tool for Cloud Spanner is a companion tool to Cloud Spanner that allows you to automatically increase or reduce the number of nodes in one or more Spanner instances, based on their utilization. When you create a Cloud Spanner instance, you choose the number of nodes that provide compute resources for the instance. As the instance's workload changes, Cloud Spanner does not automatically adjust the number of nodes in the instance. The Autoscaler monitors your instances and automatically adds or removes nodes to ensure that they stay within the recommended maximums for CPU utilization and the recommended limit for storage per node, plus or minus an allowed margin. Note that the recommended thresholds differ depending on whether a Spanner instance is regional or multi-regional.
Top functions reviewed by kandi - BETA
- Parse the payload
- Check whether a scaling operation should be performed
- Build the metrics
- Get the maximum value for a particular metric request
- Get the metrics for a given Spanner instance
- Get metadata of a Spanner instance
- Loop through the dimensions of a Spanner instance and return the maximum metric value for the requested size
- Analyse the size of a new Spanner instance
- Write a log entry at the given level
- Validate custom metrics
autoscaler Key Features
autoscaler Examples and Code Snippets
Community Discussions
Trending Discussions on autoscaler
QUESTION
Concerning the Kubernetes Horizontal Autoscaler, are there any metrics related to the number of changes between certain time periods?
...ANSWER
Answered 2022-Feb-23 at 18:05
Kubernetes does not provide such metrics, but you can get events for a k8s resource.
An event in Kubernetes is an object in the framework that is automatically generated in response to changes in other resources, such as nodes, pods, or containers.
The simplest way to get events for HPA:
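The command snippet from the original answer was stripped; a minimal sketch, assuming an HPA named my-hpa in the default namespace (the name is a placeholder), would be:

```bash
# Show recent events recorded for the HorizontalPodAutoscaler object
kubectl describe hpa my-hpa

# Or list only the events that reference that HPA
kubectl get events \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler,involvedObject.name=my-hpa
```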
QUESTION
I have a web app (react + node.js) running on App Engine.
I would like to kick off (from this web app) a Machine Learning job that requires a GPU (running in a container on AI platform or running on GKE using a GPU node pool like in this tutorial, but we are open to other solutions).
I was thinking of trying what is described at the end of this answer, basically making an HTTP request to start the job using the projects.jobs.create API.
More details on the ML job in case this is useful: it generates an output every second that is stored on Cloud Storage and then read in the web app.
I am looking for examples of how to set this up. Where would the job configuration live, and how should I set up the API call to kick off that job? Are there other ways to achieve the same result?
Thank you in advance!
...ANSWER
Answered 2022-Feb-09 at 11:47
On Google Cloud, everything is an API, and you can interact with every product through HTTP requests, so you can definitely achieve what you want.
I personally don't have an example, but you have to build a JSON job description and POST it to the API.
Don't forget that when you interact with a Google Cloud API, you have to add an access token in the Authorization: Bearer header.
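As a rough sketch of such a request against the AI Platform Training jobs.create endpoint (the project ID, job ID, image URI, machine type, and region below are placeholders, not values from the question):

```bash
# Illustrative only: submit a custom-container training job to AI Platform Training.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://ml.googleapis.com/v1/projects/PROJECT_ID/jobs \
  -d '{
    "jobId": "my_training_job_001",
    "trainingInput": {
      "scaleTier": "CUSTOM",
      "masterType": "n1-standard-8",
      "masterConfig": {
        "acceleratorConfig": {"count": "1", "type": "NVIDIA_TESLA_T4"},
        "imageUri": "gcr.io/PROJECT_ID/my-ml-image:latest"
      },
      "region": "us-central1"
    }
  }'
```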
Where should your job config description live? It depends...
If it is strongly related to your App Engine app, you can add it to the App Engine code itself and have it "hard coded". The downside of that option is that any time you have to update the configuration, you have to redeploy a new App Engine version. But if your new version isn't correct, a rollback to a previous, stable version is easy and consistent.
If you prefer to update your config file and your App Engine code independently, you can store the config outside the App Engine code, on Cloud Storage for instance. That way the update is simple and easy: update the config on Cloud Storage to change the job configuration. However, there is no longer a relation between the App Engine version and the config version, and rolling back to a stable version can be more difficult.
You can also have a combination of both, where you have a default job configuration in your App Engine code and an environment variable that can be set to point to a Cloud Storage file containing a new version of the configuration.
I don't know if it answers all your questions. Don't hesitate to comment if you want more details on some parts.
QUESTION
We have set up a GKE cluster using Terraform with private and shared networking:
Network configuration:
...ANSWER
Answered 2022-Feb-10 at 15:52
I had been missing the peering configuration documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing
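The linked page covers exporting custom routes over the VPC peering that GKE creates for the private control plane. A minimal sketch, with placeholder network and peering names rather than values from the question:

```bash
# Find the peering GKE created for the control plane
gcloud compute networks peerings list --network=my-shared-vpc

# Export custom routes over that peering
gcloud compute networks peerings update gke-control-plane-peering \
    --network=my-shared-vpc \
    --export-custom-routes
```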
QUESTION
I'm running a Google Cloud Composer GKE cluster. I have a default node pool of 3 normal CPU nodes and one node pool with a GPU node. The GPU node pool has autoscaling activated.
I want to run a script inside a docker container on that GPU node.
For the GPU operating system I decided to go with cos_containerd instead of ubuntu.
I've followed https://cloud.google.com/kubernetes-engine/docs/how-to/gpus and ran this line:
...ANSWER
Answered 2022-Feb-02 at 18:17
Can I do that with kubectl apply? Ideally I would like to only run that YAML on the GPU node. How can I achieve that?
Yes, you can run a DaemonSet, which will run the command on each node.
Since you are on GKE, the DaemonSet will also run the command or script on new nodes as they are scaled up.
A DaemonSet is mainly for running an application or deployment on each available node in the cluster.
We can leverage this DaemonSet to run the command on every existing and upcoming node.
Example YAML:
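The YAML from the original answer isn't reproduced here; a minimal sketch of a DaemonSet that targets only GPU nodes via a nodeSelector (the accelerator label value, image, and command are illustrative assumptions) could look like this:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-node-script
spec:
  selector:
    matchLabels:
      app: gpu-node-script
  template:
    metadata:
      labels:
        app: gpu-node-script
    spec:
      # Only schedule on nodes in the GPU node pool
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      # Tolerate the GPU taint that GKE puts on GPU nodes
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: runner
          image: gcr.io/PROJECT_ID/my-gpu-image:latest
          command: ["/bin/sh", "-c", "./run-my-script.sh"]
```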
QUESTION
I'm trying to set up FluentBit for my EKS cluster in Terraform, via this module, and I have a couple of questions:
cluster_identity_oidc_issuer - what is this? Frankly, I was just told to set this up, so I have very little knowledge about FluentBit, but I assume this "issuer" provides an identity with needed permissions. For example, Okta? We use Okta, so what would I use as a value in here?
cluster_identity_oidc_issuer_arn - no idea what this value is supposed to be.
worker_iam_role_name - as in the role with autoscaling capabilities (oidc)?
This is what eks.tf looks like:
...ANSWER
Answered 2022-Feb-01 at 13:47
Since you are using a Terraform EKS module, you can access attributes of the created resources by looking at the Outputs tab [1]. There you can find the following outputs:
cluster_id
cluster_oidc_issuer_url
oidc_provider_arn
They are accessible by using the following syntax:
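The syntax example from the original answer isn't shown; assuming the EKS module is declared as module "eks", wiring those outputs into the FluentBit module might look roughly like this (the module labels and input names are assumptions, not taken from the question):

```hcl
module "fluent_bit" {
  source = "..."  # the FluentBit module referenced in the question

  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
}
```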
QUESTION
I have created a k8s cluster with kops (1.21.4) on AWS, and I have made the required changes to my cluster as per the docs on the autoscaler. But when the cluster starts, the cluster-autoscaler pod is unable to schedule on any node. When I describe the pod, I see the following:
...ANSWER
Answered 2022-Jan-07 at 04:12
You need to check the pod/deployment for the nodeSelector property. Make sure that your desired nodes have this label.
Also, if you want to schedule pods on the master node, you must remove the taint first.
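As a sketch, assuming the autoscaler runs as a Deployment named cluster-autoscaler in kube-system (the node name is a placeholder, and the taint key may be node-role.kubernetes.io/control-plane on newer versions):

```bash
# Inspect the nodeSelector on the cluster-autoscaler deployment
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.nodeSelector}'

# Remove the master taint from a node so pods can schedule there
kubectl taint nodes my-master-node node-role.kubernetes.io/master-
```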
QUESTION
According to the K8s documentation, to avoid flapping of replicas, the property stabilizationWindowSeconds can be used:
The stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling.
When the metrics indicate that the target should be scaled down the algorithm looks into previously computed desired states and uses the highest value from the specified interval.
From what I understand from the documentation, with the following HPA configuration:
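The configuration itself was not captured here; a representative sketch of an HPA spec with a scale-down stabilization window (all names and values are illustrative) would be:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      # Look back 300s and use the highest computed desired replica count
      stabilizationWindowSeconds: 300
```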
...ANSWER
Answered 2022-Jan-04 at 07:34
There is a bug in the k8s HPA in v1.20; check the issue. Upgrading to v1.21 fixed the problem: the deployment is scaling without flapping after the upgrade.
(Image: scaling of the deployment over two days.)
QUESTION
I can use GKE Autopilot to run arbitrary workloads on a sandbox project (with default networks, default service account, default firewall rules) just fine.
But I need to create a GKE Autopilot cluster in an existing project which isn't using the default settings for a few things like networking, and when I try, the pods never run. My problem lies in identifying the underlying reason for the failure and which part of the project setup is preventing GKE Autopilot from working.
The error messages and logs are very sparse. The only things that I see are:
- in the workloads UI, for my pod, it says "Pod unschedulable"
- in the pod UI, under events, it says "no nodes available to schedule pods" and "pod triggered scale-up: [{...url-of-an-instance-group...}]"
- under the cluster autoscaler logs, there is a "scale.up.error.waiting.for.instances.timeout" buried in a resultInfo log (with a reference to an instance group URL)
I can't find anything online about why scaling up would fail in Autopilot mode, which is supposed to be such a hands-off experience. I understand I'm not giving many details about the pod specification (any would fail!) or my project settings, but simply knowing where to look next would be helpful in my current situation.
...ANSWER
Answered 2021-Dec-20 at 21:51
Ensure that the default Compute Engine service account (-compute@developer.gserviceaccount.com) is not disabled.
Run the following command to check that the disabled field is not set to true:
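The exact command was stripped from the answer; checking the account with gcloud might look like this (PROJECT_NUMBER is a placeholder for your project number):

```bash
# Describe the default Compute Engine service account and print its "disabled" field
gcloud iam service-accounts describe \
    PROJECT_NUMBER-compute@developer.gserviceaccount.com \
    --format="value(disabled)"
```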
QUESTION
I am running a GPU-intensive workload on demand on GKE Standard, where I have created the appropriate node pool with a minimum of 0 and a maximum of 5 nodes. However, when a Job is scheduled on the node pool, GKE presents the following error:
...ANSWER
Answered 2021-Dec-15 at 14:54
1 node(s) had taint {nvidia.com/gpu: present}, that the pod didn't tolerate...
Try adding tolerations to your job's pod spec:
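The snippet from the original answer isn't shown; a minimal sketch of a matching toleration in the Job's pod template would be:

```yaml
spec:
  template:
    spec:
      tolerations:
        # Tolerate the GPU taint reported in the error message
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```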
QUESTION
I have a GKE cluster which doesn't scale up when a particular deployment needs more resources.
I've checked the cluster autoscaler logs and it has entries with this error: no.scale.up.nap.pod.zonal.resources.exceeded. The documentation for this error says:
Node auto-provisioning did not provision any node group for the Pod in this zone because doing so would violate resource limits.
I don't quite understand which resource limits are mentioned in the documentation and why they prevent the node pool from scaling up.
If I scale the cluster up manually, the deployment pods are scheduled and everything works as expected, so it doesn't seem to be a problem with project quotas.
...ANSWER
Answered 2021-Dec-11 at 00:10
Limits for clusters that you define are enforced based on the total CPU and memory resources used across your cluster, not just auto-provisioned pools.
If you are not using node auto-provisioning (NAP), disable the node auto-provisioning feature for the cluster.
If you are using NAP, update the cluster-wide resource limits defined in NAP for the cluster.
As a workaround, try specifying the machine type explicitly in the workload spec. Make sure to use a machine family that is supported by GKE node auto-provisioning.
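A sketch of that workaround, using the cloud.google.com/machine-family node selector that GKE node auto-provisioning recognizes (the family value, image, and resource requests are illustrative assumptions):

```yaml
spec:
  template:
    spec:
      nodeSelector:
        # Ask node auto-provisioning to use a specific machine family
        cloud.google.com/machine-family: n2
      containers:
        - name: worker
          image: gcr.io/PROJECT_ID/my-image:latest
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
```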
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install autoscaler
Support