probes | Scripts and configuration for loading information
kandi X-RAY | probes Summary
Collect HTCondor statistics and report them into a time-series database. All modules support Graphite, and there is some support for InfluxDB. Additionally, report select job and slot ClassAds into Elasticsearch via Logstash. Note: this is a fork of the scripts used for monitoring the HTCondor pools at Fermilab; while generally intended to be "generic" for any pool, it may still require some tweaking to work well for your pool. Copyright Fermi National Accelerator Laboratory (FNAL/Fermilab). See LICENSE.txt.
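For orientation, Graphite's plaintext protocol is simply lines of "metric.path value timestamp" sent over TCP (port 2003 by default). The following is a minimal sketch of how a probe might push a flattened dict of pool statistics; the host, metric prefix, and metric names are illustrative assumptions, not this repository's actual module API.

# Sketch: push a dict of metrics to Graphite's plaintext listener (TCP 2003).
# Host, prefix, and metric names are illustrative assumptions, not the repo's API.
import socket
import time

def send_to_graphite(metrics, host="graphite.example.com", port=2003, prefix="htcondor"):
    """Send {metric_name: value} pairs as 'path value timestamp' lines."""
    now = int(time.time())
    lines = [f"{prefix}.{name} {value} {now}" for name, value in metrics.items()]
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)

send_to_graphite({"pool.jobs.running": 120, "pool.slots.idle": 14})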
Top functions reviewed by kandi - BETA
- Post data to the pool
- Calculate the wall time in seconds
- Get the number of jobs in the pool
- Extract metrics from a job class
- Determine the bin name of the job class
- Find the best bin based on the given value
- Return the remote execution time of the job class
- Send EC2 instances
- Calculate CPU utilization
- Return the number of EC2 instances
- Get startd pools
- Get resource utilization for pooling
- Sanitize key
- Send a dict to Graphite
- Get options
- The main loop
- Sends a metric
probes Key Features
probes Examples and Code Snippets
def check_collective_ops_peer_health(self, task, timeout_in_ms):
"""Check collective peer health.
This probes each task to see if they're still alive. Note that restarted
tasks are considered a different one, and they're considered not healthy.
Community Discussions
Trending Discussions on probes
QUESTION
Here's the code:
...ANSWER
Answered 2022-Feb-13 at 08:24: Solved.
Client actually receives both messages, but first one never shows up in Firefox network debugger for some reason.
It appears to be a Firefox issue.
QUESTION
I'm running a private 2-cluster GKE setup in europe-west2. I have a dedicated config cluster for MCI and a worker cluster for workloads. Both clusters are registered to the Anthos hub, and the ingress feature is enabled on the config cluster. In addition, the worker cluster runs the latest ASM 1.12.2.
As far as MCI is concerned, my deployment is 'standard', as in based on the available docs (i.e. https://cloud.google.com/architecture/distributed-services-on-gke-private-using-anthos-service-mesh#configure-multi-cluster-ingress, the terraform-example-foundation repo, etc).
Everything works, but I'm hitting an intermittent connectivity issue no matter how many times I redeploy the entire stack. My eyes are bleeding from staring at the logging dashboard. I ran out of dots to connect.
I'm probing some endpoints presented from my cluster, which most of the time return 200 with the following logged under resource.type="http_load_balancer":
ANSWER
Answered 2022-Feb-05 at 13:42: I had the same/similar issue when using HTTPS with MultiClusterIngress.
Google support suggested using a literal static IP for the annotation:
QUESTION
I am running MongoDB as a StatefulSet in Kubernetes.
I am trying to use startup/liveness probes, and I noticed some Helm charts use the MongoDB "ping" command.
As the documentation says,
The ping command is a no-op used to test whether a server is responding to commands. This command will return immediately even if the server is write-locked:
What does it mean? When a server is starting or in the midst of initial sync, what will the command return? Many thanks!
...ANSWER
Answered 2021-Dec-10 at 17:35: Not sure if the ping is a good idea; you don't care about the general state of the server, you care that it can receive connections.
Liveness probes have a timeout, so it's possible that in the future, when you're starting a new replica, the new pod in the stateful set will fail while waiting for the replication to end.
You should use rs.status() and get the "myState" field.
myState is an integer flag between 0 and 10. See this for all the possible statuses.
And if, for whatever reason, the rs.status() command fails, that means that the ping would also fail.
However, a successful ping doesn't mean that the server is ready to receive connections and serve data, which is what you really care about.
- startup probe: myState equals 1 or 2
  This means that the startup probe will wait patiently until the server is ready, regardless of whether it's a primary or a replica.
- readiness probe: myState equals 1 or 2
  This means that whenever a replica needs to roll back, or is recovering, or mongod decides for whatever reason that it's not ready to accept connections or serve data, this will let kubernetes know that this pod is not ready and route requests to the other pods in the sts.
- liveness probe: myState is NOT equal to 6, 8 or 10
  This means that, unless the server status is UNKNOWN, DOWN or REMOVED, kubernetes will assume that this server is alive.
So, let's test a scenario!
- sts started, first pod is on STARTUP, myState = 0
- startup probe waits
- first MongoDB node is ready, myState = 1
- startup probe finally passed, now readiness and liveness probes start acting
- new replica triggered, second pod is on STARTUP, myState = 0
- new replica successfully joins the set, myState = 5
- new replica is ready, myState = 2
- startup probe finally passed, now readiness and liveness probes start acting
- time for some action
- a massive operation that altered hundreds of documents was rolled back on the primary
- second pod is now on ROLLBACK, myState = 9, readiness probe failed, second pod is now NOT READY
- all connections are now sent to the PRIMARY
- second pod has finished the rollback
- second pod is now back as a SECONDARY, myState = 2, liveness probe succeeds and pod is back at the READY state
- the MongoDB dba messed up and issued a command that removed the secondary from the replica set, myState = 10
- liveness probe fails, kubernetes kills the pod
- sts wants 2 replicas and starts a second pod again ...
all good :)
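To make the probe conditions above concrete, here is a hedged sketch of an exec-style health check built on pymongo (an assumption: the answer does not prescribe a language, and pymongo would need to be available in the container). A Kubernetes exec probe could run it with "startup", "readiness" or "liveness" as the argument and act on the exit code.

# Sketch of an exec-probe health check using pymongo (assumed available
# in the container); exit code 0 means the probe passes, 1 means it fails.
import sys
from pymongo import MongoClient

def my_state():
    client = MongoClient("localhost", 27017, directConnection=True,
                         serverSelectionTimeoutMS=2000)
    return client.admin.command("replSetGetStatus")["myState"]

if __name__ == "__main__":
    probe = sys.argv[1] if len(sys.argv) > 1 else "liveness"
    try:
        state = my_state()
    except Exception:
        sys.exit(1)  # mongod unreachable or replSetGetStatus failed
    if probe in ("startup", "readiness"):
        sys.exit(0 if state in (1, 2) else 1)          # PRIMARY or SECONDARY
    else:
        sys.exit(0 if state not in (6, 8, 10) else 1)  # not UNKNOWN/DOWN/REMOVED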
QUESTION
Please tell me what is the difference between the parameters: CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL ?
CURLOPT_TCP_KEEPIDLE: Sets the delay, in seconds, that the operating system will wait while the connection is idle before sending keepalive probes. Not all operating systems support this option.
CURLOPT_TCP_KEEPINTVL: Sets the interval, in seconds, that the operating system will wait between sending keepalive probes. Not all operating systems support this option.
- I understand it like this: CURLOPT_TCP_KEEPIDLE means how long the OS will wait for some "keepalive probes" from the server side before the OS thinks that the connection has dropped?
- But I can't understand this: CURLOPT_TCP_KEEPINTVL sets the interval... in which the OS will wait between... Between what? An interval between what and what?
...ANSWER
Answered 2021-Dec-03 at 15:32: TCP keepalive sends "keep alive" probes (small IP packets) between both endpoints.
If no data has been transferred over the TCP connection for a certain period, the TCP endpoint will send a keep alive probe. This period is CURLOPT_TCP_KEEPIDLE.
If the other endpoint is still connected, the other endpoint will reply to the keep alive probe.
If the other endpoint does not reply to the keep alive probe, the TCP endpoint will send another keep alive probe after a certain period. This period is CURLOPT_TCP_KEEPINTVL.
The TCP endpoint will keep sending keep alive probes until the other endpoint sends a reply OR a maximum number of keep alive probes has been sent. If the maximum number of keep alive probes has been sent without a reply from the other endpoint, the TCP connection is no longer connected.
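As a hedged illustration, the two options might be set like this with pycurl, which exposes the same libcurl options (assuming a reasonably recent pycurl/libcurl); the URL and timing values here are arbitrary examples, not recommendations.

# Sketch: enabling TCP keepalive probes via pycurl (libcurl >= 7.25.0).
# The URL and timing values are arbitrary examples.
import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com/")
c.setopt(pycurl.WRITEDATA, buf)
c.setopt(pycurl.TCP_KEEPALIVE, 1)   # turn TCP keepalive probing on
c.setopt(pycurl.TCP_KEEPIDLE, 120)  # idle seconds before the first probe
c.setopt(pycurl.TCP_KEEPINTVL, 60)  # seconds between unanswered probes
c.perform()
c.close()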
QUESTION
I have setup EFK stack in K8s cluster. Currently fluentd is scraping logs from all the containers.
I want it to only scrape logs from containers A, B, C and D.
If I had some common prefix such as A-app, I could do something like below.
ANSWER
Answered 2021-Aug-20 at 13:53: To scrape logs only from specific Pods, you can use:
QUESTION
I'm struggling to expose a service in an AWS cluster to the outside and access it via a browser. Since my previous question hasn't drawn any answers, I decided to simplify the issue in several aspects.
First, I've created a deployment which should work without any configuration. Based on this article, I did the following:
- kubectl create namespace tests
- created file probe-service.yaml based on paulbouwer/hello-kubernetes:1.8
- deployed it: kubectl create -f probe-service.yaml -n tests
ANSWER
Answered 2021-Nov-16 at 13:46: Well, I haven't figured this out for ArgoCD yet (edit: figured, but the solution is ArgoCD-specific), but for this test service it seems that path resolving is the source of the issue. It may not be the only source (to be retested on the test2 subdomain), but when I created a new subdomain in the hosted zone (test3, not used anywhere before), pointed it via an A entry to the load balancer (as "alias" in the AWS console), and then added to the ingress a new rule with a / path, like this:
QUESTION
The ingress-nginx pod I have helm-installed into my EKS cluster is perpetually failing, its logs indicating the application cannot bind to 0.0.0.0:8443 (INADDR_ANY:8443). I have confirmed that 0.0.0.0:8443 is indeed already bound in the container, but because I don't yet have root access to the container I've been unable to glean the culprit process/user.
I have created this issue on the kubernetes ingress-nginx project that I'm using, but also wanted to reach out to a wider SO community that might lend insights, solutions and troubleshooting suggestions for how to get past this hurdle.
Being a newcomer to both AWS/EKS and Kubernetes, it is likely that there is some environment configuration error causing this issue. For example, is it possible that this could be caused by a misconfigured AWS-ism such as the VPC (its Subnets or Security Groups)? Thank you in advance for your help!
The linked GitHub issue provides copious details about the Terraform-provisioned EKS environment as well as the Helm-installed deployment of ingress-nginx. Here are some key details:
- The EKS cluster is configured to only use Fargate workers, and has 3 public and 3 private subnets, all 6 of which are made available to the cluster and each of its Fargate profiles.
- It should also be noted that the cluster is new, and the ingress-nginx pod is the first attempt to deploy anything to the cluster, aside from kube-system items like coredns, which has been configured to run in Fargate. (which required manually removing the default ec2 annotation as described here)
- There are 6 Fargate profiles, but only 2 that are currently in use: coredns and ingress. These are dedicated to kube-system/kube-dns and ingress-nginx, respectively. Other than the selectors' namespaces and labels, there is nothing "custom" about the profile specification. It has been confirmed that the selectors are working, both for coredns and ingress, i.e. the ingress pods are scheduled to run, but failing.
- The reason why ingress-nginx is using port 8443 is that I first ran into this Privilege Escalation issue, whose workaround requires one to disable allowPrivilegeEscalation and change ports from privileged to unprivileged ones. I'm invoking helm install with the following values:
ANSWER
Answered 2021-Nov-16 at 14:26: Posted a community wiki answer based on the same topic and this similar issue (both on the GitHub page). Feel free to expand it.
The problem is that 8443 is already bound for the webhook. That's why I used 8081 in my suggestion, not 8443. The examples using 8443 here had to also move the webhook, which introduces more complexity to the changes, and can lead to weird issues if you get it wrong.
An example using port 8081:
As well as those settings, you'll also need to use the appropriate annotations to run using NLB rather than ELB, so all-up it ends up looking something like
QUESTION
I have implemented a gRPC service, built it into a container, and deployed it using k8s, in particular AWS EKS, as a DaemonSet.
The Pod starts and turns to be in Running status very soon, but it takes very long, typically 300s, for the actual service to be accessible.
In fact, when I run kubectl logs to print the log of the Pod, it is empty for a long time.
I have logged something at the very start of the service. In fact, my code looks like
...ANSWER
Answered 2021-Oct-15 at 03:20: When you use HTTP_PROXY for your solution, watch out for how it may route differently from your underlying cluster network, which often results in unexpected timeouts.
QUESTION
I'm trying to follow the Azure tutorial on how to get API Management under a VNet and accessible through an application gateway (WAF). I'm stuck trying to upload the root cert into the application gateway. It says "Data for certificate is invalid"; apparently Azure Application Gateway doesn't like Let's Encrypt certs.
My certs are:
- mydomain.com.br
- api.mydomain.com.br
- developer.mydomain.com.br
- managemnet.mydomain.com.br
I have used acmesh to generate all certs:
...ANSWER
Answered 2021-Aug-30 at 21:17: Why do you want to add the Let's Encrypt root CA cert on your application gateway?
From my understanding, the root CA from Let's Encrypt is ISRG Root X1, and this one should already be trusted by clients (browsers). You only want to add the root CA if you have self-signed certificates.
Here is a workflow with storing the certs in Azure Key Vault: https://techblog.buzyka.de/2021/02/make-lets-encrypt-certificates-love.html
Another Workflow here describes adding certs with ACME challenges: https://intelequia.com/blog/post/1012/automating-azure-application-gateway-ssl-certificate-renewals-with-let-s-encrypt-and-azure-automation
QUESTION
I have an (internal) K8s deployment (Python, TensorFlow, Gunicorn) with around 60 replicas and an attached K8s service to distribute incoming HTTP requests. Each of these 60 pods can only actually process one HTTP request at a time (because of TensorFlow reasons). Processing a request takes between 1 and 4 seconds. If a second request is sent to a pod while it's still processing one, the second request is just queued in the Gunicorn backlog.
Now I'd like to reduce the probability of that queuing happening as much as possible, i.e., have new requests routed to one of the non-occupied pods as long as such a non-occupied one exists.
Round-robin would not do the trick, because not every request takes the same amount of time to answer (see above).
The Python application itself could make the endpoint used for the ReadinessProbe fail while it's processing a normal request, but as far as I understand, readiness probes are not meant for something that dynamic (K8s would need to poll them multiple times per second).
So how could I achieve the goal?
...ANSWER
Answered 2021-Sep-24 at 04:55: Can't you implement pub/sub or a message broker in between?
Save the data into a queue; based on their availability, your workers will fetch the message or data from the queue and the request will get processed.
You can use Redis for creating queues, and pub/sub on the queue is also possible using a library. I used one before in Node JS; however, it should be possible to implement the same using Python also.
Ideally, a worker (or, we can say, a subscriber) will be running in each of the 60 replicas.
As soon as you get a request, one application will publish it, and the subscribers will continuously work to process those messages.
We also went one step further, scaling the worker count automatically depending on the message count in the queue.
This is the library I am using with Node JS: https://github.com/OptimalBits/bull
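For completeness, here is a hedged Python sketch of the same idea using redis-py and a plain Redis list as the queue; the queue name, host, and handler are illustrative, and the answer itself relied on the Node JS bull library rather than this code.

# Sketch: a minimal Redis-backed work queue with redis-py.
# Queue name, host, and the handler passed to worker_loop are illustrative.
import json
import redis

r = redis.Redis(host="redis.example.com", port=6379)
QUEUE = "inference-requests"

def publish(request_payload):
    """Called by the front end: push one request onto the queue."""
    r.lpush(QUEUE, json.dumps(request_payload))

def worker_loop(handle_request):
    """Run inside each replica: pull the next request only when this worker is free."""
    while True:
        _, raw = r.brpop(QUEUE)          # blocks until a message is available
        handle_request(json.loads(raw))  # e.g. run the TensorFlow model here

if __name__ == "__main__":
    worker_loop(lambda req: print("processed", req))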
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported