health | extensible health check library for Go applications | Monitoring library
kandi X-RAY | health Summary
kandi X-RAY | health Summary
An easy to use, extensible health check library for Go applications.
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of health
health Key Features
health Examples and Code Snippets
def _check_health(self):
while True:
if self._check_health_thread_should_stop.is_set():
return
for job in self._cluster_spec.jobs:
for task_id in range(self._cluster_spec.num_tasks(job)):
peer = "/job:{}/repl
def _start_check_health_thread(self):
# Use a dummy all-reduce as a barrier to wait for all workers to be up,
# otherwise the check health may fail immediately.
# Use array_ops.identity to create the dummy tensor so that we have a new
def check_collective_ops_peer_health(self, task, timeout_in_ms):
"""Check collective peer health.
This probes each task to see if they're still alive. Note that restarted
tasks are considered a different one, and they're considered not h
Community Discussions
Trending Discussions on health
QUESTION
I'm following a tutorial https://docs.openfaas.com/tutorials/first-python-function/,
currently, I have the right image
...ANSWER
Answered 2022-Mar-16 at 08:10If your image has a latest
tag, the Pod's ImagePullPolicy
will be automatically set to Always
. Each time the pod is created, Kubernetes tries to pull the newest image.
Try not tagging the image as latest
or manually setting the Pod's ImagePullPolicy
to Never
.
If you're using static manifest to create a Pod, the setting will be like the following:
QUESTION
Is there a way of joining two dataframes via where a row in the first dataframe is joined with every row in the second dataframe if they share a word in common?
For example:
...ANSWER
Answered 2022-Mar-15 at 18:03With fuzzy_join
:
QUESTION
Since @nestjs/terminus doesn't provide a health check for Prisma, I'm trying to create it based on their Mongoose health check.
When I try:
...ANSWER
Answered 2021-Oct-14 at 14:41A naive copy of the mongoose implementation isn't going to work because there are differences between the NestJSMongoose
type/module and Prisma
. In particular, getConnectionToken
does not exist inside the Prisma
package.
I can't comment on what the best way would be to extend terminus to support prisma. You might have to dig a bit into the terminus
interface for that. However, a simple way to get a health check/ping in Prisma is to use the following query:
QUESTION
I've been trying to get over this but I'm out of ideas for now hence I'm posting the question here.
I'm experimenting with the Oracle Cloud Infrastructure (OCI) and I wanted to create a Kubernetes cluster which exposes some service.
The goal is:
- A running managed Kubernetes cluster (OKE)
- 2 nodes at least
- 1 service that's accessible for external parties
The infra looks the following:
- A VCN for the whole thing
- A private subnet on 10.0.1.0/24
- A public subnet on 10.0.0.0/24
- NAT gateway for the private subnet
- Internet gateway for the public subnet
- Service gateway
- The corresponding security lists for both subnets which I won't share right now unless somebody asks for it
- A containerengine K8S (OKE) cluster in the VCN with public Kubernetes API enabled
- A node pool for the K8S cluster with 2 availability domains and with 2 instances right now. The instances are ARM machines with 1 OCPU and 6GB RAM running Oracle-Linux-7.9-aarch64-2021.12.08-0 images.
- A namespace in the K8S cluster (call it staging for now)
- A deployment which refers to a custom NextJS application serving traffic on port 3000
And now it's the point where I want to expose the service running on port 3000.
I have 2 obvious choices:
- Create a LoadBalancer service in K8S which will spawn a classic Load Balancer in OCI, set up it's listener and set up the backendset referring to the 2 nodes in the cluster, plus it adjusts the subnet security lists to make sure traffic can flow
- Create a Network Load Balancer in OCI and create a NodePort on K8S and manually configure the NLB to the ~same settings as the classic Load Balancer
The first one works perfectly fine but I want to use this cluster with minimal costs so I decided to experiment with option 2, the NLB since it's way cheaper (zero cost).
Long story short, everything works and I can access the NextJS app on the IP of the NLB most of the time but sometimes I couldn't. I decided to look it up what's going on and turned out the NodePort that I exposed in the cluster isn't working how I'd imagine.
The service behind the NodePort is only accessible on the Node that's running the pod in K8S. Assume NodeA is running the service and NodeB is just there chilling. If I try to hit the service on NodeA, everything is fine. But when I try to do the same on NodeB, I don't get a response at all.
That's my problem and I couldn't figure out what could be the issue.
What I've tried so far:
- Switching from ARM machines to AMD ones - no change
- Created a bastion host in the public subnet to test which nodes are responding to requests. Turned out only the node responds that's running the pod.
- Created a regular LoadBalancer in K8S with the same config as the NodePort (in this case OCI will create a classic Load Balancer), that works perfectly
- Tried upgrading to Oracle 8.4 images for the K8S nodes, didn't fix it
- Ran the Node Doctor on the nodes, everything is fine
- Checked the logs of kube-proxy, kube-flannel, core-dns, no error
- Since the cluster consists of 2 nodes, I gave it a try and added one more node and the service was not accessible on the new node either
- Recreated the cluster from scratch
Edit: Some update. I've tried to use a DaemonSet instead of a regular Deployment for the pod to ensure that as a temporary solution, all nodes are running at least one instance of the pod and surprise. The node that was previously not responding to requests on that specific port, it still does not, even though a pod is running on it.
Edit2: Originally I was running the latest K8S version for the cluster (v1.21.5) and I tried downgrading to v1.20.11 and unfortunately the issue is still present.
Edit3: Checked if the NodePort is open on the node that's not responding and it is, at least kube-proxy is listening on it.
...ANSWER
Answered 2022-Jan-31 at 12:06Might not be the ideal fix, but can you try changing the externalTrafficPolicy to Local. This would prevent the health check on the nodes which don't run the application to fail. This way the traffic will only be forwarded to the node where the application is . Setting externalTrafficPolicy to local is also a requirement to preserve source IP of the connection. Also, can you share the health check config for both NLB and LB that you are using. When you change the externalTrafficPolicy, note that the health check for LB would change and the same needs to be applied to NLB.
Edit: Also note that you need a security list/ network security group added to your node subnet/nodepool, which allows traffic on all protocols from the worker node subnet.
QUESTION
So this seems to be an issue talked about here and there on StackOverflow with no real solution. So I have a bunch of tests that all pass when run individual. They even pass when run as a full test suite, EXCEPT when I add in my TestCase ExploreFeedTest
. Now ExploreFeedTest
passes when run by itself and it actually doesn't fail when run in the full test suite as in running python manage.py test
, it causes another test HomeTest
to fail, which passes on it's own and passes when ExploreFeedTest
is commented out from the init.py
under the test
folder. I hear this is an issue with Django not cleaning up data properly? All my TestCase
classes are from django.test.TestCase
, because apparently if you don't use that class Django doesn't teardown the data properly, so I don't really know how to solve this. I'm also running Django 3.2.9, which is supposedly the latest. Anyone have a solution for this?
ExploreFeedTest.py
...ANSWER
Answered 2021-Dec-23 at 10:31I posted the answer on the stack overflow question
I was also using factory boy, which doesn't seem to play nice with test suite. Test suite doesn't seem to know how to rollback the DB without getting rid of factory boy generated data.
QUESTION
We've had a working Ansible AWX instance running on v5.0.0 for over a year, and suddenly all jobs stop working -- no output is rendered. They will start "running" but hang indefinitely without printing out any logging.
The AWX instance is running in a docker compose container setup as defined here: https://github.com/ansible/awx/blob/5.0.0/INSTALL.md#docker-compose
ObservationsStandard troubleshooting such as restarting of containers, host OS, etc. hasn't helped. No configuration changes in either environment.
Upon debugging an actual playbook command, we observe that the command to run a playbook from the UI is like the below:
ssh-agent sh -c ssh-add /tmp/awx_11021_0fmwm5uz/artifacts/11021/ssh_key_data && rm -f /tmp/awx_11021_0fmwm5uz/artifacts/11021/ssh_key_data && ansible-playbook -vvvvv -u ubuntu --become --ask-vault-pass -i /tmp/awx_11021_0fmwm5uz/tmppo7rcdqn -e @/tmp/awx_11021_0fmwm5uz/env/extravars playbook.yml
That's broken down into three commands in sequence:
ssh-agent sh -c ssh-add /tmp/awx_11021_0fmwm5uz/artifacts/11021/ssh_key_data
rm -f /tmp/awx_11021_0fmwm5uz/artifacts/11021/ssh_key_data
ansible-playbook -vvvvv -u ubuntu --become --ask-vault-pass -i /tmp/awx_11021_0fmwm5uz/tmppo7rcdqn -e @/tmp/awx_11021_0fmwm5uz/env/extravars playbook.yml
You can see in part 3, the -vvvvv
is the debugging argument -- however, the hang is happening on command #1. Which has nothing to do with ansible or AWX specifically, but it's not going to get us much debugging info.
I tried doing an strace
to see what is going on, but for reasons given below, it is pretty difficult to follow what it is actually hanging on. I can provide this output if it might help.
So one natural question with command #1 -- what is 'ssh_key_data'?
Well it's what we set up to be the Machine credential in AWX (an SSH key) -- it hasn't changed in a while and it works just fine when used in a direct SSH command. It's also apparently being set up by AWX as a file pipe:
prw------- 1 root root 0 Dec 10 08:29 ssh_key_data
Which starts to explain why it could be potentially hanging (if nothing is being read in from the other side of the pipe).
Running a normal ansible-playbook from command line (and supplying the SSH key in a more normal way) works just fine, so we can still deploy, but only via CLI right now -- it's just AWX that is broken.
ConclusionsSo the question then becomes "why now"? And "how to debug"? I have checked the health of awx_postgres, and verified that indeed the Machine credential is present in an expected format (in the main_credential
table). I have also verified that can use ssh-agent on the awx_task container without the use of that pipe keyfile. So it really seems to be this piped file that is the problem -- but I haven't been able to glean from any logs where the other side of the pipe (sender) is supposed to be or why they aren't sending the data.
ANSWER
Answered 2021-Dec-13 at 04:21Had the same issue starting this Friday in the same timeframe as you. Turned out that Crowdstrike (falcon sensor) Agent was the culprit. I'm guessing they pushed a definition update that is breaking or blocking fifo pipes. When we stopped the CS agent, AWX started working correctly again, with no issues. See if you are running a similar security product.
QUESTION
I have an application running on my local machine that uses React -> gRPC-Web -> Envoy -> Go app and everything runs with no problems. I'm trying to deploy this using GKE Autopilot and I just haven't been able to get the configuration right. I'm new to all of GCP/GKE, so I'm looking for help to figure out where I'm going wrong.
I was following this doc initially, even though I only have one gRPC service: https://cloud.google.com/architecture/exposing-grpc-services-on-gke-using-envoy-proxy
From what I've read, GKE Autopilot mode requires using External HTTP(s) load balancing instead of Network Load Balancing as described in the above solution, so I've been trying to get that to work. After a variety of attempts, my current strategy has an Ingress, BackendConfig, Service, and Deployment. The deployment has three containers: my app, an Envoy sidecar to transform the gRPC-Web requests and responses, and a cloud SQL proxy sidecar. I eventually want to be using TLS, but for now, I left that out so it wouldn't complicate things even more.
When I apply all of the configs, the backend service shows one backend in one zone and the health check fails. The health check is set for port 8080 and path /healthz which is what I think I've specified in the deployment config, but I'm suspicious because when I look at the details for the envoy-sidecar container, it shows the Readiness probe as: http-get HTTP://:0/healthz headers=x-envoy-livenessprobe:healthz. Does ":0" just mean it's using the default address and port for the container, or does indicate a config problem?
I've been reading various docs and just haven't been able to piece it all together. Is there an example somewhere that shows how this can be done? I've been searching and haven't found one.
My current configs are:
...ANSWER
Answered 2021-Oct-14 at 22:35Here is some documentation about Setting up HTTP(S) Load Balancing with Ingress. This tutorial shows how to run a web application behind an external HTTP(S) load balancer by configuring the Ingress resource.
Related to Creating a HTTP Load Balancer on GKE using Ingress, I found two threads where instances created are marked as unhealthy.
In the first one, they mention the necessity to manually enable a firewall rule to allow http load balancer ip range to pass health check.
In the second one, they mention that the Pod’s spec must also include containerPort. Example:
QUESTION
We have a bunch of health checks against third-party services. We want them to run periodically because when they go down it affects our app just like a bug in our code. Knowing that "it's them not us" reduces significant troubleshooting time.
We've set this health check up via github actions with a scheduled run, but we want a HealthCheck per third-party service. That way, the slack message on failure will be very specific of what is down. But that is going to create a lot of duplicated yml content.
I discovered something called github composite actions and it seems to be intended for solving this problem, but I can't find information about whether or not a composite action can live in a private repository.
The documentation of the uses
key only mentions public repositories when it mentions repositories at all. Is there a way to make a composite action in a private repository and use it?
I tried making their hello world example, ran it, and it ran correctly. Then I made the action repo private, and the repo using the action's build failed saying:
...ANSWER
Answered 2021-Dec-14 at 04:10You have to check out the repository containing your action using a personal access token first, then use a relative path to where you checked it out:
QUESTION
Gist: Trying to write a custom filter on nested documents using painless. Want to write error checks when there are no nested documents to surpass null_pointer_exception
I have a mapping as such (simplified and obfuscated)
...ANSWER
Answered 2021-Dec-07 at 10:49Elastic
flatten objects. Such that
QUESTION
I am using actuator and in the application.properties file i have the following fields
...ANSWER
Answered 2021-Nov-26 at 14:03I solved the problem, just add in the application.properties file
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install health
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page