palanteer | Visual Python and C++ nanosecond profiler , logger , tests | Monitoring library
Trending Discussions on Monitoring
QUESTION
I need to get the IP addresses that connect to my EC2 instance and add them to an AWS security group as rules, so that only those machines are permitted to connect to the instance. I don't need the port numbers they connect to.
I installed iptraf-ng, but it is very slow on the instance. Is there another way to capture the connecting IPs so I can add them to the security group rules faster?
ANSWER
Answered 2022-Apr-08 at 16:12
You can use VPC Flow Logs to monitor the traffic to the VPC (which includes the traffic going to the EC2 instance).
QUESTION
I have a problem with checking my services on other Windows or Linux servers.
I have to make a request from one server to the other servers and check whether the vital services on those servers are active or disabled.
I wrote Python code to check for services, but it only works on the local system.
import psutil

def getService(name):
    service = None
    try:
        service = psutil.win_service_get(name)
        service = service.as_dict()
    except Exception as ex:
        print(str(ex))
    return service

service = getService('LanmanServer')
if service:
    print("service found")
else:
    print("service not found")

if service and service['status'] == 'running':
    print("service is running")
else:
    print("service is not running")
Can this code check services on remote servers? If not, can you suggest other code?
I have reviewed suggestions such as using server agents (influx, ...), but they do not work for my needs.
ANSWER
Answered 2022-Mar-08 at 17:46
As far as I know, psutil can only be used for gathering information about local processes, and is not suitable for retrieving information about processes running on other hosts. If you want to check whether or not a process is running on another host, there are many ways to approach this problem, and the solution depends on how deep you want (or need) to go, and what your local situation is. Off the top of my head, here are some ideas:
If you are only dealing with network services with exposed ports:
- A very simple solution would involve a script and a port scanner (nmap): if the port that a service listens on is open, we can assume that the service is running. Run the script every once in a while to check up on the services, and do your thing.
- If you want to stay in Python, you can achieve the same end result by using Python's socket module to try to connect to a given host and port, to determine whether the port that a service listens on is open.
A Python package or tool for monitoring network services on other hosts like this probably already exists.
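As a minimal sketch of the socket-based check described above, the helper below tries to open a TCP connection and reports whether it succeeded. The function name `is_port_open` and the timeout default are illustrative assumptions, not part of the original answer. An open port only suggests that the service is running; it does not prove it.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_port_open("192.0.2.10", 445)` would probe the SMB port on a placeholder address; substitute the hosts and ports of the services you care about.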
If you want more information and need to go deeper, or you want to check up on local services, your solution will have to involve a local monitor process on each host, and connecting to that process to gather information.
- You can use your code to implement a server that lets clients connect to it, to check up on the services running on that host. (Check the socket module's official documentation for examples of how to implement clients and servers.)
Here's the big thing though. Based on your question and how it was asked, I would assume that you do not have the experience nor the insight to implement this in a secure way yet. If you're using this for a simple hobby/student project, roll out your own solution, and learn. Otherwise, I would recommend that you check out an existing solution like Nagios, and follow the security recommendations very closely.
QUESTION
I am trying to set up a dashboard on Datadog that will show me the streaming metrics for my streaming job. The job itself contains two tasks: one has 2 streaming queries and the other has 4 (both tasks use the same cluster). I followed the instructions here to install Datadog on the driver node. However, when I go to Datadog and try to create a dashboard, there is no way to differentiate between the 6 streaming queries, so they are all lumped together (none of the metric tags differ per query).
ANSWER
Answered 2022-Mar-11 at 18:18
After some digging, I found an option called enable_query_name_tag that you can enable via the init script. It is disabled by default because it can create a large number of tags when you are not using query names.
The modification is shown here:
instances:
    - spark_url: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
      spark_cluster_mode: spark_standalone_mode
      cluster_name: \${hostip}
      streaming_metrics: true
      enable_query_name_tag: true  # <----
QUESTION
I have a metric with 2 labels. Both labels can have 2 values A or B.
I'd like to sum all the values and exclude the case when Label1=A and Label2=B.
sum by (Label1,Label2)(metric{?})
Is it possible?
ANSWER
Answered 2022-Mar-02 at 17:51
Try the following query:
sum by (Label1,Label2) (metric unless metric{Label1="A",Label2="B"})
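To make the effect of unless concrete, here is a rough Python emulation of the query on four made-up samples. The helper names (`select`, `unless`, `sum_by`) and the sample values are assumptions for illustration; this is a simplified model of the semantics, not how Prometheus evaluates queries.

```python
from collections import defaultdict


def select(series, **matchers):
    """Emulate a label selector such as metric{Label1="A",Label2="B"}."""
    return [(labels, value) for labels, value in series
            if all(labels.get(k) == v for k, v in matchers.items())]


def unless(left, right):
    """Keep samples from `left` whose label set has no match in `right`."""
    right_keys = {frozenset(labels.items()) for labels, _ in right}
    return [(labels, value) for labels, value in left
            if frozenset(labels.items()) not in right_keys]


def sum_by(series, *names):
    """Emulate sum by (names...): add up values per combination of label values."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[tuple(labels.get(n) for n in names)] += value
    return dict(totals)


# Made-up samples covering all four label combinations.
metric = [
    ({"Label1": "A", "Label2": "A"}, 1.0),
    ({"Label1": "A", "Label2": "B"}, 2.0),
    ({"Label1": "B", "Label2": "A"}, 3.0),
    ({"Label1": "B", "Label2": "B"}, 4.0),
]

excluded = select(metric, Label1="A", Label2="B")
result = sum_by(unless(metric, excluded), "Label1", "Label2")
print(result)
```

The combination Label1=A, Label2=B is simply absent from the result, while the other three combinations keep their sums.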
QUESTION
I'm trying to set up a Prometheus-to-Prometheus metrics flow. I was able to do it with the flag --enable-feature=remote-write-receiver.
However, I need mTLS there. Can someone point me to a manual or post a config sample?
I appreciate your help.
ANSWER
Answered 2022-Feb-24 at 06:08
There is a second config file with experimental options related to the HTTP server, and it has options to enable TLS:
tls_server_config:
  # Certificate and key files for server to use to authenticate to client.
  cert_file:
  key_file:

  # Server policy for client authentication. Maps to ClientAuth Policies.
  # For more detail on clientAuth options:
  # https://golang.org/pkg/crypto/tls/#ClientAuthType
  #
  # NOTE: If you want to enable client authentication, you need to use
  # RequireAndVerifyClientCert. Other values are insecure.
  client_auth_type: RequireAndVerifyClientCert # default = "NoClientCert"

  # CA certificate for client certificate authentication to the server.
  client_ca_file:
The documentation on this file is located at the HTTPS AND AUTHENTICATION article. Note that after creating this file, you have to start Prometheus with the extra option:
--web.config.file=/path/to/the/file.yml
The above is to be configured on the receiving side. The sending side needs a client TLS certificate configured in its remote_write:
remote_write:
  - url: https://prometheus.example.com
    tls_config:
      # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config
      cert_file:
      key_file:
QUESTION
I have the following docker-compose file:
version: '3.4'
services:
  serviceA:
    image:
    command:
    labels:
      servicename: "service-A"
    ports:
      - "8080:8080"
  serviceB:
    image:
    command:
    labels:
      servicename: "service-B"
    ports:
      - "8081:8081"
  prometheus:
    image: prom/prometheus:v2.32.1
    container_name: prometheus
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    expose:
      - 9090
    labels:
      org.label-schema.group: "monitoring"
volumes:
  prometheus_data: {}
The docker-compose file also contains a Prometheus instance with the following configuration:
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090', 'serviceA:8080', 'serviceB:8081']
ServiceA and ServiceB expose Prometheus metrics (each on its own port).
When there is one instance of each service, everything works fine. But when I scale the services and run more than one instance, the Prometheus metrics collection gets mixed up and the data is corrupted.
I looked for docker-compose service discovery to address this issue but didn't find a suitable one. How can I solve this?
ANSWER
Answered 2022-Feb-19 at 17:59
The solution to this problem is to use an actual service discovery instead of static targets. This way Prometheus will scrape each replica during each iteration.
If it is just docker-compose (I mean, not Swarm), you can use DNS service discovery (dns_sd_config) to obtain all IPs belonging to a service:
# docker-compose.yml
version: "3"
services:
  prometheus:
    image: prom/prometheus
  test-service: # <- this
    image: nginx
    deploy:
      replicas: 3
---
# prometheus.yml
scrape_configs:
  - job_name: test
    dns_sd_configs:
      - names:
          - test-service # goes here
        type: A
        port: 80
This is the simplest way to get things up and running.
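To get a feel for what an A-record lookup returns, you can resolve the service name yourself from another container on the same Compose network. The sketch below is only an illustration of the idea, not how Prometheus implements DNS service discovery; the function name `resolve_targets` is an assumption.

```python
import socket


def resolve_targets(name, port):
    """Return sorted 'ip:port' strings for every IPv4 address `name` resolves to.

    Hypothetical helper for illustration only.
    """
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry ends with a (ip, port) socket address; deduplicate the IPs.
    addresses = {info[4][0] for info in infos}
    return sorted(f"{ip}:{port}" for ip in addresses)
```

Run inside the Compose network, `resolve_targets("test-service", 80)` should list one entry per running replica, which is the same set of targets the dns_sd_configs block above asks Prometheus to scrape.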
Next, you can use the dedicated Docker service discovery: docker_sd_config. Apart from the target list, it gives you more data in labels (e.g. container name, image version, etc.), but it also requires a connection to the Docker daemon to get this data. In my opinion, this is overkill for a development environment, but it might be essential in production. Here is an example configuration, boldly copy-pasted from https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-docker.yml :
# A example scrape configuration for running Prometheus with Docker.

scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Create a job for Docker daemon.
  #
  # This example requires Docker daemon to be configured to expose
  # Prometheus metrics, as documented here:
  # https://docs.docker.com/config/daemon/prometheus/
  - job_name: "docker"
    static_configs:
      - targets: ["localhost:9323"]

  # Create a job for Docker Swarm containers.
  #
  # This example works with cadvisor running using:
  # docker run --detach --name cadvisor -l prometheus-job=cadvisor
  #     --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro
  #     --mount type=bind,src=/,dst=/rootfs,ro
  #     --mount type=bind,src=/var/run,dst=/var/run
  #     --mount type=bind,src=/sys,dst=/sys,ro
  #     --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro
  #     google/cadvisor -docker_only
  - job_name: "docker-containers"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
    relabel_configs:
      # Only keep containers that have a `prometheus-job` label.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        regex: .+
        action: keep
      # Use the task labels that are prefixed by `prometheus-`.
      - regex: __meta_docker_container_label_prometheus_(.+)
        action: labelmap
        replacement: $1
Finally, there is dockerswarm_sd_config, which is to be used, obviously, with Docker Swarm. It is the most complex of the trio, and thus there is a comprehensive official setup guide. Like docker_sd_config, it has additional information about containers in labels, and even more than that (for example, it can tell which node the container is on). An example configuration is available here: https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-dockerswarm.yml , but you should really read the docs to be able to understand it and tune it for yourself.
QUESTION
I'm new to monitoring a k8s cluster with Prometheus, node exporter and so on.
I want to know exactly what the metrics mean, even though the metric names are self-descriptive.
I already checked the node exporter GitHub repository, but I found no useful information.
Where can I get descriptions of the node exporter metrics?
Thanks
ANSWER
Answered 2022-Feb-10 at 08:34
There is a short description along with each of the metrics. You can see them if you open node exporter in a browser or just curl http://my-node-exporter:9100/metrics. You will see all the exported metrics, and the lines starting with # HELP are the descriptions:
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.59840376e+07
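If you want the descriptions as a lookup table rather than reading the raw output, the # HELP lines are easy to extract. The following is a small sketch under the assumption that you already have the /metrics response as text; the function name `parse_help` is made up for this example, and the sample is the excerpt shown above.

```python
def parse_help(metrics_text):
    """Map each metric name to the text of its '# HELP' line."""
    prefix = "# HELP "
    descriptions = {}
    for line in metrics_text.splitlines():
        if line.startswith(prefix):
            # The format is: # HELP <metric_name> <free-form description>
            parts = line[len(prefix):].split(" ", 1)
            descriptions[parts[0]] = parts[1] if len(parts) > 1 else ""
    return descriptions


sample = """\
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.59840376e+07
"""

print(parse_help(sample))
# prints {'node_cpu_seconds_total': 'Seconds the cpus spent in each mode.'}
```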
Grafana can show this help message in the editor, and Prometheus (with the recent experimental editor) can show it too. This works for all metrics, not just node exporter's. If you need more technical details about those values, I recommend searching for the information on Google and in man pages (if you're on Linux). Node exporter takes most of the metrics from /proc almost as-is, and it is not difficult to find the details. Take, for example, node_memory_KReclaimable_bytes. The 'bytes' suffix is obviously the unit, node_memory is just a namespace prefix, and KReclaimable is the actual metric name. Using man -K KReclaimable will bring you to the proc(5) man page, where you can find that:
KReclaimable %lu (since Linux 4.20)
Kernel allocations that the kernel will attempt to
reclaim under memory pressure. Includes
SReclaimable (below), and other direct allocations
with a shrinker.
Finally, if this intention to learn more about the metrics is inspired by the desire to configure alerts for your hardware, you can skip to the last part and grab some alerts shared by the community from here: https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware
QUESTION
Say I have two metrics in Prometheus, both counters:
Ok:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status="200"}
Failure:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}
Total:
nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}
My question is how to find, with a PromQL query, the RPS at which failures occurred.
I'm expecting the following response:
400
This means that if a pod receives more than 400 RPS, the Failure metric starts to occur.
sum((sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}[$__rate_interval])) without (status))
and
(sum(rate(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status !="200"}[$__rate_interval])) without (status) > 0))
ANSWER
Answered 2022-Feb-08 at 18:32
You need the following query:
rps_total and (rps_failure > 0)
The and binary operation is used for matching right-hand time series to the left-hand series with the same set of labels. See these docs for details on matching rules.
Let's substitute rps_total and rps_failure with the actual time series, given the matching rules mentioned above.
- rps_total is substituted with sum(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}) without (status). The sum(...) without (status) is needed in order to sum metrics across all the status labels, grouped by the remaining labels.
- rps_failure is substituted with sum(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}) without (status).
Then the final PromQL query will look like:
(
  sum(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service"}) without (status)
  and
  (sum(nginx_ingress_controller_requests{prometheus_from="$cluster", ingress="brand-safety-phoenix-service", status!="200"}) without (status) > 0)
)
QUESTION
It may be a vague question, but I couldn't find any documentation on this. Does Google Cloud Platform have a provision to integrate with OpsGenie?
Basically, we have set up a few alerts in GCP for our Kubernetes cluster monitoring, and we want them to be fed to OpsGenie for automatic call-outs in case of high-priority incidents.
Is it possible?
QUESTION
I have a PVC in RWX mode, and 2 pods use it. I want to know which pods request the volume from the PVC, and when. How can I do that?
ANSWER
Answered 2021-Dec-03 at 15:33
As far as I know, there is no direct way to figure out which pod uses a PVC. A possible workaround is to grep through all the pods for the respective PVC.
Example:
- To display all the pods and their respective pvcs:
kubectl get pods -o jsonpath='{"POD"}{"\t"}{"PVC Name"}{"\n"}{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\t"}{end}{"\n"}{end}'
POD PVC Name
web1-0 www1-web1-0
web16-0 www16-web16-0
- To get information about a particular PVC (in this case, www16-web16-0) using grep:
kubectl get pods -o jsonpath='{"POD"}{"\t"}{"PVC Name"}{"\n"}{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\t"}{end}{"\n"}{end}' | grep 'POD\|www16-web16-0'
POD PVC Name
web16-0 www16-web16-0
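The same lookup can be done on the JSON form of the pod list, which avoids the long jsonpath expression. This is a sketch that assumes you pass in the parsed output of `kubectl get pods -o json`; the function name `pods_using_pvc` is hypothetical, and the sample document only mimics the two pods shown above.

```python
import json


def pods_using_pvc(pod_list, claim_name):
    """Return the names of pods that mount the PVC called `claim_name`.

    `pod_list` is the parsed JSON of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim and claim.get("claimName") == claim_name:
                names.append(pod["metadata"]["name"])
                break  # one match per pod is enough
    return names


# Trimmed-down sample that mirrors the pods in the output above.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "web1-0"},
     "spec": {"volumes": [{"name": "www1",
                           "persistentVolumeClaim": {"claimName": "www1-web1-0"}}]}},
    {"metadata": {"name": "web16-0"},
     "spec": {"volumes": [{"name": "www16",
                           "persistentVolumeClaim": {"claimName": "www16-web16-0"}},
                          {"name": "config", "configMap": {"name": "web-config"}}]}}
  ]
}
""")

print(pods_using_pvc(sample, "www16-web16-0"))
# prints ['web16-0']
```

In practice you could pipe `kubectl get pods -o json` into a script that calls `json.load(sys.stdin)` and passes the result to this function. Like the grep approach, it only shows which pods reference the PVC, not when they accessed it.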
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported