swarmprom | Docker Swarm instrumentation with Prometheus Grafana | Continuous Deployment library
kandi X-RAY | swarmprom Summary
Swarmprom is a starter kit for Docker Swarm monitoring with Prometheus, Grafana, cAdvisor, Node Exporter, Alert Manager and Unsee.
Community Discussions
QUESTION
I am struggling with a query problem in a Grafana variable query in the Dashboard configuration. The query variable should return the number of nodes joined to the swarm, but it does not. In my case I have only one swarm node, yet the variable in Grafana returns up to 5 nodes. I really don't understand what causes the error.
Here is the situation: I set up Docker Swarm on my laptop as a manager; only my laptop is in the swarm, no other nodes have joined.
I used the source from https://github.com/stefanprodan/swarmprom to monitor the host with node-exporter. I kept the prometheus.yml as in the original.
When I executed the metric query in Prometheus, only one host was returned. This is correct, because I only had one node. You can see the figure below.
But when I ran the query in Grafana, Grafana returned 5 hosts. It was really strange. I don't know why I got 5 hosts, because I had only one swarm node.
I checked the git repo again with play-with-docker, configuring one manager node and two worker nodes. Everything worked fine; the query in Grafana returned 3 hosts.
Here is the query formula: label_values(node_uname_info{job="node-exporter"}, instance)
Thank you so much in advance for your support.
ANSWER
Answered 2021-Mar-08 at 16:32
What you have faced is a consequence of the ephemeral nature of containers, one of the challenges in monitoring containerized applications. Before we go into any solution options, let us see...
How did it happen that Grafana shows more instances than there are?
Prometheus is a time-series database. Once in a while it contacts its scraping targets and collects metrics. Those metrics are saved with a timestamp and a set of labels, one of which is the 'instance' label in question.
The instance label normally consists of an address (a host/domain name or an IP-address) and a port that Prometheus uses to scrape metrics. In this example the instance address is an IP-address, because the list of targets is obtained through a DNS server (dns_sd_configs in the job definition).
When you deployed the stack, Docker created at least one container for each service, including node-exporter and prometheus. Soon after that Prometheus started obtaining metrics from the node-exporter instance; however, after some time the node-exporter container was recreated. Either you updated it, or killed it, or it crashed - I can't know, but the key is that you had a new container. The new node-exporter container got a different IP-address, and because of that, metrics from the new instance received a different 'instance' label.
Remember that Prometheus is a time-series database? You have not lost the metrics from the instance that went offline; they're still in the database. It is just that at this point you started collecting node-exporter metrics with a different label set (at least a new IP-address in the 'instance' label). When Grafana queries labels for you, it requests metrics from the period currently set on the dashboard. Since the period was 'today', you saw the instances that were present today. In other words, when you request a list of possible instance values, you receive a list of values for the whole period, without any filtering for currently active instances.
General solution.
You need to use some static label(s) for this task. The 'instance' or 'pod_name' (K8s) labels are a poor choice if you don't want to see dead instances in the list. Pick a label that represents the thing or unit you want to watch, and stick to it. Since node-exporter is meant to monitor node metrics, I think a host name label will do.
If you see no way to avoid dynamic labels, you can use a short time range on the dashboard, so that the label_values() function does not return long-dead labels. You'll want to set the variable refresh option to 'On Time Range Change', so that you can use a short dashboard interval to see and pick currently active instances, and a long period for any other case.
As I said previously, using a host name label will be better in this case. The problem is that there is no such label in the metric in question. Checking the swarmprom repo, I found that its node-exporter was made to expose a host name via the node_meta label. So it is possible to map a host name to its instance(s) using chained variables.
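For illustration, such a series would look roughly like this (the label values here are made up):

  node_meta{node_id="abcdef123456", node_name="swarm-manager-1"} 1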
Another problem is that this solution may require changes in panel queries. Since one host name can resolve into multiple instances, it is essential that panel queries use a regex match for the 'instance' label (that is, =~ instead of =).
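For example, a hypothetical panel query using the regex match:

  node_load1{job="node-exporter", instance=~"$instance"}

With a plain = match, a variable that expands to several instances would match nothing.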
Here's how to do all this:
- Create a new variable called 'hostname', set its refresh option to 'On Time Range Change', and use this for the query field:
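A minimal sketch, assuming the node_meta label set shown above:

  label_values(node_meta, node_name)

- Then add a second, chained variable (say, 'instance') that maps the selected host name back to its current scrape addresses:

  label_values(node_meta{node_name=~"$hostname"}, instance)

- Finally, have the panel queries filter with instance=~"$instance", so that a host name resolving to several instances still matches.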
QUESTION
I have a problem with Grafana/Prometheus when using node-exporter to collect hosts' resources from Docker Swarm nodes.
I tested with only one swarm node. When I used the query label_values(node_uname_info{job="node-exporter"}, instance) in a Grafana variable, the result returned the old IPs of stopped containers as well as the IPs of the running container. I want it to return only the IP of the running container. You can see the image below; it shows the IPs of node-exporter containers from all time.
But actually, only one container is running, with the IP 10.0.1.12:9100. The other IPs were the old IPs of node-exporter containers that had started and stopped. Here are the times at which these containers were created.
I think we can configure the scrape job in prometheus.yml with relabel_config, but I am not familiar with it. Here is the scrape configuration I got from https://github.com/stefanprodan/swarmprom.
ANSWER
Answered 2021-Mar-08 at 15:42
Based on the last comment, you can modify the queries using the following pattern:
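The exact pattern from that comment may differ, but a common approach, sketched here, is Grafana's query_result() function: an instant query only returns series that are still being scraped, so dead instances drop out. Set the variable query to

  query_result(node_uname_info{job="node-exporter"})

and use the variable's Regex field to extract the address from each result, e.g.

  /instance="([^"]*)"/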
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install swarmprom
Docker CE 17.09.0-ce or Docker EE 17.06.2-ee-3
Swarm cluster with one manager and a worker node
Docker engine experimental enabled and metrics address set to 0.0.0.0:9323
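For that last prerequisite, a minimal daemon configuration sketch (typically /etc/docker/daemon.json, followed by a daemon restart; the file path may vary by platform):

  {
    "experimental": true,
    "metrics-addr": "0.0.0.0:9323"
  }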
prometheus (metrics database) http://<swarm-ip>:9090
grafana (visualize metrics) http://<swarm-ip>:3000
node-exporter (host metrics collector)
cadvisor (containers metrics collector)
dockerd-exporter (Docker daemon metrics collector, requires Docker experimental metrics-addr to be enabled)
alertmanager (alerts dispatcher) http://<swarm-ip>:9093
unsee (alert manager dashboard) http://<swarm-ip>:9094
caddy (reverse proxy and basic auth provider for prometheus, alertmanager and unsee)
If you have a Docker Swarm cluster with a global Traefik set up as described in DockerSwarm.rocks, you can deploy Swarmprom integrated with that global Traefik proxy. This way, each Swarmprom service will have its own domain, and each of them will be served using HTTPS, with certificates generated (and renewed) automatically.
Navigate to http://<swarm-ip>:3000 and log in with user admin and password admin. You can change the credentials in the compose file or by supplying the ADMIN_USER and ADMIN_PASSWORD environment variables at stack deploy, as shown below.
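For example, a deploy command along these lines (the stack name mon and the compose file name are illustrative assumptions; see the repository for the exact command):

  ADMIN_USER=admin ADMIN_PASSWORD=changeme docker stack deploy -c docker-compose.yml mon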
Name: Prometheus
Type: Prometheus
Url: http://prometheus:9090
Access: proxy
Cluster up-time, number of nodes, number of CPUs, CPU idle gauge
System load average graph, CPU usage graph by node
Total memory, available memory gauge, total disk space and available storage gauge
Memory usage graph by node (used and cached)
I/O usage graph (read and write Bps)
IOPS usage (read and write operations per second) and CPU IOWait
Running containers graph by Swarm service and node
Network usage graph (inbound Bps, outbound Bps)
Nodes list (instance, node ID, node name)
Number of nodes, stacks, services and running containers
Swarm tasks graph by service name
Health check graph (total health checks and failed checks)
CPU usage graph by service and by container (top 10)
Memory usage graph by service and by container (top 10)
Network usage graph by service (received and transmitted)
Cluster network traffic and IOPS graphs
Docker engine container and network actions by node
Docker engine list (version, node id, OS, kernel, graph driver)
Uptime, local storage memory chunks and series
CPU usage graph
Memory usage graph
Chunks to persist and persistence urgency graphs
Chunks ops and checkpoint duration graphs
Target scrapes, rule evaluation duration, samples ingested rate and scrape duration graphs