Python-3-basics-series | Time Series Database library
kandi X-RAY | Python-3-basics-series Summary
Python-3-basics-series Key Features
Python-3-basics-series Examples and Code Snippets
- Simple addition
- Example example
- Example thread
- Basic basic window
Trending Discussions on Time Series Database
QUESTION
I have an application, and I'm running one instance of it per AWS region. I'm instrumenting the application code with the Prometheus metrics client and will expose the collected metrics at the /metrics endpoint. A central server will scrape the /metrics endpoints across all the regions and store them in a central Time Series Database.
Let's say I've defined a metric named http_responses_total. I would like to know its value aggregated over all the regions, along with the individual regional values. How do I store the region information (which could be any one of the 13 regions) and the env information (which could be dev, test, or prod) along with the metrics, so that I can slice and dice metrics based on region and env?
I found a few ways to do it, but I'm not sure how it's done in general, as it seems a pretty common scenario:
- Storing region and env info as labels with each of the metrics (not recommended: https://prometheus.io/docs/instrumenting/writing_exporters/#target-labels-not-static-scraped-labels)
- Using target labels - I have the region and env values available in the application and would like to set this information from the application itself instead of setting it in the scrape config
- Keeping a separate gauge metric to record region and env info as labels (as described here: https://www.robustperception.io/exposing-the-software-version-to-prometheus) - this is how I'm planning to store my application version info in the TSDB, but the difference between the version info and the region info is that the version keeps changing across releases, whereas the region, which I get from the config file, is constant. So I'm not sure if this is a good way to do it.
I'm new to Prometheus. Could someone please suggest how I should store this region and env information? Are there any better ways?
ANSWER
Answered 2022-Mar-09 at 17:53
All the proposed options will work, and all of them have downsides.
The first option (having env and region exposed by the application with every metric) is easy to implement but hard to maintain. Eventually somebody will forget about these labels, opening up the possibility of an unobserved failure. Aside from that, you may not be able to add these labels to other exporters written by someone else. Lastly, if you have to deal with millions of time series, more plain-text data means more traffic.
The third option (storing these labels in a separate metric) will make it quite difficult to write and understand queries. Take this one for example:
sum by(instance) (node_arp_entries) and on(instance) node_exporter_build_info{version="0.17.0"}
It calculates a sum of node_arp_entries for instances with node-exporter version="0.17.0". More precisely, it calculates a sum for every instance and then drops those with the wrong version, but you get the idea.
The second option (adding these labels with Prometheus as part of the scrape configuration) is what I would choose. To save words, consider this monitoring setup:
Datacenter Prometheus:
1. Collects metrics from local instances.
2. Adds a dc label to each metric.
3. Pushes the data into the regional Prometheus.
Regional Prometheus:
1. Collects data on the datacenter scale.
2. Adds a region label to all metrics.
3. Pushes the data into the global instance.
Global Prometheus:
Simply collects and stores the data on a global scale.
This is the kind of setup you need at Google scale, but the point is the simplicity. It's perfectly clear where each label comes from and why. This approach requires you to make the Prometheus configuration somewhat more complicated, and the fewer Prometheus instances you have, the more scrape configurations you will need. Overall, I think, this option beats the alternatives.
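For reference, here is a minimal sketch of how region and env could be attached as target labels in a Prometheus scrape configuration. The job name, target address, and label values are placeholders rather than anything taken from the question:
# Hypothetical scrape config: region and env become target labels,
# so the application itself does not need to expose them on every metric.
scrape_configs:
  - job_name: "my-app"               # placeholder job name
    static_configs:
      - targets: ["10.0.0.1:9090"]   # placeholder instance address
        labels:
          region: "us-east-1"        # one of the 13 regions
          env: "prod"                # dev / test / prod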
QUESTION
I'm working on attaching Amazon EFS (NFS) storage to a Kubernetes pod on EKS using Terraform.
Everything runs without an error and is created:
- Pod victoriametrics
- Storage Classes
- Persistent Volumes
- Persistent Volume Claims
However, the volume victoriametrics-data doesn't attach to the pod; at least, I can't see it from the pod's shell. Could someone be so kind as to help me understand where I'm wrong, please?
I have cut some unimportant code to keep the question short.
resource "kubernetes_deployment" "victoriametrics" {
...
spec {
container {
image = var.image
name = var.name
...
volume_mount {
mount_path = "/data"
mount_propagation = "None"
name = "victoriametrics-data"
read_only = false
}
}
volume {
name = "victoriametrics-data"
}
}
}
...
}
resource "kubernetes_csi_driver" "efs" {
metadata {
name = "${local.cluster_name}-${local.namespace}"
annotations = {
name = "For store data of ${local.namespace}."
}
}
spec {
attach_required = true
pod_info_on_mount = true
volume_lifecycle_modes = ["Persistent"]
}
}
resource "kubernetes_storage_class" "efs" {
metadata {
name = "efs-sc"
}
storage_provisioner = kubernetes_csi_driver.efs.id
reclaim_policy = "Retain"
mount_options = ["file_mode=0700", "dir_mode=0777", "mfsymlinks", "uid=1000", "gid=1000", "nobrl", "cache=none"]
}
resource "kubernetes_persistent_volume" "victoriametrics" {
metadata {
name = "${local.cluster_name}-${local.namespace}"
}
spec {
storage_class_name = "efs-sc"
persistent_volume_reclaim_policy = "Retain"
volume_mode = "Filesystem"
access_modes = ["ReadWriteMany"]
capacity = {
storage = var.size_of_persistent_volume_claim
}
persistent_volume_source {
nfs {
path = "/"
server = local.eks_iput_target
}
}
}
}
resource "kubernetes_persistent_volume_claim" "victoriametrics" {
metadata {
name = local.name_persistent_volume_claim
namespace = local.namespace
}
spec {
access_modes = ["ReadWriteMany"]
storage_class_name = "efs-sc"
resources {
requests = {
storage = var.size_of_persistent_volume_claim
}
}
volume_name = kubernetes_persistent_volume.victoriametrics.metadata.0.name
}
}
kind: Deployment
apiVersion: apps/v1
metadata:
  name: victoriametrics
  namespace: victoriametrics
  labels:
    k8s-app: victoriametrics
    purpose: victoriametrics
  annotations:
    deployment.kubernetes.io/revision: '1'
    name: >-
      VictoriaMetrics - The High Performance Open Source Time Series Database &
      Monitoring Solution.
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: victoriametrics
      purpose: victoriametrics
  template:
    metadata:
      name: victoriametrics
      creationTimestamp: null
      labels:
        k8s-app: victoriametrics
        purpose: victoriametrics
      annotations:
        name: >-
          VictoriaMetrics - The High Performance Open Source Time Series
          Database & Monitoring Solution.
    spec:
      containers:
        - name: victoriametrics
          image: 714154805721.dkr.ecr.us-east-1.amazonaws.com/victoriametrics:v1.68.0
          ports:
            - containerPort: 8428
              protocol: TCP
            - containerPort: 2003
              protocol: TCP
            - containerPort: 2003
              protocol: UDP
          volumeMounts:
            - mountPath: /data
              name: victoriametrics-data
            - mountPath: /var/log
              name: varlog
          env:
            - name: Name
              value: victoriametrics
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      volumes:
        - name: victoriametrics-data
          emptyDir: {}
        - name: varlog
          emptyDir: {}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      automountServiceAccountToken: true
      shareProcessNamespace: false
      securityContext: {}
      schedulerName: default-scheduler
      tolerations:
        - key: k8s-app
          operator: Equal
          value: victoriametrics
          effect: NoSchedule
      enableServiceLinks: true
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  minReadySeconds: 15
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 300
ANSWER
Answered 2021-Nov-10 at 02:26
You need to use the persistent volume claim that you have created instead of emptyDir in your deployment:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: victoriametrics
...
      volumes:
        - name: victoriametrics-data
          persistentVolumeClaim:
            claimName:
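If you manage the deployment with the Terraform snippet from the question, a sketch of the equivalent change would be to reference the claim from the deployment's volume block (the resource names below are assumed from the question's code):
# Sketch: point the deployment volume at the PVC created above
# instead of an implicit empty volume.
volume {
  name = "victoriametrics-data"
  persistent_volume_claim {
    claim_name = kubernetes_persistent_volume_claim.victoriametrics.metadata.0.name
  }
}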
QUESTION
I have InfluxDB version 1.8.9, but I can't start it. In this example I'm logged in as root.
netstat -lptn
gives me a range of services, but none of them seem to listen on 8086. (There are other services running, like Grafana or MySQL, which seem to work fine.)
To further confirm that nothing is on 8086, I looked at the related issue "run: open server: open service: listen tcp :8086: bind: address already in use on starting influxdb" and ran
netstat -a | grep 8086
which returned no results.
My config file at /etc/influxdb/influxdb.conf looks like this:
reporting-disabled = false
bind-address = "127.0.0.1:8086"
[meta]
#dir = "/root/.influxdb/meta"
dir = "/var/lib/influxdb/meta"
retention-autocreate = true
logging-enabled = true
[data]
dir = "/var/lib/influxdb/data"
index-version = "inmem"
wal-dir = "/var/lib/influxdb/wal"
wal-fsync-delay = "0s"
validate-keys = false
strict-error-handling = false
query-log-enabled = true
cache-max-memory-size = 1073741824
cache-snapshot-memory-size = 26214400
cache-snapshot-write-cold-duration = "10m0s"
compact-full-write-cold-duration = "4h0m0s"
compact-throughput = 50331648
compact-throughput-burst = 50331648
max-series-per-database = 1000000
max-values-per-tag = 100000
max-concurrent-compactions = 0
max-index-log-file-size = 1048576
series-id-set-cache-size = 100
series-file-max-concurrent-snapshot-compactions = 0
trace-logging-enabled = false
tsm-use-madv-willneed = false
...
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
suppress-write-log = false
write-tracing = false
flux-enabled = false
flux-log-enabled = false
pprof-enabled = true
pprof-auth-enabled = false
debug-pprof-enabled = false
ping-auth-enabled = false
prom-read-auth-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
https-private-key = ""
max-row-limit = 0
max-connection-limit = 0
shared-secret = ""
realm = "InfluxDB"
unix-socket-enabled = false
unix-socket-permissions = "0777"
bind-socket = "/var/run/influxdb.sock"
max-body-size = 25000000
access-log-path = ""
max-concurrent-write-limit = 0
max-enqueued-write-limit = 0
enqueued-write-timeout = 30000000000
...
So I tried to start my database:
service influxdb start
which gives me
Job for influxdb.service failed because a timeout was exceeded. See "systemctl status influxdb.service" and "journalctl -xe" for details.
Result of systemctl status influxdb.service:
● influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2021-09-21 18:37:12 CEST; 1min 7s ago
Docs: https://docs.influxdata.com/influxdb/
Main PID: 32016 (code=exited, status=1/FAILURE); Control PID: 5874 (influxd-systemd)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/influxdb.service
├─5874 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
└─5965 sleep 10
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515897Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=runtime
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515907Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=network
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515923Z lvl=info msg="Registered diagnostics client" log_id=0WjJLI7l000 service=monitor name=system
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515977Z lvl=info msg="Starting precreation service" log_id=0WjJLI7l000 service=shard-precreation check_interval=10m advanc
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.515995Z lvl=info msg="Starting snapshot service" log_id=0WjJLI7l000 service=snapshot
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516015Z lvl=info msg="Starting continuous query service" log_id=0WjJLI7l000 service=continuous_querier
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516011Z lvl=info msg="Storing statistics" log_id=0WjJLI7l000 service=monitor db_instance=_internal db_rp=monitor interval=
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516037Z lvl=info msg="Starting HTTP service" log_id=0WjJLI7l000 service=httpd authentication=false
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: ts=2021-09-21T16:37:23.516052Z lvl=info msg="opened HTTP access log" log_id=0WjJLI7l000 service=httpd path=stderr
Sep 21 18:37:23 s22227708 influxd-systemd-start.sh[5874]: run: open server: open service: listen tcp :8086: bind: address already in use
I can't really understand where I did something wrong, since I configured :8086 in the config file. Can you help me?
ANSWER
Answered 2021-Sep-21 at 17:57
It appears to be a typo in the configuration file. As stated in the documentation, the configuration file should hold http-bind-address instead of bind-address. On top of that, the port is locked by the first configuration.
The first few lines of the file /etc/influxdb/influxdb.conf should look like so:
reporting-disabled = false
http-bind-address = "127.0.0.1:8086"
A suggested approach would be to:
- Rename bind-address to http-bind-address
- Change the port from the default 8086 to a known free port
- (Optional) Change back to the default port
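As a quick sanity check before restarting (assuming systemd and standard Linux tooling; adjust to your distribution), you can confirm which process is holding the port and then watch the service logs:
# Show any process currently bound to 8086 (either tool works, if installed)
ss -tlnp | grep 8086
lsof -i :8086
# Restart the service and follow its journal
systemctl restart influxdb
journalctl -u influxdb -f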
QUESTION
I'm trying to find a time series database for the following scenario:
- A sensor on a Raspberry Pi provides the real-time data.
- An application takes the data and pushes it to the time series database.
- If the network is off (the GSM modem ran out of money, or rain, or something else), store the data locally.
- Once the network is available, the data should be synchronised to the time series database in the cloud, so there is no missing data and no duplicates.
- (Optionally) query the database from Grafana.
I'm looking for a time series database that can handle 3 and 4 for me. Is there one?
I can start Prometheus in federated mode (can I?) and keep one node on the Raspberry Pi for initial ingestion and another node in the cloud for collecting the data. But that setup would instantly consume 64 MB+ of memory just for the Prometheus node.
ANSWER
Answered 2021-Sep-14 at 22:08
Take a look at vmagent. It can be installed on every device where metrics from local sensors must be collected (e.g. at the edge), and it collects all these metrics via various popular data ingestion protocols. It can then push the collected metrics to a centralized time series database such as VictoriaMetrics. Vmagent buffers the collected metrics on local storage when the connection to the centralized database is unavailable, and pushes the buffered data to the database as soon as the connection is recovered. Vmagent works on the Raspberry Pi and on any device with an ARM, ARM64 or AMD64 architecture.
See use cases for vmagent for more details.
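As a rough sketch of such a setup (the paths and URL below are placeholders; check the vmagent documentation for the exact flags available in your version):
# Scrape local sensor exporters and push the samples to a remote VictoriaMetrics.
# Samples are buffered under -remoteWrite.tmpDataPath while the network is down
# and flushed automatically once the connection recovers.
./vmagent \
  -promscrape.config=/etc/vmagent/scrape.yml \
  -remoteWrite.url=https://victoria.example.com/api/v1/write \
  -remoteWrite.tmpDataPath=/var/lib/vmagent-buffer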
QUESTION
I am trying to integrate the time series database with laboratory real-time monitoring equipment. For scalar data such as temperature, the line protocol works well:
temperature,site=reactor temperature=20.0 1556892576842902000
For 1D (e.g., IR spectrum) or higher-dimensional data, I came up with two approaches to write the data.
- Write each element of the spectrum as a field of the field set, as shown below. This way I can query individual frequencies and perform analysis or visualization using the existing software. However, each record will easily contain thousands of fields due to the high resolution of the spectrometer. My concern is whether the line protocol gets too chunky and the storage inefficient.
ir_spectrum,site=reactor w1=10.0,w2=11.2,w3=11.3,......,w4000=2665.2 1556892576842902000
- Store the vector as a serialized string (e.g., JSON). This way I may need some plugins to adapt the data to visualization tools such as Grafana, but the protocol will look cleaner. I am not sure whether the storage layout is better than the first approach or not.
ir_spectrum,site=reactor data="[10.0, 11.2, 11.3, ......, 2665.2]" 1556892576842902000
I wonder whether there is a recommended way to store high-dimensional data. Thanks!
ANSWER
Answered 2021-Sep-05 at 11:04
The first approach is better from the performance and disk space usage PoV. InfluxDB stores each field in a separate column. If a column contains similar numeric values, it may be compressed better than a column with JSON strings. This also improves query speed when selecting only a subset of fields or filtering on a subset of fields.
P.S. InfluxDB may need high amounts of RAM for a big number of fields and a big number of tag combinations (aka high cardinality). In this case there are alternative solutions which support the InfluxDB line protocol and require lower amounts of RAM for high-cardinality time series. See, for example, VictoriaMetrics.
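To illustrate the query-side benefit of the first layout, individual wavelengths can be selected or filtered directly. The measurement, tag, and field names follow the question's example; treat the queries themselves as a sketch in InfluxQL:
-- Select only two wavelengths for the reactor site over the last day
SELECT w1, w2 FROM ir_spectrum WHERE site = 'reactor' AND time > now() - 1d
-- Filter on a single wavelength's value
SELECT w1 FROM ir_spectrum WHERE site = 'reactor' AND w1 > 10.5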
QUESTION
I use a time series database to report some network metrics, such as the download time or DNS lookup time for some endpoints. However, sometimes the measurement fails, for example if the endpoint is down or if there is a network issue. In these cases, what should be done according to best practices? Should I report an impossible value, like -1, or just not write anything at all to the database?
The problem I see with not writing anything is that I cannot tell whether my test is no longer running or whether there is a problem with the endpoint/network.
ANSWER
Answered 2021-Jun-08 at 13:53
The best practice is to capture the failures in their own time series for separate analysis.
Failures or bad readings will skew the series, so they should be filtered out or replaced with a projected value for 'normal' events. The beauty of a time series is that one measure (time) is globally common, so it is easy to project between two known points when one is missing.
The failure information is also important, as it is an early indicator to issues or outages on your target. You can record the network error and other diagnostic information to find trends and ensure it is the client and not your server having the issue. Further, there can be several instances deployed to monitor the same target so that they cancel each other's noise.
You can also monitor a known endpoint like google's 204 page to ensure network connectivity. If all the monitors report an error connecting to your site but not to the known endpoint, your server is indeed down.
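As a concrete sketch, using InfluxDB line protocol purely for illustration (the measurement, tag, and field names below are made up, not taken from the question): a successful probe and a failed probe go to separate measurements, so failures never skew the duration series while still being recorded with a reason for later analysis.
http_probe,endpoint=example.com,check=download duration_ms=812 1556892576842902000
http_probe_failure,endpoint=example.com,check=download,reason=timeout value=1 1556892636842902000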
QUESTION
I have a database that is being used to create a time series. The date column in the time series database is in the POSIXct date format.
Database
data <- structure(list(TimeStep = c("1", "1", "1", "1", "10", "10", "10",
"10", "11", "11", "11", "11", "12", "12", "12", "12", "2", "2",
"2", "2", "3", "3", "3", "3", "4", "4", "4", "4", "5", "5", "5",
"5", "6", "6", "6", "6", "7", "7", "7", "7", "8", "8", "8", "8",
"9", "9", "9", "9"), Shelter = structure(c(1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Low",
"High"), class = c("ordered", "factor")), Site_long = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Hanauma Bay",
"Waikiki"), class = c("ordered", "factor")), mean = c(0, 0.00015328484505956,
7.04072939791738e-06, 0.000682330358210582, 2.66666666666667e-06,
6.00605206484062e-07, 0, 0.000386079421618121, 0, 0.000107757674499178,
3.38820905388829e-05, 0.000309370042627687, 2.66666666666667e-06,
0.000258419039004161, 3.40590905388829e-05, 0.000600512507641285,
0, 0.000252408374787308, 7.40825319693649e-05, 0.000656756473864922,
8.69852169224144e-05, 7.75576156462366e-05, 8.33333333333333e-08,
0.000665098673793383, 0, 2.04566422429192e-05, 5.31031263835315e-05,
0.000647037987463065, 0, 0.000198961690069793, 0, 0.000423723440028656,
8.16572939791738e-06, 0.00012785593538717, 2.34677162777718e-08,
0.000224655093601419, 3.33333333333333e-07, 7.42595319693649e-05,
0.00039116506393873, 2.13882406395912e-05, 2.66666666666667e-06,
0.000107081818106607, 4.16666666666667e-08, 3.77742205841092e-05,
1.88237465916921e-05, 0.00010916313850296, 0, 7.6903907646831e-05
), sd = c(0, 0.00015966459922941, 9.95709500353371e-06, 0.000482307005382674,
3.77123616632825e-06, 8.0011788810835e-07, 0, 0.000440508169853482,
0, 0.000152392364726523, 4.79165119616413e-05, 0.000309520641549238,
3.77123616632825e-06, 0.000184777835884834, 4.81668277621813e-05,
0.000477723602613157, 0, 0.000179807492264746, 0.000104768521446014,
0.000482423300574339, 0.000123015673497644, 0.000105040104106768,
1.17851130197758e-07, 0.000498359190138349, 0, 2.55445716349544e-05,
7.50991615360028e-05, 0.000105303327368202, 0, 0.000185382956021377,
0, 0.000316153398426712, 1.15480852612034e-05, 9.37823230585177e-05,
3.31883626379487e-08, 0.000297053654293301, 4.71404520791032e-07,
0.000104643588086737, 0.000406298601551981, 1.51238560888153e-05,
3.77123616632825e-06, 0.000145843975969992, 5.8925565098879e-08,
2.23804174757353e-05, 2.66207977246453e-05, 0.000145131278731618,
0, 5.83391012319395e-05), lower = c(0, 6.11024457144067e-05,
1.29199791728051e-06, 0.000403870278820852, 4.89342450859397e-07,
1.38656928401271e-07, 0, 0.000131751911172984, 0, 1.97739017018764e-05,
6.21747945920136e-06, 0.000130668216909489, 4.89342450859397e-07,
0.000151737505715775, 6.24995956437715e-06, 0.000324698657074341,
0, 0.000148596470725945, 1.35943979099095e-05, 0.000378229251414979,
1.59620847140065e-05, 1.69126832644878e-05, 1.52919515893562e-08,
0.000377370861213884, 0, 5.70847693314461e-06, 9.74460525480512e-06,
0.000586241083060471, 0, 9.19307905076825e-05, 0, 0.000241192190341779,
1.49843926373682e-06, 7.37106859241061e-05, 4.30640617478918e-09,
5.31510863314202e-05, 6.11678063574246e-08, 1.38435282185187e-05,
0.000156588456961325, 1.26564782555285e-05, 4.89342450859397e-07,
2.2878759320647e-05, 7.64597579467808e-09, 2.48528805299173e-05,
3.45422185932556e-06, 2.53715556594257e-05, 0, 4.32218118462899e-05
), upper = c(0, 0.000245467244404714, 1.27894608785542e-05, 0.000960790437600311,
4.84399088247394e-06, 1.06255348456685e-06, 0, 0.000640406932063258,
0, 0.00019574144729648, 6.15467016185644e-05, 0.000488071868345885,
4.84399088247394e-06, 0.000365100572292546, 6.18682215133886e-05,
0.000876326358208229, 0, 0.00035622027884867, 0.00013457066602882,
0.000935283696314864, 0.000158008349130822, 0.000138202548027985,
1.51374715077311e-07, 0.000952826486372883, 0, 3.52048075526939e-05,
9.64616475122579e-05, 0.00070783489186566, 0, 0.000305992589631903,
0, 0.000606254689715534, 1.48330195320979e-05, 0.000182001184850233,
4.26290263807544e-08, 0.000396159100871418, 6.05498860309242e-07,
0.000134675535720211, 0.000625741670916134, 3.01200030236539e-05,
4.84399088247394e-06, 0.000191284876892567, 7.56873575386553e-08,
5.06955606383012e-05, 3.41932713240586e-05, 0.000192954721346495,
0, 0.000110586003447372), Date_new = structure(c(17311, 17323,
17311, 17323, 18154, 18149, 18154, 18149, 18244, 18240, 18244,
18240, 18309, 18338, 18310.6666666667, 18338, 17419, 17414, 17419,
17414, 17503, 17498, 17503, 17498, 17596.3333333333, 17561, 17605.3333333333,
17561, 17671, 17666, 17671, 17666, 17775, 17771, 17775, 17771,
17873, 17869, 17873, 17869, 17977, 17974, 17977, 17974, 18050,
18051, 18050, 18051), class = "Date")), row.names = c(NA, -48L
), groups = structure(list(TimeStep = c("1", "1", "10", "10",
"11", "11", "12", "12", "2", "2", "3", "3", "4", "4", "5", "5",
"6", "6", "7", "7", "8", "8", "9", "9"), Shelter = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Low", "High"), class = c("ordered",
"factor")), .rows = structure(list(1:2, 3:4, 5:6, 7:8, 9:10,
11:12, 13:14, 15:16, 17:18, 19:20, 21:22, 23:24, 25:26, 27:28,
29:30, 31:32, 33:34, 35:36, 37:38, 39:40, 41:42, 43:44, 45:46,
47:48), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, -24L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
When I plot these data using month only labels at three-month increments, I get labels at "Jul", "Oct", "Jan", and "Apr", when I actually want the month labels to be "Jun", "Sep", "Dec", and "Mar" instead. I have tried specifying the labels as a character vector of these month abbreviations to no avail.
Plot Code
urchin_time_series_plot <- ggplot(data = data, aes(x = Date_new, y = mean, fill = Shelter, shape = Site_long)) +
geom_point(aes(size = 3)) +
geom_line(aes(linetype = Shelter)) +
scale_linetype_manual(values=c("dashed", "solid")) +
scale_shape_manual(values = c(21, 24)) +
scale_fill_manual(values = c(NA, "black"), guide = guide_legend(override.aes = list(shape = 21))) +
scale_y_continuous(labels = scientific) +
guides(size = FALSE) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0) +
scale_x_date(date_breaks = "3 months", date_labels = "%b") +
labs(x = "Date", y = "Urchin biomass (Kg)") +
theme_bw() + theme(text = element_text(size = 14), panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"), axis.title.x = element_blank(), axis.text = element_text(size = 16), legend.text = element_text(size = rel(1.5)), axis.text.y = element_text(angle = 90, hjust = .9), legend.title = element_text(size = rel(1.5)), legend.position = "none", axis.title.y = element_text(size = 18), axis.title = element_text(size = 16))
ANSWER
Answered 2021-May-18 at 21:58
The solution I found is to expand the date range using the expand_limits() function in ggplot2 so that some days in May are included. By padding the range, I get the correct output:
urchin_time_series_plot <- ggplot(data = plot_data_urchin, aes(x = Date_new, y = mean, fill = Shelter, shape = Site_long)) +
geom_point(aes(size = 3)) +
geom_line(aes(linetype = Shelter)) +
scale_linetype_manual(values=c("dashed", "solid")) +
scale_shape_manual(values = c(21, 24)) +
scale_fill_manual(values = c(NA, "black"), guide = guide_legend(override.aes = list(shape = 21))) +
scale_y_continuous(labels = scientific) +
guides(size = FALSE) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0) +
scale_x_date(date_breaks = "3 months", date_labels = "%b") +
expand_limits(x = as.Date(c("2017-05-15", "2020-04-01"))) +
labs(x = "Date", y = "Urchin biomass (Kg)") +
theme_bw() + theme(text = element_text(size = 14), panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"), axis.title.x = element_blank(), axis.text = element_text(size = 16), legend.text = element_text(size = rel(1.5)), axis.text.y = element_text(angle = 90, hjust = .9), legend.title = element_text(size = rel(1.5)), legend.position = "none", axis.title.y = element_text(size = 18), axis.title = element_text(size = 16))
QUESTION
I have been experimenting with Splunk, trying to emulate some basic functionality from the OSISoft PI Time Series database.
I have two data points that I wish to display trends for over time in order to compare fluctuations between them, specifically power network MW analogue tags.
In PI this is very easy to do, however I am having difficulty figuring out how to do it in Splunk.
How do I achieve this given the field values "SubstationA_T1_MW" and "SubstationA_T2_MW" in the field Tag?
The fields involved are TimeStamp, Tag, Value, and Status.
Edit: Sample Input and Output listed below:
ANSWER
Answered 2021-Apr-29 at 12:41
I suspect you're going to be most interested in timechart for this.
Something along the following lines may get you towards what you're looking for:
index=ndx sourcetype=srctp Value=* TimeStamp=* Status=* (Tag=SubstationA_T1_MW OR Tag=SubstationA_T2_MW) earliest=-2h
| eval _time=strptime(TimeStamp,"%m/%d/%Y %H:%M:%S.%N")
| timechart span=15m max(Value) as Value by Tag
timechart relies on the internal, hidden _time field (which is in Unix epoch time) - so if _time doesn't match TimeStamp, you need the eval statement I added to convert your TimeStamp (which I've assumed is in mm/dd/yyyy format) to Unix epoch time in _time.
Also, go take the free, self-paced Splunk Fundamentals 1 class
QUESTION
I would like to deploy the time series database QuestDB on GCP, but I do not see any instructions on the documentation. Could I get some steps?
ANSWER
Answered 2021-Apr-08 at 09:38
This can be done in a few short steps on Compute Engine. When creating a new instance, choose the region and instance type, then:
- In the "Container" section, enable "Deploy a container image to this VM instance"
- Type questdb/questdb:latest for the "Container image"
This will pull the latest QuestDB docker image and run it on your instance when launching. The rest of the setup steps are setting firewall rules to allow networking on the ports you require:
- port 9000 - web console & REST API
- port 8812 - PostgreSQL wire protocol
Source of this info is an ETL tutorial by Gabor Boros which deploys QuestDB to GCP and uses Cloud Functions for loading and processing data from a storage bucket.
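The same setup can also be scripted with the gcloud CLI; the instance name, zone, and machine type below are placeholders, so treat this as a sketch rather than the tutorial's exact commands:
# Create a Compute Engine VM that runs the QuestDB container on boot
gcloud compute instances create-with-container questdb-demo \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --container-image=questdb/questdb:latest
# Open the web console / REST API port and the PostgreSQL wire protocol port
gcloud compute firewall-rules create allow-questdb \
  --allow=tcp:9000,tcp:8812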
QUESTION
I'm very new to SQL and time series databases. I'm using CrateDB. I want to aggregate the data by day, but I want each day to start at 9 AM rather than 12 AM.
The time interval is 9 AM to 11:59 PM.
A Unix timestamp is used to store the data. The following is my sample database:
| sensorid | timestamp  | reading |
====================================
| 1        | 1616457600 | 10      |
| 1        | 1616461200 | 100     |
| 2        | 1616493600 | 1       |
| 2        | 1616493601 | 10      |
Currently I group the data using the following command, but it gives the start of the day as 12 AM:
select date_trunc('day', v.timestamp) as day,sum(reading)
from sensor1 v(timestamp)
group by (DAY)
From the above table I want the sum of the 1616493600 and 1616493601 readings (11 is the result), because 1616457600 and 1616461200 are before 9 AM.
ANSWER
Answered 2021-Mar-23 at 09:47
You want to add nine hours to midnight:
date_trunc('day', v.timestamp) + interval '9' hour
Edit: If you want to exclude hours before 9:00 from the data you add up, you must add a WHERE clause:
where extract(hour from v.timestamp) >= 9
Here is a complete query with all relevant data:
select
date_trunc('day', v.timestamp) as day,
date_trunc('day', v.timestamp) + interval '9' hour as day_start,
min(v.timestamp) as first_data,
max(v.timestamp) as last_data,
sum(reading) as total_reading
from sensor1 v(timestamp)
where extract(hour from v.timestamp) >= 9
group by day
order by day;
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install Python-3-basics-series
You can use Python-3-basics-series like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
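For example, a typical setup might look like the following (the clone URL is a placeholder, since the repository owner is not given here):
# Create and activate a virtual environment, then update the packaging tooling
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
# Clone the repository (replace <owner> with the actual GitHub account)
git clone https://github.com/<owner>/Python-3-basics-series.git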