Popular New Releases in Monitoring
netdata
v1.34.1
sentry
22.4.0
skywalking
9.0.0
osquery
uptime-kuma
Popular Libraries in Monitoring
by netdata c
58912 GPL-3.0
Real-time performance monitoring, done right! https://www.netdata.cloud
by getsentry python
30691 NOASSERTION
Sentry is cross-platform application monitoring, with a focus on error reporting.
by apache java
19150 Apache-2.0
APM, Application Performance Monitoring System
by osquery c++
18789 NOASSERTION
SQL powered operating system instrumentation, monitoring, and analytics.
by dianping java
16597 Apache-2.0
CAT serves as a foundational component for server-side projects, providing multi-language clients for Java, C/C++, Node.js, Python, Go, and more. It is deeply integrated into Meituan-Dianping's infrastructure middleware (MVC framework, RPC framework, database framework, cache framework, message queues, configuration system, etc.), providing rich performance metrics, health status, and real-time alerting for Meituan-Dianping's business lines.
by louislam javascript
16089 MIT
A fancy self-hosted monitoring tool
by buger go
15413 NOASSERTION
GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.
by openzipkin java
15220 Apache-2.0
Zipkin is a distributed tracing system
by allinurl c
14533 MIT
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
Trending New libraries in Monitoring
by louislam javascript
16089 MIT
A fancy self-hosted monitoring tool
by upgundecha javascript
6632 CC0-1.0
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
by SigNoz typescript
6127 MIT
SigNoz is an open-source APM. It helps developers monitor their applications & troubleshoot problems, an open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
by pyroscope-io go
5738 Apache-2.0
Continuous Profiling Platform. Debug performance issues down to a single line of code
by didi go
4433 Apache-2.0
An enterprise-level cloud-native monitoring system, which can be used as drop-in replacement of Prometheus for alerting and management.
by pppscn java
3181 BSD-2-Clause
SmsForwarder — monitors SMS messages, incoming calls, and app notifications on an Android phone and forwards them according to user-defined rules to other destinations: DingTalk bots, WeCom group bots, Feishu bots, WeCom application messages, email, Bark, webhooks, Telegram bots, ServerChan, PushPlus, SMS to another phone, and more. Note: this APK is mainly for learning and personal use; if you find a bug please file an issue, and PRs are welcome.
by DataDog jupyter notebook
2336 CC-BY-SA-4.0
felixge's notes on the various go profiling methods that are available.
by felixge go
1814 MIT
🚀 fgprof is a sampling Go profiler that allows you to analyze On-CPU as well as Off-CPU (e.g. I/O) time together.
by arl go
1766 MIT
:rocket: Instant live visualization of your Go application runtime statistics (GC, MemStats, etc.) in the browser
Top Authors in Monitoring
1
67 Libraries
2876
2
60 Libraries
631
3
49 Libraries
45796
4
34 Libraries
5453
5
32 Libraries
157
6
26 Libraries
3131
7
25 Libraries
2626
8
21 Libraries
204
9
16 Libraries
8206
10
14 Libraries
78
Trending Kits in Monitoring
Here are some famous Java monitoring libraries. Their use cases include performance monitoring, memory-usage monitoring, error reporting, metrics collection, and logging.
Java monitoring libraries offer deep application performance monitoring with byte-code instrumentation for Java applications in development, QA, and production environments. They help ensure that your end users have a satisfactory experience by simulating their journey through the application.
Let us have a look at these libraries in detail below.
pinpoint
- Real-time monitoring of your application.
- Get transaction visibility at the code level.
- Install APM agents without altering a single line of code.
javamelody
- Measures and calculates statistics on the real operation of an application, based on actual usage by its users.
- Gives facts about average response times and numbers of executions.
- Verifies the real improvement after optimizations.
zabbix
- Enables you to collect metrics.
- Offers a variety of smart, adjustable options for defining thresholds.
- Supports functions and operations for statistical analysis of historical data.
visualvm
- Monitors and troubleshoots applications running on Java 1.4+.
- Perfectly fits all the requirements of application developers, system administrators, and end users.
- Provides detailed information about Java applications while they are running on the Java Virtual Machine.
log4j
- Supports multiple output appenders per logger.
- Not limited to a set of predetermined facilities.
- Speed-optimized and thread-safe.
JHiccup
- Allows developers and system operators to easily create and analyze response-time profiles.
- Unique in that it looks at the underlying platform.
- Helps avoid common pitfalls in application performance characterization.
jamonapi
- Rich feature set to support encapsulation, parameterization, and reuse of presentation logic.
- Declare the arguments required for rendering.
- Allows passing arbitrary objects between templates.
relics
- Monitors measurements and performance under load.
- Helps enhance throughput and identify bottlenecks.
- Provides the capacity to probe deeply and evaluate the code stack.
Trending Discussions on Monitoring
How to fix SageMaker data-quality monitoring-schedule job that fails with 'FailureReason': 'Job inputs had no data'
Add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes
getifaddrs returning 'bad file descriptor'/crashing the application
Filter the parts of a Request Path which match against a Static Segment in Servant
CloudScheduler 403 Permission denied while creating
Elastic Beanstalk deleting generated files on config changes
GCP alert if Docker container exits
How to make Spring Boot Actuator available via JMX with globally enabled lazy initialization?
Monitor language specific API Calls for Microsoft Translator API
switchMap combined with mergeMap
QUESTION
How to fix SageMaker data-quality monitoring-schedule job that fails with 'FailureReason': 'Job inputs had no data'
Asked 2022-Feb-26 at 04:38
I am trying to schedule a data-quality monitoring job in AWS SageMaker by following the steps mentioned in this AWS documentation page. I have enabled data capture for my endpoint and trained a baseline on my training CSV file; the statistics and constraints are available in S3 like this:
from sagemaker import get_execution_role
from sagemaker import image_uris
from sagemaker.model_monitor import DefaultModelMonitor  # import was missing in the original snippet
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_data_monitor = DefaultModelMonitor(
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    volume_size_in_gb=30,
    max_runtime_in_seconds=3_600)

# base s3 directory
baseline_dir_uri = 's3://api-trial/data_quality_no_headers/'
# train data, that I have used to generate baseline
baseline_data_uri = baseline_dir_uri + 'ch_train_no_target.csv'
# directory in s3 bucket that I have stored my baseline results to
baseline_results_uri = baseline_dir_uri + 'baseline_results_try17/'

my_data_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    wait=True, logs=False, job_name='ch-dq-baseline-try21'
)
Then I tried scheduling a monitoring job by following this example notebook for model-quality monitoring in the sagemaker-examples GitHub repo, making the necessary modifications for data-quality monitoring based on feedback from error messages.
Here's how I tried to schedule the data-quality monitoring job from SageMaker Studio:
from sagemaker import get_execution_role
from sagemaker.model_monitor import EndpointInput
from sagemaker import image_uris
from sagemaker.model_monitor import CronExpressionGenerator
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# base s3 directory
baseline_dir_uri = 's3://api-trial/data_quality_no_headers/'
# train data, that I have used to generate baseline
baseline_data_uri = baseline_dir_uri + 'ch_train_no_target.csv'
# directory in s3 bucket that I have stored my baseline results to
baseline_results_uri = baseline_dir_uri + 'baseline_results_try17/'
# s3 locations of baseline job outputs
baseline_statistics = baseline_results_uri + 'statistics.json'
baseline_constraints = baseline_results_uri + 'constraints.json'

# directory in s3 bucket that I would like to store results of monitoring schedules in
monitoring_outputs = baseline_dir_uri + 'monitoring_results_try17/'

# myendpoint_name holds the name of the deployed endpoint (defined elsewhere)
ch_dq_ep = EndpointInput(endpoint_name=myendpoint_name,
                         destination="/opt/ml/processing/input_data",
                         s3_input_mode="File",
                         s3_data_distribution_type="FullyReplicated")

monitor_schedule_name = 'ch-dq-monitor-schdl-try21'

my_data_monitor.create_monitoring_schedule(endpoint_input=ch_dq_ep,
                                           monitor_schedule_name=monitor_schedule_name,
                                           output_s3_uri=baseline_dir_uri,
                                           constraints=baseline_constraints,
                                           statistics=baseline_statistics,
                                           schedule_cron_expression=CronExpressionGenerator.hourly(),
                                           enable_cloudwatch_metrics=True)
After an hour or so, when I check the status of the schedule like this:
import boto3
boto3_sm_client = boto3.client('sagemaker')
boto3_sm_client.describe_monitoring_schedule(MonitoringScheduleName='ch-dq-monitor-schdl-try17')
I get a failed status like below:
'MonitoringExecutionStatus': 'Failed',
 ...
 'FailureReason': 'Job inputs had no data'},
Entire Message:
{'MonitoringScheduleArn': 'arn:aws:sagemaker:ap-south-1:<my-account-id>:monitoring-schedule/ch-dq-monitor-schdl-try21',
 'MonitoringScheduleName': 'ch-dq-monitor-schdl-try21',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'DataQuality',
 'CreationTime': datetime.datetime(2021, 9, 14, 13, 7, 31, 899000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 9, 14, 14, 1, 13, 247000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'data-quality-job-definition-2021-09-14-13-07-31-483',
  'MonitoringType': 'DataQuality'},
 'EndpointName': 'ch-dq-nh-try21',
 'LastMonitoringExecutionSummary': {'MonitoringScheduleName': 'ch-dq-monitor-schdl-try21',
  'ScheduledTime': datetime.datetime(2021, 9, 14, 14, 0, tzinfo=tzlocal()),
  'CreationTime': datetime.datetime(2021, 9, 14, 14, 1, 9, 405000, tzinfo=tzlocal()),
  'LastModifiedTime': datetime.datetime(2021, 9, 14, 14, 1, 13, 236000, tzinfo=tzlocal()),
  'MonitoringExecutionStatus': 'Failed',
  'EndpointName': 'ch-dq-nh-try21',
  'FailureReason': 'Job inputs had no data'},
 'ResponseMetadata': {'RequestId': 'dd729244-fde9-44b5-9904-066eea3a49bb',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'dd729244-fde9-44b5-9904-066eea3a49bb',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '835',
   'date': 'Tue, 14 Sep 2021 14:27:53 GMT'},
  'RetryAttempts': 0}}
Possible things that might have gone wrong on my side, or that might help fix the issue:
- dataset used for baseline: I have tried to create a baseline with the dataset both with and without my target variable (dependent variable, or y), and the error persisted both times. So I think the error has a different origin.
- logs: there are no log groups created for these jobs for me to look at and debug the issue. The baseline jobs do have log groups, so I presume the role used by the monitoring-schedule jobs is not missing permissions to create a log group or stream.
- role: the role I have attached is defined by get_execution_role(), which points to a role with full access to SageMaker, CloudWatch, S3, and some other services.
- the data collected from my endpoint during inference: here's what a line of the .jsonl file saved to S3, which contains data captured during inference, looks like:
{"captureData":{"endpointInput":{"observedContentType":"application/json","mode":"INPUT","data":"{\"longitude\": [-122.32, -117.58], \"latitude\": [37.55, 33.6], \"housing_median_age\": [50.0, 5.0], \"total_rooms\": [2501.0, 5348.0], \"total_bedrooms\": [433.0, 659.0], \"population\": [1050.0, 1862.0], \"households\": [410.0, 555.0], \"median_income\": [4.6406, 11.0567]}","encoding":"JSON"},"endpointOutput":{"observedContentType":"text/html; charset=utf-8","mode":"OUTPUT","data":"eyJtZWRpYW5faG91c2VfdmFsdWUiOiBbNDUyOTU3LjY5LCA0NjcyMTQuNF19","encoding":"BASE64"}},"eventMetadata":{"eventId":"9804d438-eb4c-4cb4-8f1b-d0c832b641aa","inferenceId":"ef07163d-ea2d-4730-92f3-d755bc04ae0d","inferenceTime":"2021-09-14T13:59:03Z"},"eventVersion":"0"}
I would like to know what went wrong in this entire process that led to no data being fed to my monitoring job.
ANSWER
Answered 2022-Feb-26 at 04:38
This happens during the ground-truth-merge job when Spark can't find any data in either the '/opt/ml/processing/groundtruth/' or '/opt/ml/processing/input_data/' directories. That can happen when either you haven't sent any requests to the SageMaker endpoint or there are no ground truths.
I got this error because the folder /opt/ml/processing/input_data/ of the Docker volume mapped to the monitoring container had no data to process. That happened because the process that fetches the captured data couldn't find any in S3, which in turn happened because there was an extra slash (/) in the directory to which the endpoint's captured data is saved. To elaborate: while creating the endpoint, I had specified the directory as s3://<bucket-name>/<folder-1>/, while it should have been just s3://<bucket-name>/<folder-1>. So when the job tried to copy that hour's data from S3 to the Docker volume, the directory it tried to read from was s3://<bucket-name>/<folder-1>//<endpoint-name>/<variant-name>/<year>/<month>/<date>/<hour> (notice the two slashes). When I recreated the endpoint configuration with that slash removed from the S3 directory, the error was gone and the ground-truth-merge operation succeeded as part of model-quality monitoring.
I am answering my own question because someone read it and upvoted it, which means someone else has faced this problem too, so I have written down what worked for me.
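The root cause described above is easy to reproduce outside SageMaker: naively joining an S3 prefix that already ends in "/" with a "/"-separated suffix yields a key containing "//", which S3 treats as a different (usually empty) "directory". A minimal sketch; the bucket name and the capture_key helper are made up for illustration, not part of the SageMaker API:

```python
def capture_key(prefix: str, endpoint: str, variant: str, hour_path: str) -> str:
    # naive join, analogous to how the capture path is assembled:
    # <prefix>/<endpoint-name>/<variant-name>/<year>/<month>/<date>/<hour>
    return f"{prefix}/{endpoint}/{variant}/{hour_path}"

# prefix configured WITH a trailing slash -> double slash in the key
bad = capture_key("s3://my-bucket/captures/", "ch-dq-nh-try21", "AllTraffic", "2021/09/14/14")
# prefix configured without the trailing slash -> clean key
good = capture_key("s3://my-bucket/captures", "ch-dq-nh-try21", "AllTraffic", "2021/09/14/14")

print(bad)   # s3://my-bucket/captures//ch-dq-nh-try21/AllTraffic/2021/09/14/14
print(good)  # s3://my-bucket/captures/ch-dq-nh-try21/AllTraffic/2021/09/14/14

# defensive fix: normalize the prefix before any joining
safe_prefix = "s3://my-bucket/captures/".rstrip("/")
assert capture_key(safe_prefix, "e", "v", "h") == "s3://my-bucket/captures/e/v/h"
```

Stripping trailing slashes from user-supplied S3 prefixes before concatenation avoids the whole class of problems.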
QUESTION
Add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes
Asked 2022-Feb-13 at 20:24
I run Prometheus locally at http://localhost:9090/targets with
docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
and want to connect it to the several Kubernetes cluster instances we have, to see that scraping works, try Grafana dashboards, etc.
Then I'll do the same on a dedicated server meant specifically for monitoring. However, all my searching turns up ways to configure a Prometheus that already runs inside one Kubernetes instance, and no way to read metrics from an external Kubernetes cluster.
How do I add a Kubernetes scrape target to a Prometheus instance that is NOT in Kubernetes?
I have read Where Kubernetes metrics come from and checked that my (first) Kubernetes cluster has the Metrics Server.
kubectl get pods --all-namespaces | grep metrics-server
It makes no sense to add a Prometheus instance to every Kubernetes cluster. One Prometheus must be able to read metrics from many Kubernetes clusters and every node within them.
P.S. An old question suggests installing Prometheus in every Kubernetes cluster and then using federation, which is just the opposite of what I am looking for.
P.P.S. It also seems strange to me that Kubernetes and Prometheus, the #1 and #2 projects of the Cloud Native Computing Foundation, don't have a simple "add Kubernetes target in Prometheus" button or step.
ANSWER
Answered 2021-Dec-28 at 08:33
There are many agents capable of shipping metrics collected in Kubernetes to a remote Prometheus server outside the cluster: for example, Prometheus itself now supports an agent mode, the OpenTelemetry Collector can export metrics, or you can use a managed Prometheus offering.
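The agent-mode route can be sketched concretely: a tiny Prometheus runs inside each cluster, only scrapes and forwards, and the central non-Kubernetes Prometheus receives the samples. A minimal sketch, where the central host central-prometheus.example.com and the cluster label are assumptions for illustration:

```yaml
# prometheus-agent.yml, run inside each Kubernetes cluster with:
#   prometheus --enable-feature=agent --config.file=prometheus-agent.yml
global:
  external_labels:
    cluster: my-cluster-1          # distinguishes clusters on the central server

scrape_configs:
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node                 # discover every node via the in-cluster API
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

remote_write:
  - url: https://central-prometheus.example.com/api/v1/write
```

The central Prometheus (the one running in Docker on the monitoring server) then needs to be started with --web.enable-remote-write-receiver so that it accepts the pushed samples.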
QUESTION
getifaddrs returning 'bad file descriptor'/crashing the application
Asked 2022-Jan-18 at 10:47
In my program, I have a thread which has to continuously monitor the network interfaces, so it continuously calls getifaddrs() in a while loop.
while (true) {

    struct ifaddrs *ifaddr, *ifa;
    if (getifaddrs(&ifaddr) == -1) {
        perror("getifaddrs couldn't fetch required data");
        exit(EXIT_FAILURE);
    }

    // Iterate through interfaces linked list
    for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
        // monitoring logic
    }

    // Free linked list
    freeifaddrs(ifaddr);

    // Sleep for specified time before next polling cycle
    usleep(1000);

}
Most of the time my program works fine. However, sometimes getifaddrs() returns -1 with errno = EBADF (bad file descriptor). In order not to exit my thread, I have temporarily replaced exit with continue (as I don't want my program to end because of this). However, I'm curious to know in which cases getifaddrs() can return a 'bad file descriptor' error, and whether I can do something so that it does not happen.
EDIT
Replacing 'exit' with 'continue' didn't solve my problem: sometimes the call to getifaddrs() crashes the application!
Given below is the backtrace obtained from gdb using the generated core file.
Program terminated with signal 6, Aborted.
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
#1  0x00007fe2df1f0a78 in abort () from /lib64/libc.so.6
#2  0x00007fe2df231ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007fe2df231fbe in __libc_fatal () from /lib64/libc.so.6
#4  0x00007fe2df2df4c2 in __netlink_assert_response () from /lib64/libc.so.6
#5  0x00007fe2df2dc412 in __netlink_request () from /lib64/libc.so.6
#6  0x00007fe2df2dc5ef in getifaddrs_internal () from /lib64/libc.so.6
#7  0x00007fe2df2dd310 in getifaddrs () from /lib64/libc.so.6
#8  0x000000000047c03c in __interceptor_getifaddrs.part.0 ()
Operating system: Red Hat Enterprise Linux Server release 7.8 (Maipo)
GLIBC version: 2.17
ANSWER
Answered 2021-Dec-06 at 08:59
According to the man7.org getifaddrs man page, any of the underlying socket operations could be the cause of EBADF:
ERRORS
getifaddrs() may fail and set errno for any of the errors specified for socket(2), bind(2), getsockname(2), recvmsg(2), sendto(2), malloc(3), or realloc(3).
Unrelated, but do you call freeifaddrs() somewhere?
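As a defensive pattern (my own sketch, not part of the original answer), the polling loop can route getifaddrs() failures through a small retry helper, so a transient error skips one polling cycle instead of terminating the thread. The helper name getifaddrs_retry and the retry/back-off parameters are illustrative choices, not anything from the question:

```c
#include <errno.h>
#include <ifaddrs.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: retry getifaddrs() a few times with a short
 * back-off instead of exiting the thread on the first failure. */
int getifaddrs_retry(struct ifaddrs **ifap, int attempts)
{
    for (int i = 0; i < attempts; i++) {
        if (getifaddrs(ifap) == 0)
            return 0;                 /* success: caller owns *ifap */
        fprintf(stderr, "getifaddrs: attempt %d failed (errno=%d)\n",
                i + 1, errno);
        usleep(100 * 1000);           /* wait 100 ms before retrying */
    }
    return -1; /* caller should skip this polling cycle, not exit() */
}
```

In the loop above, a -1 return would then be followed by continue rather than exit(EXIT_FAILURE). Note this only works around the error-return symptom; the netlink assertion in the backtrace aborts inside glibc before the call even returns, so it cannot be caught this way.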
QUESTION
Filter the parts of a Request Path which match against a Static Segment in Servant
Asked 2022-Jan-02 at 18:53
Supposing I'm running a Servant webserver with two endpoints, with a type looking like this:
type BookAPI =
    "books" :> Get '[JSON] (Map Text Text)
      :<|> "book" :> Capture "Name" Text :> ReqBody '[JSON] (Text) :> Post '[JSON] (Text)

λ: T.putStrLn $ layout (Proxy :: Proxy BookAPI)
/
├─ book/
│  └─ <capture>/
│     └─•
└─ books/
   └─•
I might want to use something like Network.Wai.Middleware.Prometheus's instrumentHandlerValue to generate a Prometheus metric that fires every time this API is called, with a handler value set to the path of the request.
However, if I do something like the following:
prometheusMiddlware = instrumentHandlerValue (T.intercalate "\\" . pathInfo)
This is bad, because different requests to the book/<Name> endpoint, such as book/great-expectations and book/vanity-fair, result in different labels. This is fine if the number of books is small, but if it's very large, then the amount of data used by these metrics is very big, and either my service falls over or my monitoring bill becomes very large.
I'd quite like a function that took a Servant API and a Wai Request and, if they matched, returned a list of segments in a form that is the same for each endpoint.
That is, requests to /books would return Just ["books"], requests to /book/little-dorrit would return Just ["book", "Name"], and requests to /films would return Nothing.
I can kind of see how you might go about writing this by pattern matching on Router' from Servant.Server.Internal.Router, but it's not clear to me that relying on an internal module in order to do this is a good idea.
Is there a better way?
ANSWER
Answered 2022-Jan-02 at 18:53
The pathInfo function returns all the path segments for a Request. Perhaps we could define a typeclass that, given a Servant API, produced a "parser" for the list of segments, whose result would be a formatted version of the list.
The parser type could be something like:
import Data.Text
import Control.Monad.State.Strict
import Control.Applicative

type PathParser = StateT ([Text],[Text]) Maybe ()
Where the first [Text] in the state is the path segments yet to be parsed, and the second is the formatted path segments we have accumulated so far.
This type has an Alternative instance where failure discards state (basically backtracking) and a MonadFail instance that returns mzero on pattern-match failure inside do-blocks.
The typeclass:
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Data ( Proxy )
import GHC.TypeLits

class HasPathParser (x :: k) where
    pathParser :: Proxy x -> PathParser
The instance for Symbol moves the path piece from the pending list to the processed list:
instance KnownSymbol piece => HasPathParser (piece :: Symbol) where
    pathParser _ = do
        (piece : rest, found) <- get -- we are using MonadFail here
        guard (piece == Data.Text.pack (symbolVal (Proxy @piece)))
        put (rest, piece : found)
The instance for Capture puts the name of the path variable (not the value) on the processed list:
instance KnownSymbol name => HasPathParser (Capture name x) where
    pathParser _ = do
        (_ : rest, found) <- get -- we are using MonadFail here
        put (rest, Data.Text.pack (symbolVal (Proxy @name)) : found)
The instance for Verb (the endpoint itself) checks that no path segments remain:
instance HasPathParser (Verb method statusCode contextTypes a) where
    pathParser _ = do
        ([], found) <- get -- we are using MonadFail here
        put ([], found)
Some other instances:
instance HasPathParser (ReqBody x y) where
    pathParser _ = pure ()

instance (HasPathParser a, HasPathParser b) => HasPathParser (a :> b) where
    pathParser _ = pathParser (Proxy @a) *> pathParser (Proxy @b)

instance (HasPathParser a, HasPathParser b) => HasPathParser (a :<|> b) where
    pathParser _ = pathParser (Proxy @a) <|> pathParser (Proxy @b)
Putting it to work:
main :: IO ()
main = do
    do let Just ([], result) = execStateT (pathParser (Proxy @BookAPI)) (["books"],[])
       print result
       -- ["books"]
    do let Just ([], result) = execStateT (pathParser (Proxy @BookAPI)) (["book", "somebookid"],[])
       print result
       -- ["Name","book"]
QUESTION
CloudScheduler 403 Permission denied while creating
Asked 2021-Dec-22 at 05:07
I am trying to create a Cron job programmatically in Cloud Scheduler on Google Cloud Platform using the following API explorer.
Reference: Cloud Scheduler Documentation
Even though I have given the user Owner permission and verified in Policy Troubleshooter that it has cloudscheduler.jobs.create, I am still getting the following error.
{
  "error": {
    "code": 403,
    "message": "The principal (user or service account) lacks IAM permission \"cloudscheduler.jobs.create\" for the resource \"projects/cloud-monitoring-saurav/locations/us-central\" (or the resource may not exist).",
    "status": "PERMISSION_DENIED"
  }
}
ANSWER
Answered 2021-Dec-16 at 14:42
The error is caused by using a service account that does not have an IAM role that includes the permission cloudscheduler.jobs.create. An example role is roles/cloudscheduler.admin, aka Cloud Scheduler Admin. I have the feeling that you have mixed up the permissions of the service account that you use with Cloud Scheduler (at runtime, when a job triggers something) and the permissions of the account currently creating the job (i.e. your own account).
You actually need two service accounts for the job to get created: one that you set up yourself (it can have whatever name you like and doesn't require any special permissions) and the default Cloud Scheduler service account itself (which is managed by Google).
Use an existing service account for the call from Cloud Scheduler to your HTTP target, or create a new service account for this purpose. The service account must belong to the same project as the one in which the Cloud Scheduler jobs are created. This is the client service account; use it when specifying the service account to generate the OAuth/OIDC tokens. If your target is part of Google Cloud, such as Cloud Functions or Cloud Run, grant your client service account the necessary IAM role (Cloud Functions Invoker for Cloud Functions, Cloud Run Invoker for Cloud Run). The receiving service automatically verifies the generated token. If your target is outside of Google Cloud, the receiving service must verify the token manually.
The other service account is the default Cloud Scheduler service account, which must also be present in your project and have the Cloud Scheduler Service Agent role granted to it. This is so it can generate header tokens on behalf of your client service account to authenticate to your target. The Cloud Scheduler service account with this role granted is set up automatically when you enable the Cloud Scheduler API, unless you enabled it prior to March 19, 2019, in which case you must add the role manually.
Note: do not remove the service-YOUR_PROJECT_NUMBER@gcp-sa-cloudscheduler.iam.gserviceaccount.com service account from your project, or its Cloud Scheduler Service Agent role. Doing so will result in 403 responses to endpoints requiring authentication, even if your job's service account has the appropriate role.
QUESTION
Elastic Beanstalk deleting generated files on config changes
Asked 2021-Dec-20 at 23:51
On Elastic Beanstalk, with an AWS Linux 2 based environment, updating the Environment Properties (i.e. environment variables) of an environment causes all generated files to be deleted. It also doesn't run container_commands as part of this update.
So, for example, I have a Django project with collectstatic in the container commands:
05_collectstatic:
  command: |
    source $PYTHONPATH/activate
    python manage.py collectstatic --noinput --ignore *.scss
This collects static files to a folder called staticfiles as part of deploy. But when I do an environment variable update, staticfiles is deleted. This causes all static files on the application to be broken until I re-deploy, which is extremely undesirable.
This behavior did not occur on AWS Linux 1 based environments. The difference appears to be that AWS Linux 2 based environments replace the /var/app/current folder during environment variable changes, whereas AWS Linux 1 based environments did not.
How do I fix this?
Research
I can verify that the container commands are not being run during an environment variable change by monitoring /var/log/cfn-init.log; no new entries are added to this log.
This happens with both rolling update type "disabled" and "immutable".
This happens even if I convert the environment command to be a platform hook, despite the fact that hooks are listed as running when environment properties are updated.
It seems to me like there are two potential solutions, but I don't know of an Elastic Beanstalk setting for either:
- Have environment variable changes leave /var/app/current in place rather than replacing it.
- Have environment variable changes run container commands.
The Elastic Beanstalk docs on container commands say "Leader-only container commands are only executed during environment creation and deployments, while other commands and server customization operations are performed every time an instance is provisioned or updated." Is this a bug in Elastic Beanstalk?
Related question: EB: Trigger container commands / deploy scripts on configuration change
ANSWER
Answered 2021-Dec-20 at 23:51
The solution is to use a Configuration deployment platform hook for any commands that change the files in the deployment directory. Note that this is different from an Application deployment platform hook.
Using the example of the collectstatic command, the best thing to do is to move it from a container command to a pair of hooks, one for standard deployments and one for configuration changes.
To do this, remove the collectstatic container command. Then, make two identical files:
.platform/confighooks/predeploy/predeploy.sh
.platform/hooks/predeploy/predeploy.sh
Each file should have the following code:
#!/bin/bash
source $PYTHONPATH/activate
python manage.py collectstatic --noinput --ignore *.scss
You need two seemingly redundant files because different hooks have different trigger conditions: scripts in hooks run when you deploy the app, whereas scripts in confighooks run when you change the configuration of the app.
Make sure to make both of these files executable according to git, or else you will run into a "permission denied" error when you try to deploy. You can check whether they are executable via git ls-files -s .platform; you should see 100755 before any shell files in the output of this command. If you see 100644 before any of your shell files, run git add --chmod=+x -- .platform/*/*/*.sh to make them executable.
QUESTION
GCP alert if Docker container exits
Asked 2021-Dec-16 at 07:46
We are using Google Cloud Platform and its Compute Engine service. We have Docker installed on one of our VM instances (which is part of Compute Engine). Can we set up an alert in GCP Monitoring to be triggered when a container exits?
ANSWER
Answered 2021-Dec-16 at 07:46
Yes. You have to configure Docker with the Cloud Logging driver, create a log-based metric, and create an alerting policy on that metric.
Solution:
Configure Docker with the Cloud Logging driver
SSH to your instance.
Run this command as root: dockerd --log-driver=gcplogs. This will forward your Docker logs to Google Cloud Logging.
If running a Container-Optimized OS, then follow these steps instead:
echo '{"log-driver":"gcplogs"}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
Try starting and exiting a container.
Note that two exit messages are generated in the logs whenever a container exits; keep this in mind when setting the alert threshold.
Create a log-based metric
Go to Logging -> Select Log-based Metrics.
Click on Create Metric.
Metric Type: Counter. Under Details, enter a Log Metric name (e.g. mysite-container-exited).
Under Filter Selection -> Build filter, copy the following code (replace myproject-12345 with your project name and mysite with your container name; the last line is a regex that matches "exit" as the last word in a line):
resource.type="gce_instance"
log_name="projects/myproject-12345/logs/gcplogs-docker-driver"
jsonPayload.container.name="/mysite"
jsonPayload.message=~"exit$"
Create an alerting policy
- Go to Monitoring -> Select Alerting.
- Click on Create Policy.
- Click on Select a Metric and search for your metric name (e.g. mysite-container-exited).
- Select your metric and click on Apply.
- Set Rolling window to 2 min and Rolling window function to count. Click Next.
- Set Alert Trigger to Any time series violates, Threshold position to Above threshold, and Threshold value to 1. Click Next.
- Select a notification Channel to send alerts to. If none exists then click on Manage Notification Channels and create one.
- Easiest one would be to add an email for notification channel.
- After creating go back, click on Refresh icon in Notification Channels and select it. Click ok.
- Click on Save Policy.
As we have seen, two exit messages are written to the logs whenever a container exits, which is why the threshold is set to above 1.
You can monitor more containers by creating a new metric and changing the value of jsonPayload.container.name="/mysite" to your container name.
Now try to start and stop the container. You will receive an alert on email in 2-3 minutes.
QUESTION
How to make Spring Boot Actuator available via JMX with globally enabled lazy initialization?
Asked 2021-Dec-13 at 22:11
In our Spring Boot 2.4+ based applications we need lazy initialization configured in application.properties:
spring.main.lazy-initialization=true
spring.jmx.enabled=true
However, with such settings the Actuator endpoints cannot be reached via JMX.
This is a blocker now that we are migrating to Instana monitoring, which requires the org.springframework.boot:type=Endpoint,name=Metrics and org.springframework.boot:type=Endpoint,name=Health MBeans to be available via JMX.
Is there a way to keep lazy initialization enabled but at the same time have Actuator accessible via JMX, please?
ANSWER
Answered 2021-Oct-14 at 11:59
This is a bug in Spring Boot for which I've just opened an issue. Thanks for bringing it to our attention.
You can work around the problem by excluding the bean that exports the endpoints to JMX from lazy initialization. To do so, add the following bean to your application:
@Bean
LazyInitializationExcludeFilter eagerJmxEndpointExport() {
    return LazyInitializationExcludeFilter.forBeanTypes(JmxEndpointExporter.class);
}
QUESTION
Monitor language specific API Calls for Microsoft Translator API
Asked 2021-Dec-03 at 06:23
Is there any way to monitor the language-specific usage of the Translator API (e.g. xx number of characters for English to Japanese) for the Microsoft Translator API? I couldn't see any such metrics in the Metrics Monitoring section.
ANSWER
Answered 2021-Nov-22 at 10:06
Currently this is not available. We have added it as a feature request for Azure metrics in the Translator service, and it will be added in the near future.
QUESTION
switchMap combined with mergeMap
Asked 2021-Dec-01 at 12:46
I have an Observable where each new value should cause an HTTP request. On the client side I only care about the latest response value; however, I want every request to complete for monitoring/etc. purposes.
What I currently have is something like:
function simulate(x) {
  // Simulate an HTTP request.
  return of(x).pipe(delay(6));
}

source$.pipe(
  someMapFunc(x => simulate(x)),
);
When I use switchMap for the someMapFunc, I get the right set of responses (only the latest). However, if a request takes too long, it will get canceled.
When I use mergeMap instead, I get the right set of requests (every request completes), but the wrong set of responses (every single one).
Is there a way to get the requests of mergeMap with the responses of switchMap? I know I can write this as a custom operator, but I'm wondering if I can build it out of existing/standard RxJS operators. To summarize, what I'm thinking of is:
- a version of switchMap that doesn't unsubscribe when it switches;
- a version of mergeMap that only emits values from the latest inner Observable.
Edit: Based on the accepted answer, I was able to get the following, which works:
function orderedMergeMap(project) {
  return (s) => defer(() => {
    let recent = 0;
    return s.pipe(
      mergeMap((data, idx) => {
        recent = idx;
        return project(data).pipe(filter(() => idx === recent));
      })
    );
  });
}
ANSWER
Answered 2021-Nov-23 at 01:01
I believe that you need a combination of concatMap() and last().
concatMap does not subscribe to the next observable until the previous one completes. Using it, you ensure the order of request execution. And as follows from its description, it doesn't cancel previous subscriptions but lets them finish, unlike switchMap.
last emits the last value emitted from the source on completion. Using it, you ensure that only one (last) result is passed along.
Your code will look like this:
source$.pipe(
  concatMap(x => simulate(x)),
  last()
);
Community Discussions contain sources that include Stack Exchange Network