
Popular New Releases in Monitoring

netdata

v1.34.1

sentry

22.4.0

skywalking

9.0.0

osquery

uptime-kuma

Popular Libraries in Monitoring

netdata

by netdata (C)

58912 stars · GPL-3.0

Real-time performance monitoring, done right! https://www.netdata.cloud

sentry

by getsentry (Python)

30691 stars · NOASSERTION

Sentry is cross-platform application monitoring, with a focus on error reporting.

skywalking

by apache (Java)

19150 stars · Apache-2.0

APM, Application Performance Monitoring System

osquery

by osquery (C++)

18789 stars · NOASSERTION

SQL powered operating system instrumentation, monitoring, and analytics.

cat

by dianping (Java)

16597 stars · Apache-2.0

CAT is a foundational component for server-side projects. It provides clients in Java, C/C++, Node.js, Python, Go, and other languages, and is deeply integrated into Meituan-Dianping's infrastructure and middleware frameworks (MVC framework, RPC framework, database framework, cache framework, message queue, configuration system, etc.), providing Meituan-Dianping's business lines with rich performance metrics, health status, real-time alerting, and more.

uptime-kuma

by louislam (JavaScript)

16089 stars · MIT

A fancy self-hosted monitoring tool

goreplay

by buger (Go)

15413 stars · NOASSERTION

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.

zipkin

by openzipkin (Java)

15220 stars · Apache-2.0

Zipkin is a distributed tracing system

goaccess

by allinurl (C)

14533 stars · MIT

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.

Trending New libraries in Monitoring

uptime-kuma

by louislam (JavaScript)

16089 stars · MIT

A fancy self-hosted monitoring tool

howtheysre

by upgundecha (JavaScript)

6632 stars · CC0-1.0

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

signoz

by SigNoz (TypeScript)

6127 stars · MIT

SigNoz is an open-source APM. It helps developers monitor their applications & troubleshoot problems, an open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool

pyroscope

by pyroscope-io (Go)

5738 stars · Apache-2.0

Continuous Profiling Platform. Debug performance issues down to a single line of code

nightingale

by didi (Go)

4433 stars · Apache-2.0

An enterprise-level cloud-native monitoring system, which can be used as a drop-in replacement for Prometheus for alerting and management.

SmsForwarder

by pppscn (Java)

3181 stars · BSD-2-Clause

SMS forwarder: monitors SMS messages, incoming calls, and app notifications on an Android phone and forwards them to other destinations according to specified rules: DingTalk bot, WeCom group bot, Feishu bot, WeCom application messages, email, Bark, webhook, Telegram bot, ServerChan, PushPlus, SMS to another phone, etc. P.S. This APK is mainly for learning and personal use; if you find a bug please open an issue, and PRs are welcome.

go-profiler-notes

by DataDog (Jupyter Notebook)

2336 stars · CC-BY-SA-4.0

felixge's notes on the various Go profiling methods that are available.

fgprof

by felixge (Go)

1814 stars · MIT

🚀 fgprof is a sampling Go profiler that allows you to analyze On-CPU as well as Off-CPU (e.g. I/O) time together.

statsviz

by arl (Go)

1766 stars · MIT

:rocket: Instant live visualization of your Go application runtime statistics (GC, MemStats, etc.) in the browser

Top Authors in Monitoring

1

newrelic

67 Libraries

2876 stars

2

sensu-plugins

60 Libraries

631 stars

3

getsentry

49 Libraries

45796 stars

4

sensu

34 Libraries

5453 stars

5

Appdynamics

32 Libraries

157 stars

6

Icinga

26 Libraries

3131 stars

7

microsoft

25 Libraries

2626 stars

8

xfce-mirror

21 Libraries

204 stars

9

DataDog

16 Libraries

8206 stars

10

m4ce

14 Libraries

78 stars


Trending Kits in Monitoring

Here are some famous Java Monitoring Libraries. Their use cases include Performance Monitoring, Memory Usage Monitoring, Error Reporting, Metrics Collection, and Logging.


Java monitoring libraries offer deep application performance monitoring with byte-code instrumentation for Java applications in development, QA, and production environments. Ensure that your end users have a satisfactory experience by simulating their journeys through your applications.


Let us have a look at these libraries in detail below. 

pinpoint 

  • Real-time monitoring of your application. 
  • Get transaction visibility at the code level. 
  • Without altering a single line of code, install APM Agents. 

javamelody

  • Measures and calculates statistics on the real operation of an application, based on how users actually use it.
  • Gives facts about average response times and numbers of executions.
  • Verifies the real improvement after optimizations.

zabbix 

  • Enables you to collect metrics. 
  • Offers its users a variety of clever, adjustable threshold defining choices. 
  • Functions and operations that are supported for the statistical analysis of historical data. 

visualvm

  • Monitors and troubleshoots applications running on Java 1.4+.
  • Perfectly fits all the requirements of application developers, system administrators, and end users.
  • Provides detailed information about Java applications while they are running on the Java Virtual Machine.

log4j 

  • Supports multiple output appenders per logger. 
  • Not limited to a set of predetermined facilities. 
  • Speed-optimized and thread-safe. 

JHiccup

  • Allows developers and system operators to easily create and analyze response time profiles.
  • Unique in looking at underlying platform.
  • Helps to avoid common pitfall in application performance characterization.

jamonapi

  • Rich feature set to support encapsulation, parameterization, and reuse of presentation logic.
  • Declare the arguments required for rendering.
  • Allows passing arbitrary objects between templates.

relics 

  • Monitor measurements and performance under load. 
  • Enhance output and identify bottlenecks. 
  • Probe deeply and evaluate the code stack. 

Trending Discussions on Monitoring

How to fix SageMaker data-quality monitoring-schedule job that fails with 'FailureReason': 'Job inputs had no data'

Add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes

getifaddrs returning 'bad file descriptor'/crashing the application

Filter the parts of a Request Path which match against a Static Segment in Servant

CloudScheduler 403 Permission denied while creating

Elastic Beanstalk deleting generated files on config changes

GCP alert if Docker container exits

How to make Spring Boot Actuator available via JMX with globally enabled lazy initialization?

Monitor language specific API Calls for Microsoft Translator API

switchMap combined with mergeMap

QUESTION

How to fix SageMaker data-quality monitoring-schedule job that fails with 'FailureReason': 'Job inputs had no data'

Asked 2022-Feb-26 at 04:38

I am trying to schedule a data-quality monitoring job in AWS SageMaker by following the steps mentioned in this AWS documentation page. I have enabled data capture for my endpoint, then trained a baseline on my training CSV file, and the statistics and constraints are available in S3 like this:

from sagemaker import get_execution_role
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_data_monitor = DefaultModelMonitor(
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    volume_size_in_gb=30,
    max_runtime_in_seconds=3_600)

# base s3 directory
baseline_dir_uri = 's3://api-trial/data_quality_no_headers/'
# train data that I used to generate the baseline
baseline_data_uri = baseline_dir_uri + 'ch_train_no_target.csv'
# directory in the s3 bucket where I stored my baseline results
baseline_results_uri = baseline_dir_uri + 'baseline_results_try17/'

my_data_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    wait=True, logs=False, job_name='ch-dq-baseline-try21'
)

and the data is available in S3.

Then I tried to schedule a monitoring job by following this example notebook for model-quality monitoring in the sagemaker-examples GitHub repo, making the necessary modifications to it for data quality based on feedback from the error messages.

Here's how I tried to schedule the data-quality monitoring job from SageMaker Studio:

from sagemaker.model_monitor import EndpointInput
from sagemaker.model_monitor import CronExpressionGenerator

# s3 locations of the baseline job outputs
baseline_statistics = baseline_results_uri + 'statistics.json'
baseline_constraints = baseline_results_uri + 'constraints.json'

# directory in the s3 bucket where I would like to store the results of monitoring schedules
monitoring_outputs = baseline_dir_uri + 'monitoring_results_try17/'

ch_dq_ep = EndpointInput(endpoint_name=myendpoint_name,
                         destination="/opt/ml/processing/input_data",
                         s3_input_mode="File",
                         s3_data_distribution_type="FullyReplicated")

monitor_schedule_name = 'ch-dq-monitor-schdl-try21'

my_data_monitor.create_monitoring_schedule(endpoint_input=ch_dq_ep,
                                           monitor_schedule_name=monitor_schedule_name,
                                           output_s3_uri=baseline_dir_uri,
                                           constraints=baseline_constraints,
                                           statistics=baseline_statistics,
                                           schedule_cron_expression=CronExpressionGenerator.hourly(),
                                           enable_cloudwatch_metrics=True)

After an hour or so, when I check the status of the schedule like this:

import boto3

boto3_sm_client = boto3.client('sagemaker')
boto3_sm_client.describe_monitoring_schedule(MonitoringScheduleName='ch-dq-monitor-schdl-try17')

I get a Failed status like below:

'MonitoringExecutionStatus': 'Failed',
  ...
  'FailureReason': 'Job inputs had no data'},

Entire Message:

{'MonitoringScheduleArn': 'arn:aws:sagemaker:ap-south-1:<my-account-id>:monitoring-schedule/ch-dq-monitor-schdl-try21',
 'MonitoringScheduleName': 'ch-dq-monitor-schdl-try21',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'DataQuality',
 'CreationTime': datetime.datetime(2021, 9, 14, 13, 7, 31, 899000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 9, 14, 14, 1, 13, 247000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'data-quality-job-definition-2021-09-14-13-07-31-483',
  'MonitoringType': 'DataQuality'},
 'EndpointName': 'ch-dq-nh-try21',
 'LastMonitoringExecutionSummary': {'MonitoringScheduleName': 'ch-dq-monitor-schdl-try21',
  'ScheduledTime': datetime.datetime(2021, 9, 14, 14, 0, tzinfo=tzlocal()),
  'CreationTime': datetime.datetime(2021, 9, 14, 14, 1, 9, 405000, tzinfo=tzlocal()),
  'LastModifiedTime': datetime.datetime(2021, 9, 14, 14, 1, 13, 236000, tzinfo=tzlocal()),
  'MonitoringExecutionStatus': 'Failed',
  'EndpointName': 'ch-dq-nh-try21',
  'FailureReason': 'Job inputs had no data'},
 'ResponseMetadata': {'RequestId': 'dd729244-fde9-44b5-9904-066eea3a49bb',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'dd729244-fde9-44b5-9904-066eea3a49bb',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '835',
   'date': 'Tue, 14 Sep 2021 14:27:53 GMT'},
  'RetryAttempts': 0}}

Possible things you might think have gone wrong on my side, or that might help fix my issue:

  1. Dataset used for the baseline: I have tried creating the baseline both with and without my target variable (dependent variable, or y), and the error persisted both times. So I think the error originates from a different cause.
  2. There are no log groups created for these jobs for me to look at and try to debug the issue. The baseline jobs do have log groups, so I presume there is no problem with the role used by the monitoring-schedule jobs lacking permission to create a log group or stream.
  3. Role: the role I have attached is the one returned by get_execution_role(), which points to a role with full access to SageMaker, CloudWatch, S3, and some other services.
  4. The data collected from my endpoint during inference: here's what a line of the .jsonl file saved to S3, which contains the data captured during inference, looks like:
{"captureData":{"endpointInput":{"observedContentType":"application/json","mode":"INPUT","data":"{\"longitude\": [-122.32, -117.58], \"latitude\": [37.55, 33.6], \"housing_median_age\": [50.0, 5.0], \"total_rooms\": [2501.0, 5348.0], \"total_bedrooms\": [433.0, 659.0], \"population\": [1050.0, 1862.0], \"households\": [410.0, 555.0], \"median_income\": [4.6406, 11.0567]}","encoding":"JSON"},"endpointOutput":{"observedContentType":"text/html; charset=utf-8","mode":"OUTPUT","data":"eyJtZWRpYW5faG91c2VfdmFsdWUiOiBbNDUyOTU3LjY5LCA0NjcyMTQuNF19","encoding":"BASE64"}},"eventMetadata":{"eventId":"9804d438-eb4c-4cb4-8f1b-d0c832b641aa","inferenceId":"ef07163d-ea2d-4730-92f3-d755bc04ae0d","inferenceTime":"2021-09-14T13:59:03Z"},"eventVersion":"0"}
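For reference, the endpointOutput "data" field in the captured record above is base64-encoded JSON; a standard-library sketch to decode it back:

```python
import base64
import json

# The endpointOutput "data" field from the captured .jsonl record above.
encoded = "eyJtZWRpYW5faG91c2VfdmFsdWUiOiBbNDUyOTU3LjY5LCA0NjcyMTQuNF19"

# b64decode gives the raw JSON bytes; json.loads turns them into a dict.
decoded = json.loads(base64.b64decode(encoded))
print(decoded)  # {'median_house_value': [452957.69, 467214.4]}
```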

I would like to know what went wrong in this entire process that led to no data being fed to my monitoring job.

ANSWER

Answered 2022-Feb-26 at 04:38

This happens during the ground-truth-merge job, when Spark can't find any data in either the '/opt/ml/processing/groundtruth/' or '/opt/ml/processing/input_data/' directories. That can happen when either you haven't sent any requests to the SageMaker endpoint or there are no ground-truth labels.

I got this error because the folder /opt/ml/processing/input_data/ of the Docker volume mapped to the monitoring container had no data to process. That happened because the component that fetches the captured data couldn't find any in S3, and that in turn happened because there was an extra slash (/) in the directory to which the endpoint's captured data was saved. To elaborate: while creating the endpoint, I had specified the directory as s3://<bucket-name>/<folder-1>/, when it should have just been s3://<bucket-name>/<folder-1>. So when the job tried to fetch that hour's data, the directory it read from was s3://<bucket-name>/<folder-1>//<endpoint-name>/<variant-name>/<year>/<month>/<date>/<hour> (notice the two slashes). Once I recreated the endpoint configuration with the trailing slash removed from the S3 directory, this error was gone and the ground-truth-merge operation succeeded as part of model-quality monitoring.
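The double-slash failure mode described above is easy to reproduce with plain string concatenation. Here is a minimal sketch (bucket, folder, and endpoint names are made up) showing how a trailing slash in the configured prefix yields the '//' path, and one way to join S3 prefixes defensively:

```python
# Hypothetical S3 capture prefixes; only the trailing slash differs.
prefix_with_slash = "s3://my-bucket/data-capture/"    # as mistakenly configured
prefix_without_slash = "s3://my-bucket/data-capture"  # what it should have been

hour_suffix = "my-endpoint/AllTraffic/2021/09/14/14"

# Naive concatenation reproduces the double slash that made the hourly fetch find no data.
bad_path = prefix_with_slash + "/" + hour_suffix
print(bad_path)  # s3://my-bucket/data-capture//my-endpoint/AllTraffic/2021/09/14/14

# Stripping slashes at the join point gives the same result for either form of the prefix.
def join_s3(prefix: str, *parts: str) -> str:
    return "/".join([prefix.rstrip("/")] + [p.strip("/") for p in parts])

print(join_s3(prefix_with_slash, hour_suffix))
print(join_s3(prefix_without_slash, hour_suffix))
```

Both join_s3 calls print the same clean key prefix, so a stray trailing slash in configuration can no longer shift where the data lands.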

I am answering my own question because someone read it and upvoted it, meaning someone else has faced this problem too, so I have written down what worked for me.

Source https://stackoverflow.com/questions/69179914

QUESTION

Add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes

Asked 2022-Feb-13 at 20:24

I run prometheus locally as http://localhost:9090/targets with

docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

and I want to connect it to the several Kubernetes (cluster) instances we have: see that scraping works, try Grafana dashboards, etc.

And then I'll do the same on a dedicated server that will be used specifically for monitoring. However, all my googling turns up different ways to configure a Prometheus that is already inside a Kubernetes instance, and no way to read metrics from an external Kubernetes cluster.

How to add Kubernetes scrape target to Prometheus instance that is NOT in Kubernetes?


I have read Where Kubernetes metrics come from and checked that my (first) Kubernetes cluster has the Metrics Server.

kubectl get pods --all-namespaces | grep metrics-server

There is definitely no sense in adding a Prometheus instance to every Kubernetes cluster. One Prometheus must be able to read metrics from many Kubernetes clusters and every node within them.

P.S. An old question has an answer suggesting installing Prometheus in every Kubernetes cluster and then using federation, which is just the opposite of what I am looking for.

P.P.S. It also seems strange to me that Kubernetes and Prometheus, the #1 and #2 projects of the Cloud Native Computing Foundation, don't have a simple "add Kubernetes target in Prometheus" button or step.

ANSWER

Answered 2021-Dec-28 at 08:33

There are many agents capable of shipping metrics collected in k8s to a remote Prometheus server outside the cluster: for example, Prometheus itself now supports an agent mode, the OpenTelemetry Collector has a Prometheus remote write exporter, or you can use a managed Prometheus offering.
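As a rough sketch of the agent-mode approach (the hostname, job name, and scrape config below are assumptions, not taken from the question): run a lightweight Prometheus inside each cluster with --enable-feature=agent and point remote_write at the central server, which must itself be started with --web.enable-remote-write-receiver.

```yaml
# prometheus.yml for an in-cluster Prometheus agent (illustrative only)
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: kubernetes-nodes        # example scrape job; adapt to your cluster
    kubernetes_sd_configs:
      - role: node

remote_write:
  - url: http://prometheus.example.com:9090/api/v1/write  # central Prometheus outside the cluster
```

This keeps a single central Prometheus as the query/alerting point while each cluster only forwards samples.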

Source https://stackoverflow.com/questions/70457308

QUESTION

getifaddrs returning 'bad file descriptor'/crashing the application

Asked 2022-Jan-18 at 10:47

In my program, I have a thread that has to continuously monitor the network interfaces, so it calls getifaddrs() in a while loop.

    while (true) {

        struct ifaddrs *ifaddr, *ifa;
        if (getifaddrs(&ifaddr) == -1) {
            perror("getifaddrs couldn't fetch required data");
            exit(EXIT_FAILURE);
        }

        // Iterate through the interfaces linked list
        for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
            // monitoring logic
        }

        // Free the linked list
        freeifaddrs(ifaddr);

        // Sleep for the specified time before the next polling cycle
        usleep(1000);

    }

Most of the time my program works fine. However, sometimes getifaddrs() returns -1 with errno = EBADF (bad file descriptor). In order not to exit my thread, I have temporarily replaced exit with continue (as I don't want my program to end because of this). However, I'm curious to know in which cases getifaddrs() can return a 'bad file descriptor' error and whether I can do something so that this does not happen.

EDIT

Replacing 'exit' with 'continue' didn't solve my problem. Sometimes the call to getifaddrs() crashes the application!

Given below is the backtrace obtained from gdb using the generated core file.

Program terminated with signal 6, Aborted.
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fe2df1ef387 in raise () from /lib64/libc.so.6
#1  0x00007fe2df1f0a78 in abort () from /lib64/libc.so.6
#2  0x00007fe2df231ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007fe2df231fbe in __libc_fatal () from /lib64/libc.so.6
#4  0x00007fe2df2df4c2 in __netlink_assert_response () from /lib64/libc.so.6
#5  0x00007fe2df2dc412 in __netlink_request () from /lib64/libc.so.6
#6  0x00007fe2df2dc5ef in getifaddrs_internal () from /lib64/libc.so.6
#7  0x00007fe2df2dd310 in getifaddrs () from /lib64/libc.so.6
#8  0x000000000047c03c in __interceptor_getifaddrs.part.0 ()

Operating system: Red Hat Enterprise Linux Server release 7.8 (Maipo)

GLIBC version: 2.17

ANSWER

Answered 2021-Dec-06 at 08:59

According to man7.org getifaddrs, any of the socket operations could be a cause for EBADF

ERRORS

getifaddrs() may fail and set errno for any of the errors specified for socket(2), bind(2), getsockname(2), recvmsg(2), sendto(2), malloc(3), or realloc(3).


Unrelated, but do you do freeifaddrs() somewhere?

Source https://stackoverflow.com/questions/70242848

QUESTION

Filter the parts of a Request Path which match against a Static Segment in Servant

Asked 2022-Jan-02 at 18:53

Supposing I'm running a Servant webserver, with two endpoints, with a type looking like this:

type BookAPI =
  "books" :> Get '[JSON] (Map Text Text)
    :<|> "book" :> Capture "Name" Text :> ReqBody '[JSON] (Text) :> Post '[JSON] (Text)

λ: T.putStrLn $ layout (Proxy :: Proxy BookAPI)
/
├─ book/
│  └─ <capture>/
│     └─•
└─ books/
   └─•

I might want to use something like Network.Wai.Middleware.Prometheus's instrumentHandlerValue to generate a Prometheus metric that fires every time this API is called, with a handler value set to the path of the request.

However, if I do something like the following:

prometheusMiddlware = instrumentHandlerValue (T.intercalate "\\" . pathInfo)

This is bad, because different requests to the book/<Name> endpoint, such as book/great-expectations and book/vanity-fair, result in different labels. This is fine if the number of books is small, but if it's very large, the amount of data used by these metrics becomes very big, and either my service falls over or my monitoring bill becomes very large.

I'd quite like a function, that took a Servant API, and a Wai Request, and if it matched, returned a list of segments in a form that was the same for each endpoint.

That is requests to /books would return Just ["books"], requests to /book/little-dorrit would return Just ["book", "Name"], and requests to /films would return Nothing.

I can kind of see how you might go about writing this by pattern matching on Router' from Servant.Server.Internal.Router, but it's not clear to me that relying on an internal package in order to do this is a good idea.

Is there a better way?

ANSWER

Answered 2022-Jan-02 at 18:53

The pathInfo function returns all the path segments for a Request. Perhaps we could define a typeclass that, given a Servant API, produced a "parser" for the list of segments, whose result would be a formatted version of the list.

The parser type could be something like:

import Data.Text
import Control.Monad.State.Strict
import Control.Applicative

type PathParser = StateT ([Text],[Text]) Maybe ()

Where the first [Text] in the state are the path segments yet to be parsed, and the second are the formatted path segments we have accumulated so far.

This type has an Alternative instance where failure discards state (basically backtracking) and a MonadFail instance that returns mzero on pattern-match failure inside do-blocks.

The typeclass:

{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Data ( Proxy )
import GHC.TypeLits

class HasPathParser (x :: k) where
    pathParser :: Proxy x -> PathParser

The instance for Symbol moves the path piece from the pending list to the processed list:

instance KnownSymbol piece => HasPathParser (piece :: Symbol) where
  pathParser _ = do
      (piece : rest, found) <- get -- we are using MonadFail here
      guard (piece == Data.Text.pack (symbolVal (Proxy @piece)))
      put (rest, piece : found)

The instance for Capture puts the name of the path variable—not the value—on the processed list:

instance KnownSymbol name => HasPathParser (Capture name x) where
  pathParser _ = do
      (_ : rest, found) <- get  -- we are using MonadFail here
      put (rest, Data.Text.pack (symbolVal (Proxy @name)) : found)

When we reach a Verb (GET, POST...) we require that no pending path pieces should remain:

instance HasPathParser (Verb method statusCode contextTypes a) where
  pathParser _ = do
      ([], found) <- get -- we are using MonadFail here
      put ([], found)

Some other instances:

instance HasPathParser (ReqBody x y) where
  pathParser _ = pure ()

instance (HasPathParser a, HasPathParser b) => HasPathParser (a :> b) where
  pathParser _ = pathParser (Proxy @a) *> pathParser (Proxy @b)

instance (HasPathParser a, HasPathParser b) => HasPathParser (a :<|> b) where
  pathParser _ = pathParser (Proxy @a) <|> pathParser (Proxy @b)

Putting it to work:

main :: IO ()
main = do
    do let Just ([], result) = execStateT (pathParser (Proxy @BookAPI)) (["books"],[])
       print result
       -- ["books"]
    do let Just ([], result) = execStateT (pathParser (Proxy @BookAPI)) (["book", "somebookid"],[])
       print result
       -- ["Name","book"]

Source https://stackoverflow.com/questions/70439647

QUESTION

CloudScheduler 403 Permission denied while creating

Asked 2021-Dec-22 at 05:07

I am trying to create a Cron job programmatically in the CloudScheduler Google Cloud Platform using the following API explorer.

Reference: Cloud Scheduler Documentation

Even though I have given the user Owner permission and verified it in Policy Troubleshooter that it has cloudscheduler.jobs.create, I am still getting the following error.

{
  "error": {
    "code": 403,
    "message": "The principal (user or service account) lacks IAM permission \"cloudscheduler.jobs.create\" for the resource \"projects/cloud-monitoring-saurav/locations/us-central\" (or the resource may not exist).",
    "status": "PERMISSION_DENIED"
  }
}

ANSWER

Answered 2021-Dec-16 at 14:42

The error is caused by using a service account that does not have an IAM role that includes the permission cloudscheduler.jobs.create. An example of such a role is roles/cloudscheduler.admin, aka Cloud Scheduler Admin. I have the feeling that you have mixed up the permissions of the service account that Cloud Scheduler uses at runtime (when a job triggers something) with the permissions of the account currently creating the job (i.e. your own account).

You actually need two service accounts for the job to get created: one that you set up yourself (it can have whatever name you like and doesn't require any special permissions), and the default Cloud Scheduler service account itself (which is managed by Google).

  1. Use an existing service account for the call from Cloud Scheduler to your HTTP target, or create a new service account for this purpose. The service account must belong to the same project as the one in which the Cloud Scheduler jobs are created. This is the client service account; use it when specifying the service account to generate the OAuth/OIDC tokens. If your target is part of Google Cloud, like Cloud Functions or Cloud Run, grant your client service account the necessary IAM role (Cloud Functions Invoker for Cloud Functions, Cloud Run Invoker for Cloud Run). The receiving service automatically verifies the generated token. If your target is outside of Google Cloud, the receiving service must verify the token manually.

  2. The other service account is the default Cloud Scheduler service account which must also be present in your project and have the Cloud Scheduler Service Agent role granted to it. This is so it can generate header tokens on behalf of your client service account to authenticate to your target. The Cloud Scheduler service account with this role granted is automatically set up when you enable the Cloud Scheduler API, unless you enabled it prior to March 19, 2019, in which case you must add the role manually.

Note: Do not remove the service-YOUR_PROJECT_NUMBER@gcp-sa-cloudscheduler.iam.gserviceaccount.com service account from your project, or its Cloud Scheduler Service Agent role. Doing so will result in 403 responses to endpoints requiring authentication, even if your job's service account has the appropriate role.

Source https://stackoverflow.com/questions/70376512

QUESTION

Elastic Beanstalk deleting generated files on config changes

Asked 2021-Dec-20 at 23:51

On Elastic Beanstalk, with an AWS Linux 2 based environment, updating the Environment Properties (i.e. environment variables) of an environment causes all generated files to be deleted. It also doesn't run container_commands as part of this update.

So, for example, I have a Django project with collectstatic in the container commands:

05_collectstatic:
  command: |
    source $PYTHONPATH/activate
    python manage.py collectstatic --noinput --ignore *.scss

This collects static files to a folder called staticfiles as part of deploy. But when I do an environment variable update, staticfiles is deleted. This causes all static files on the application to be broken until I re-deploy, which is extremely undesirable.

This behavior did not occur on AWS Linux 1 based environments. The difference appears to be that AWS Linux 2 based environments replace the /var/app/current folder during environment variable changes, where AWS Linux 1 based environments did not do this.

How do I fix this?

Research

I can verify that the container commands are not being run during an environment variable change by monitoring /var/log/cfn-init.log; no new entries are added to this log.

This happens with both rolling update type "disabled" and "immutable".

This happens even if I convert the environment command to be a platform hook, despite the fact that hooks are listed as running when environment properties are updated.

It seems to me like there are two potential solutions, but I don't know of an Elastic Beanstalk setting for either:

  1. Have environment variable changes leave /var/app/current rather than replacing it.
  2. Have environment variable changes run container commands.

The Elastic Beanstalk docs on container commands say "Leader-only container commands are only executed during environment creation and deployments, while other commands and server customization operations are performed every time an instance is provisioned or updated." Is this a bug in Elastic Beanstalk?

Related question: EB: Trigger container commands / deploy scripts on configuration change

ANSWER

Answered 2021-Dec-20 at 23:51

The solution is to use a Configuration deployment platform hook for any commands that change the files in the deployment directory. Note that this is different from an Application deployment platform hook.

Using the example of the collectstatic command, the best thing to do is to move it from a container command to a pair of hooks, one for standard deployments and one for configuration changes.

To do this, remove the collectstatic container command. Then, make two identical files:

  • .platform/confighooks/predeploy/predeploy.sh
  • .platform/hooks/predeploy/predeploy.sh

Each file should have the following code:

#!/bin/bash
source $PYTHONPATH/activate
python manage.py collectstatic --noinput --ignore *.scss

You need two seemingly redundant files because different hooks have different trigger conditions. Scripts in hooks run when you deploy the app whereas scripts in confighooks run when you change the configuration of the app.

Make sure to make both of these files executable according to git or else you will run into a "permission denied" error when you try to deploy. You can check if they are executable via git ls-files -s .platform; you should see 100755 before any shell files in the output of this command. If you see 100644 before any of your shell files, run git add --chmod=+x -- .platform/*/*/*.sh to make them executable.

Source https://stackoverflow.com/questions/70386134

QUESTION

GCP alert if Docker container exits

Asked 2021-Dec-16 at 07:46

We are using Google Cloud Platform and its service Compute Engine. We have Docker installed on one of our VM instances (which is part of Compute Engine). Can we set up an alert in GCP Monitoring to be triggered when a container exits?

ANSWER

Answered 2021-Dec-16 at 07:46

Yes. You have to configure docker with Cloud logging driver, create a log based metric and an alerting policy on that metric.

Solution:

Configure docker with cloud logging driver.

  • SSH to your instance.

  • Run this command as root: dockerd --log-driver=gcplogs. This will forward your docker logs to Google Cloud Logging.

  • If running a container-optimized OS then follow these steps:

    echo '{"log-driver":"gcplogs"}' | sudo tee /etc/docker/daemon.json

    sudo systemctl restart docker

  • Try starting and exiting a container.

These are the logs generated whenever a container exits. Keep in mind that two exit messages are logged each time a container exits.

Create a log based metric

  • Go to Logging -> Select Log-based Metrics.

  • Click on Create Metric.

  • Metric Type: Counter. In Details, enter a Log Metric name (e.g. mysite-container-exited)

  • Under Filter Selection -> Build filter, copy the following code:

    resource.type="gce_instance"
    log_name="projects/myproject-12345/logs/gcplogs-docker-driver"
    jsonPayload.container.name="/mysite"
    jsonPayload.message=~"exit$"

    Replace myproject-12345 with your project name and change /mysite to your container name. The last line is a regex that matches exit as the last word in a line.

It should look something like this.

Create an alerting policy

  • Go to Monitoring -> Select Alerting.
  • Click on Create Policy.
  • Click on Select a Metric and search for your metric name (e.g. mysite-container-exited).
  • Select your metric and click on Apply.
  • Set Rolling window to 2 min and Rolling window function to count. Click Next.
  • Set Alert Trigger to Any time series violates, Threshold position to Above threshold and Threshold value to 1. Click Next.
  • Select a notification Channel to send alerts to. If none exists then click on Manage Notification Channels and create one.
  • Easiest one would be to add an email for notification channel.
  • After creating go back, click on Refresh icon in Notification Channels and select it. Click ok.
  • Click on Save Policy.

As we have seen, two exit messages are logged per container exit; that's why the threshold is set to above 1.

You can monitor more containers by creating a new metric and changing the value of jsonPayload.container.name="/mysite" to your container name.

Now try to start and stop the container. You will receive an alert by email in 2-3 minutes.

Source https://stackoverflow.com/questions/70345830

QUESTION

How to make Spring Boot Actuator available via JMX with globally enabled lazy initialization?

Asked 2021-Dec-13 at 22:11

In our Spring Boot 2.4+ based applications we need to have configured that initialization should be performed lazily in application.properties:

spring.main.lazy-initialization=true
spring.jmx.enabled=true

However with such settings Actuator end-points cannot be reached via JMX.

This is a blocker now when we are migrating to Instana monitoring, which requires org.springframework.boot:type=Endpoint,name=Metrics and org.springframework.boot:type=Endpoint,name=Health MBeans to be available via JMX.

Is there a way to keep lazy initialization enabled but at the same Actuator accessible via JMX, please?

ANSWER

Answered 2021-Oct-14 at 11:59

This is a bug in Spring Boot for which I've just opened an issue. Thanks for bringing it to our attention.

You can work around the problem by excluding the bean that exports the endpoints to JMX from lazy initialization. To do so, add the following bean to your application:

@Bean
LazyInitializationExcludeFilter eagerJmxEndpointExport() {
    return LazyInitializationExcludeFilter.forBeanTypes(JmxEndpointExporter.class);
}

Source https://stackoverflow.com/questions/69569955

QUESTION

Monitor language specific API Calls for Microsoft Translator API

Asked 2021-Dec-03 at 06:23

Is there any way to monitor the language-specific usage of the Translator API (e.g. the number of characters translated from English to Japanese) for the Microsoft Translator API? I couldn't see any such metric in the Metrics Monitoring section.

ANSWER

Answered 2021-Nov-22 at 10:06

Currently it's not available. We have added it as an Azure metrics feature request for the Translator service, and it will be added in the near future.

Source https://stackoverflow.com/questions/69940928

QUESTION

switchMap combined with mergeMap

Asked 2021-Dec-01 at 12:46

I have an Observable where each new value should cause an HTTP request. On the client-side I only care about the latest response value; however, I want every request to complete for monitoring/etc. purposes.

What I currently have is something like:

function simulate(x) {
  // Simulate an HTTP request.
  return of(x).pipe(delay(6));
}

source$.pipe(
  someMapFunc(x => simulate(x)),
);

When I use switchMap for the someMapFunc, I get the right set of responses (only the latest). However, if the request is taking too long, it will get canceled.

When I use mergeMap instead, I get the right set of requests (every request completes), but I get the wrong set of responses (every single one).

marble diagram of code above

Is there a way to get the requests of mergeMap with the responses of switchMap? I know I can write this as a custom operator, but I'm wondering if I can build this out of existing/standard rxjs operators. To summarize what I'm thinking of:

  • a version of switchMap that doesn't unsubscribe when it switches;
  • a version of mergeMap that only emits values from the latest inner Observable.

Edit: Based on the accepted answer, I was able to get the following, which works:

function orderedMergeMap(project) {
  return (s) => defer(() => {
    let recent = 0;
    return s.pipe(
      mergeMap((data, idx) => {
        recent = idx;
        return project(data).pipe(filter(() => idx === recent));
      })
    );
  });
}

ANSWER

Answered 2021-Nov-23 at 01:01

I believe that you need a combination of concatMap() and last().

concatMap does not subscribe to the next observable until the previous one completes. Using it you will ensure the order of request execution. And, as follows from the description, it doesn't cancel previous subscriptions but lets them finish, unlike switchMap.

last emits the last value emitted from the source on completion. Using it you will ensure that only one (last) result will be passed to the result.

Your code will look like that:

source$.pipe(
  concatMap(x => simulate(x)),
  last()
);

Source https://stackoverflow.com/questions/70073963

Community Discussions contain sources that include Stack Exchange Network

Tutorials and Learning Resources in Monitoring

Tutorials and Learning Resources are not available at this moment for Monitoring
