Explore all GCP open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in GCP

microservices-demo: v0.3.6

go-cloud: v0.24.0

infracost: v0.9.22

google-cloud-go: compute v1.6.1

scio: 0.11.5

Popular Libraries in GCP

microservices-demo

by GoogleCloudPlatform · Python · 12028 stars · Apache-2.0

Sample cloud-native application with 10 microservices showcasing Kubernetes, Istio, gRPC and OpenCensus.

awesome-kubernetes

by ramitsurana · Shell · 11943 stars · NOASSERTION

A curated list for awesome kubernetes sources :ship::tada:

go-cloud

by google · Go · 8273 stars · Apache-2.0

The Go Cloud Development Kit (Go CDK): A library and tools for open cloud development in Go.

infracost

by infracost · Go · 6374 stars · Apache-2.0

Cloud cost estimates for Terraform in pull requests💰📉 Love your cloud bill!

python-docs-samples

by GoogleCloudPlatform · Python · 5530 stars · Apache-2.0

Code samples used on cloud.google.com

golang-samples

by GoogleCloudPlatform · Go · 3325 stars · Apache-2.0

Sample apps and code written for Google Cloud in the Go programming language.

google-cloud-go

by googleapis · Go · 2827 stars · Apache-2.0

Google Cloud Client Libraries for Go.

nodejs-docs-samples

by GoogleCloudPlatform · JavaScript · 2379 stars · Apache-2.0

Node.js samples for Google Cloud Platform products.

google-cloud-node

by googleapis · JavaScript · 2364 stars · Apache-2.0

Google Cloud Client Library for Node.js

Trending New libraries in GCP

infracost

by infracost · Go · 6374 stars · Apache-2.0

Cloud cost estimates for Terraform in pull requests💰📉 Love your cloud bill!

serverless-vault-with-cloud-run

by kelseyhightower · Shell · 273 stars · Apache-2.0

Guide to running Vault on Cloud Run

Firebase-ESP-Client

by mobizt · C++ · 215 stars · MIT

🔥Firebase Arduino Client Library for ESP8266 and ESP32. The complete, fast, secured and reliable Firebase Arduino client library that supports RTDB, Cloud Firestore, Firebase and Google Cloud Storage, Cloud Messaging and Cloud Functions for Firebase.

collie-cli

by meshcloud · TypeScript · 131 stars · Apache-2.0

Collie CLI allows you to manage your AWS, Azure & GCP cloud landscape through a single view.

vscode-langservers-extracted

by hrsh7th · JavaScript · 117 stars · MIT

vscode-langservers bin collection.

healthcare-data-protection-suite

by GoogleCloudPlatform · Go · 111 stars · Apache-2.0

Deploy, monitor & audit on GCP simplified

cloudprober

by cloudprober · Go · 111 stars · Apache-2.0

An active monitoring software to detect failures before your customers do.

secrets-store-csi-driver-provider-gcp

by GoogleCloudPlatform · Go · 107 stars · Apache-2.0

Google Secret Manager provider for the Secret Store CSI Driver.

SQL-scripts

by HariSekhon · Shell · 97 stars · NOASSERTION

100+ SQL Scripts - PostgreSQL, MySQL, Google BigQuery, MariaDB, AWS Athena. DevOps / DBA / Analytics / performance engineering. Google BigQuery ML machine learning classification.

Top Authors in GCP

1. GoogleCloudPlatform: 108 Libraries, 37610 stars

2. googleapis: 23 Libraries, 9097 stars

3. salrashid123: 13 Libraries, 186 stars

4. google: 13 Libraries, 10075 stars

5. doitintl: 12 Libraries, 558 stars

6. googlearchive: 9 Libraries, 1575 stars

7. mchmarny: 8 Libraries, 80 stars

8. kelseyhightower: 7 Libraries, 743 stars

9. spotify: 7 Libraries, 2673 stars

10. benlcollins: 6 Libraries, 41 stars


Trending Kits in GCP

No Trending Kits are available at this moment for GCP

Trending Discussions on GCP

Submit command line arguments to a pyspark job on airflow

Skip first line in import statement using gc.open_by_url from gspread (i.e. add header=0)

Automatically Grab Latest Google Cloud Platform Secret Version

Programmatically Connecting a GitHub repo to a Google Cloud Project

Unable to create a new Cloud Function - cloud-client-api-gae

TypeScript project failing to deploy to App Engine targeting Node 12 or 14, but works with Node 10

Dataproc Java client throws NoSuchMethodError setUseJwtAccessWithScope

Apache Beam Cloud Dataflow Streaming Stuck Side Input

BIG Query command using BAT file

Vertex AI Model Batch prediction, issue with referencing existing model and input file on Cloud Storage

QUESTION

Submit command line arguments to a pyspark job on airflow

Asked 2022-Mar-29 at 10:37

I have a pyspark job available on GCP Dataproc to be triggered on airflow as shown below:

config = help.loadJSON("batch/config_file")

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        "args": <TO_BE_ADDED>
    },
}

I need to supply command line arguments to this pyspark job as shown below [this is how I am running my pyspark job from the command line]:

spark-submit gs://file/loc/my_spark_file.py --arg1 val1 --arg2 val2

I am providing the arguments to my pyspark job using "configparser". Therefore, arg1 is the key and val1 is the value from my spark-submit command above.

How do I define the "args" param in the "MY_PYSPARK_JOB" defined above [equivalent to my command line arguments]?

ANSWER

Answered 2022-Mar-28 at 08:18

You have to pass a Sequence[str]. If you check DataprocSubmitJobOperator you will see that its job param must be of the same form as the protobuf message google.cloud.dataproc_v1.types.Job.

class DataprocSubmitJobOperator(BaseOperator):
...
    :param job: Required. The job resource. If a dict is provided, it must be of the same form as the protobuf message.
    :class:`~google.cloud.dataproc_v1.types.Job`

So, in the section about the pySpark job type, which is google.cloud.dataproc_v1.types.PySparkJob:

args Sequence[str] Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
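
Putting that together, a minimal sketch of what the job dict and the operator call could look like (the operator import path, task_id and region below are illustrative assumptions, not taken from the original question):

from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

config = help.loadJSON("batch/config_file")

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        # args is a Sequence[str]: each command line token becomes one list element
        "args": ["--arg1", "val1", "--arg2", "val2"],
    },
}

submit_job = DataprocSubmitJobOperator(
    task_id="submit_pyspark_job",   # hypothetical task id
    job=MY_PYSPARK_JOB,
    region="us-central1",           # hypothetical region
    project_id="my_project_id",
)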

Source https://stackoverflow.com/questions/71616491

QUESTION

Skip first line in import statement using gc.open_by_url from gspread (i.e. add header=0)

Asked 2022-Mar-16 at 08:12

What is the gspread equivalent of pandas' header=0, which recognises the first line as a heading?

pandas import statement (correct)

import pandas as pd

# gcp / google sheets URL
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"

raw_dataset = pd.read_csv(df_URL, na_values='?',sep=';'
                          , skipinitialspace=True, header=0, index_col=None)

Using gspread, so far I import the data, change the first line to the heading, then delete the first line afterwards, but this recognises everything in the DataFrame as a string. I would like to recognise the first line as a heading right away in the import statement.

gspread import statement that needs header=True equivalent

import pandas as pd
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials


# gcp / google sheets url
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"

# importing the data from Google Drive setup
gc = gspread.authorize(GoogleCredentials.get_application_default())

# read data and put it in dataframe
g_sheets = gc.open_by_url(df_URL)

df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_values())

# change first row to header
df = df.rename(columns=df.iloc[0])

# drop first row
df.drop(index=df.index[0], axis=0, inplace=True)

ANSWER

Answered 2022-Mar-16 at 08:12

Looking at the API documentation, you probably want to use:

df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_records(head=1))

The .get_all_records method returns a list of dictionaries with the column headers as the keys. The argument head=<int> determines which row to use as the keys; rows start from 1 and follow the numeration of the spreadsheet.

Since the values returned by .get_all_records() are strings, the data frame constructor, pd.DataFrame, will return a data frame that is all strings. To convert it to floats, we need to replace the empty strings and the dash-only strings ('-') with NA-type values, then convert to float.

Luckily, pandas DataFrame has a convenient method for replacing values, .replace. We can pass it a mapping from the strings we want treated as NAs to None, which gets converted to NaN.

import pandas as pd

data = g_sheets.get_worksheet(0).get_all_records(head=1)

na_strings_map = {
    '-': None,
    '': None
}

df = pd.DataFrame(data).replace(na_strings_map).astype(float)
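
As a small follow-up (my own note, not part of the original answer): if only some columns are numeric, an alternative is to convert those columns one by one with pd.to_numeric, which coerces unparseable strings such as '' and '-' to NaN without casting the whole frame to float:

import pandas as pd

data = g_sheets.get_worksheet(0).get_all_records(head=1)
df = pd.DataFrame(data)

# "price" is a hypothetical column name from the sheet
df["price"] = pd.to_numeric(df["price"], errors="coerce")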

Source https://stackoverflow.com/questions/71418682

QUESTION

Automatically Grab Latest Google Cloud Platform Secret Version

Asked 2022-Mar-01 at 03:01

I'm trying to grab the latest secret version. Is there a way to do that without specifying the version number? Such as using the keyword "latest". I'm trying to avoid having to iterate through all the secret versions with a for loop as GCP documentation shows:

try (SecretManagerServiceClient client = SecretManagerServiceClient.create()) {
  // Build the parent name.
  SecretName projectName = SecretName.of(projectId, secretId);

  // Get all versions.
  ListSecretVersionsPagedResponse pagedResponse = client.listSecretVersions(projectName);

  // List all versions and their state.
  pagedResponse
      .iterateAll()
      .forEach(
          version -> {
            System.out.printf("Secret version %s, %s\n", version.getName(), version.getState());
          });
}

ANSWER

Answered 2021-Sep-12 at 18:54
import com.google.cloud.secretmanager.v1.AccessSecretVersionResponse;
import com.google.cloud.secretmanager.v1.SecretManagerServiceClient;
import com.google.cloud.secretmanager.v1.SecretVersionName;
import java.io.IOException;

public class AccessSecretVersion {

  public static void accessSecretVersion() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String secretId = "your-secret-id";
    String versionId = "latest"; //<-- specify version
    accessSecretVersion(projectId, secretId, versionId);
  }

  // Access the payload for the given secret version if one exists. The version
  // can be a version number as a string (e.g. "5") or an alias (e.g. "latest").
  public static void accessSecretVersion(String projectId, String secretId, String versionId)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (SecretManagerServiceClient client = SecretManagerServiceClient.create()) {
      SecretVersionName secretVersionName = SecretVersionName.of(projectId, secretId, versionId);

      // Access the secret version.
      AccessSecretVersionResponse response = client.accessSecretVersion(secretVersionName);

      // Print the secret payload.
      //
      // WARNING: Do not print the secret in a production environment - this
      // snippet is showing how to access the secret material.
      String payload = response.getPayload().getData().toStringUtf8();
      System.out.printf("Plaintext: %s\n", payload);
    }
  }
}
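
For comparison (my own addition, not part of the original answer), the same "latest" alias works with the Python client; a minimal sketch, assuming google-cloud-secret-manager is installed and the project and secret IDs are placeholders:

from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

# "latest" resolves to the newest enabled version of the secret.
name = "projects/your-project-id/secrets/your-secret-id/versions/latest"
response = client.access_secret_version(request={"name": name})

payload = response.payload.data.decode("UTF-8")
# As in the Java sample above: avoid printing secret material in production.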

Source https://stackoverflow.com/questions/68805240

QUESTION

Programmatically Connecting a GitHub repo to a Google Cloud Project

Asked 2022-Feb-12 at 16:16

I'm working on a Terraform project that will set up all the GCP resources needed for a large project spanning multiple GitHub repos. My goal is to be able to recreate the cloud infrastructure from scratch completely with Terraform.

The issue I'm running into is that, in order to set up build triggers with Terraform within GCP, the GitHub repo that sets off the trigger first needs to be connected. Currently, I've only been able to do that manually via the Google Cloud Build dashboard. I'm not sure if this is possible via Terraform or with a script, but I'm looking for any solution I can automate this with. Once the projects are connected, updating everything with Terraform works fine.

TLDR; How can I programmatically connect a GitHub project with a GCP project instead of using the dashboard?

ANSWER

Answered 2022-Feb-12 at 16:16

Currently there is no way to programmatically connect a GitHub repo to a Google Cloud Project. This must be done manually via Google Cloud.

My workaround is to manually connect an "admin" project, build containers and save them to that project's artifact registry, and then deploy the containers from the registry in the programmatically generated project.

Source https://stackoverflow.com/questions/69834735

QUESTION

Unable to create a new Cloud Function - cloud-client-api-gae

Asked 2022-Feb-11 at 18:49

I'm unable to create a Cloud Function in my GCP project using the GUI, even though I have admin roles for GCF, SA and IAM.

Here is the error message:

Missing necessary permission iam.serviceAccounts.actAs for cloud-client-api-gae on the service account serviceaccountname@DOMAIN.iam.gserviceaccount.com. Grant the role 'roles/iam.serviceAccountUser' to cloud-client-api-gae on the service account serviceaccountname@DOMAIN.iam.gserviceaccount.com.

cloud-client-api-gae is neither an SA nor a User on my IAM list. It must be a creature living underneath the Graphical User Interface.

I have enabled the APIs for GCF and App Engine, and I have the Service Account Admin role.

I had literally 0 search results when googling for cloud-client-api-gae.

ANSWER

Answered 2022-Jan-18 at 13:53

I contacted GCP support and it seems I was missing a single permission for my user: Service Account User - that's it.

PS: The person from support didn't know what this thing called "cloud-client-api-gae" is.

Source https://stackoverflow.com/questions/70756708

QUESTION

TypeScript project failing to deploy to App Engine targeting Node 12 or 14, but works with Node 10

Asked 2022-Jan-16 at 14:32

I have a TypeScript project that has been deployed several times without any problems to Google App Engine, Standard environment, running Node 10. However, when I try to update the App Engine project to either Node 12 or 14 (by editing the engines.node value in package.json and the runtime value in app.yaml), the deploy fails, printing the following to the console:

> ####@1.0.1 prepare /workspace
> npm run gcp-build


> ####@1.0.1 gcp-build /workspace
> tsc -p .

sh: 1: tsc: not found
npm ERR! code ELIFECYCLE
npm ERR! syscall spawn
npm ERR! file sh
npm ERR! errno ENOENT
npm ERR! ####@1.0.1 gcp-build: `tsc -p .`
npm ERR! spawn ENOENT
npm ERR!
npm ERR! Failed at the ####@1.0.1 gcp-build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

According to the following documentation, App Engine should be installing the modules listed in devDependencies in package.json prior to running the gcp-build script (which it has been doing, as expected, when deploying with the Node version set to 10).

https://cloud.google.com/appengine/docs/standard/nodejs/running-custom-build-step

I can't find any documentation regarding how this App Engine deployment behavior has changed when targeting Node 12 or 14 rather than Node 10. Is there some configuration I am missing? Or does Google no longer install devDependencies prior to running the gcp-build script?

Here is the devDependencies section of my package.json (TypeScript is there):

    "devDependencies": {
        "@google-cloud/nodejs-repo-tools": "^3.3.0",
        "@types/cors": "^2.8.12",
        "@types/express": "^4.17.13",
        "@types/pg": "^8.6.1",
        "@types/request": "^2.48.7",
        "tsc-watch": "^4.4.0",
        "tslint": "^6.1.3",
        "typescript": "^4.3.5"
    },

ANSWER

Answered 2022-Jan-16 at 14:32

I encountered the exact same problem and just put typescript in dependencies, not devDependencies.

It worked after that, but I cannot be sure that it was due to this change (since I have no proof of that).

Source https://stackoverflow.com/questions/69153522

QUESTION

Dataproc Java client throws NoSuchMethodError setUseJwtAccessWithScope

Asked 2022-Jan-14 at 19:24

I am following this article to submit a job to an existing Dataproc cluster via the Dataproc API.

For the following lines of code:

// Configure the settings for the job controller client.
JobControllerSettings jobControllerSettings =
    JobControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

I am getting the following errors:

SEVERE: Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.NoSuchMethodError: 'com.google.api.gax.core.GoogleCredentialsProvider$Builder com.google.api.gax.core.GoogleCredentialsProvider$Builder.setUseJwtAccessWithScope(boolean)'] with root cause
java.lang.NoSuchMethodError: 'com.google.api.gax.core.GoogleCredentialsProvider$Builder com.google.api.gax.core.GoogleCredentialsProvider$Builder.setUseJwtAccessWithScope(boolean)'

In my pom I used the following dependencies:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>libraries-bom</artifactId>
            <version>24.1.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-gcp-dependencies</artifactId>
            <version>1.2.8.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>

    </dependencies>
</dependencyManagement>

And added the Dataproc dependency:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-dataproc</artifactId>
</dependency>

Any idea what I am missing here?

ANSWER

Answered 2022-Jan-14 at 19:22

The method com.google.api.gax.core.GoogleCredentialsProvider$Builder com.google.api.gax.core.GoogleCredentialsProvider$Builder.setUseJwtAccessWithScope(boolean) was introduced in com.google.api:gax in version 2.3.0.

Can you

  1. run mvn dependency:tree and confirm that your version of com.google.api:gax is above version 2.3.0?

  2. upgrade all Google libraries to the latest version?

Here is a similar issue found on the internet.

Source https://stackoverflow.com/questions/70640346

QUESTION

Apache Beam Cloud Dataflow Streaming Stuck Side Input

Asked 2022-Jan-12 at 13:12

I'm currently building a PoC Apache Beam pipeline in GCP Dataflow. In this case, I want to create a streaming pipeline with the main input from PubSub and a side input from BigQuery, and store the processed data back to BigQuery.

Side pipeline code

side_pipeline = (
    p
    | "periodic" >> PeriodicImpulse(fire_interval=3600, apply_windowing=True)
    | "map to read request" >>
        beam.Map(lambda x:beam.io.gcp.bigquery.ReadFromBigQueryRequest(table=side_table))
    | beam.io.ReadAllFromBigQuery()
)

Function with side input code

def enrich_payload(payload, equipments):
    id = payload["id"]
    for equipment in equipments:
        if id == equipment["id"]:
            payload["type"] = equipment["type"]
            payload["brand"] = equipment["brand"]
            payload["year"] = equipment["year"]

            break

    return payload

Main pipeline code

main_pipeline = (
    p
    | "read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/topiq")
    | "bytes to dict" >> beam.Map(lambda x: json.loads(x.decode("utf-8")))
    | "transform" >> beam.Map(transform_function)
    | "timestamping" >> beam.Map(lambda src: window.TimestampedValue(
        src,
        dt.datetime.fromisoformat(src["timestamp"]).timestamp()
    ))
    | "windowing" >> beam.WindowInto(window.FixedWindows(30))
)

final_pipeline = (
    main_pipeline
    | "enrich data" >> beam.Map(enrich_payload, equipments=beam.pvalue.AsIter(side_pipeline))
    | "store" >> beam.io.WriteToBigQuery(bq_table)
)

result = p.run()
result.wait_until_finish()

After deploying it to Dataflow, everything looks fine and there are no errors. But then I noticed that the enrich data step has two nodes instead of one.

Dataflow Graph

And also, the side input is stuck: as you can see, it has Elements Added with 21 counts in Input Collections and a "-" value for Elements Added in Output Collections. Enrich data stuck

You can find the full pipeline code here and mock pubsub publisher here

I have already followed all the instructions in these documentations:

Yet I still get this error. Please help me. Thanks!

ANSWER

Answered 2022-Jan-12 at 13:12

Here you have a working example:

mytopic = ""
sql = "SELECT station_id, CURRENT_TIMESTAMP() timestamp FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations` LIMIT 10"

def to_bqrequest(e, sql):
    from apache_beam.io import ReadFromBigQueryRequest
    yield ReadFromBigQueryRequest(query=sql)


def merge(e, side):
    for i in side:
        yield f"Main {e.decode('utf-8')} Side {i}"

pubsub = p | "Read PubSub topic" >> ReadFromPubSub(topic=mytopic)

side_pcol = (p | PeriodicImpulse(fire_interval=300, apply_windowing=False)
               | "ApplyGlobalWindow" >> WindowInto(window.GlobalWindows(),
                                           trigger=trigger.Repeatedly(trigger.AfterProcessingTime(5)),
                                           accumulation_mode=trigger.AccumulationMode.DISCARDING)
               | "To BQ Request" >> ParDo(to_bqrequest, sql=sql)
               | ReadAllFromBigQuery()
            )

final = (pubsub | "Merge" >> ParDo(merge, side=beam.pvalue.AsList(side_pcol))
                | Map(logging.info)
        )

p.run()

Note this uses a GlobalWindow (so that both inputs have the same window). I used a processing time trigger so that the pane contains multiple rows. 5 was chosen arbitrarily, using 1 would work too.

Please note that matching the data between side and main inputs is non-deterministic, and you may see fluctuating values from older fired panes.

In theory, using FixedWindows should fix this, but I cannot get the FixedWindows to work.

Source https://stackoverflow.com/questions/70561769

QUESTION

BIG Query command using BAT file

Asked 2022-Jan-09 at 15:24
echo Give yearmonth "yyyyMM"
setlocal enabledelayedexpansion
SET /p yearmonth= 
SET ClientName[0]=abc
SET ClientName[1]=def

SET i = 0

:myLoop
if defined ClientName[%i%] (
    call bq query --use_legacy_sql=false "CREATE EXTERNAL TABLE `test.!ClientName[%%i]!.%yearmonth%` OPTIONS (format = 'CSV',skip_leading_rows = 1 uris = ['gs://test/!ClientName[%%i]!/AWS/%yearmonth%/Metrics/data/*.csv'])"
    set /a "i+=1"
    GOTO :myLoop

)

Hi, I am trying to create a batch file so that I can run multiple BigQuery queries at once. Above, I tried to write a batch script putting the command in a loop.

I am trying to create a table using yearmonth as user input and then use an array to create a table for each different client name.

  1. But I am unable to print ClientName[i] (e.g. for i=0, ClientName[0] = abc) in the call query; I am using !ClientName[%%i]! to print it, but it's not working.

  2. The call query inside the loop is not running in the GCP console when I execute the bat file.

Can you please help me resolve this?

ANSWER

Answered 2022-Jan-09 at 11:04
  1. It is bad practice to set variables as standalone alphabetical characters like i. One reason is exactly as you have experienced: you have confused the for metavariable %%i with a set variable %i%.

  2. You are expanding in the loop, but have not enabled delayed expansion, so there are 2 ways, which we will get to in a second.

  3. Setting variables should not have spaces before or after =, excluding the likes of set /a.

So, Method 1, without delayedexpansion (note how the variables are used with double %% in the loop with the call command).

@echo off
echo Give yearmonth "yyyyMM"
SET /p yearmonth= 
SET ClientName[0]=abc
SET ClientName[1]=def

SET num=0

:myLoop
if defined ClientName[%num%] (
    call bq query --use_legacy_sql=false "CREATE EXTERNAL TABLE `test.%%ClientName[%num%]%%.%yearmonth%` OPTIONS (format = 'CSV',skip_leading_rows = 1 uris = ['gs://test/%%ClientName[%num%]%%/AWS/%yearmonth%/Metrics/data/*.csv'])"
    set /a num+=1
    GOTO :myLoop

)

Method 2: (better method using delayedexpansion)

@echo off
setlocal enabledelayedexpansion
echo Give yearmonth "yyyyMM"
SET /p yearmonth= 
SET ClientName[0]=abc
SET ClientName[1]=def

SET num=0

:myLoop
if defined ClientName[%num%] (
    call bq query --use_legacy_sql=false "CREATE EXTERNAL TABLE `test.!ClientName[%num%]!.%yearmonth%` OPTIONS (format = 'CSV',skip_leading_rows = 1 uris = ['gs://test/!ClientName[%num%]!/AWS/%yearmonth%/Metrics/data/*.csv'])"
    set /a num+=1
    GOTO :myLoop

)

Source https://stackoverflow.com/questions/70640411

QUESTION

Vertex AI Model Batch prediction, issue with referencing existing model and input file on Cloud Storage

Asked 2021-Dec-21 at 14:35

I'm struggling to correctly set up a Vertex AI pipeline which does the following:

  1. read data from an API, store it to GCS, and use it as input for batch prediction.
  2. get an existing model (Video classification on Vertex AI)
  3. create a Batch Prediction job with the input from point 1.
    As will be seen, I don't have much experience with Vertex Pipelines/Kubeflow, thus I'm asking for help/advice; I hope it's just some beginner mistake. This is the gist of the code I'm using as the pipeline:
from google_cloud_pipeline_components import aiplatform as gcc_aip
from kfp.v2 import dsl

from kfp.v2.dsl import component
from kfp.v2.dsl import (
    Output,
    Artifact,
    Model,
)

PROJECT_ID = 'my-gcp-project'
BUCKET_NAME = "mybucket"
PIPELINE_ROOT = "{}/pipeline_root".format(BUCKET_NAME)


@component
def get_input_data() -> str:
    # getting data from API, save to Cloud Storage
    # return GS URI
    gcs_batch_input_path = 'gs://somebucket/file'
    return gcs_batch_input_path


@component(
    base_image="python:3.9",
    packages_to_install=['google-cloud-aiplatform==1.8.0']
)
def load_ml_model(project_id: str, model: Output[Artifact]):
    """Load existing Vertex model"""
    import google.cloud.aiplatform as aip

    model_id = '1234'
    model = aip.Model(model_name=model_id, project=project_id, location='us-central1')


@dsl.pipeline(
    name="batch-pipeline", pipeline_root=PIPELINE_ROOT,
)
def pipeline(gcp_project: str):
    input_data = get_input_data()
    ml_model = load_ml_model(gcp_project)

    gcc_aip.ModelBatchPredictOp(
        project=PROJECT_ID,
        job_display_name=f'test-prediction',
        model=ml_model.output,
        gcs_source_uris=[input_data.output],  # this doesn't work
        # gcs_source_uris=['gs://mybucket/output/'],  # hardcoded gs uri works
        gcs_destination_output_uri_prefix=f'gs://{PIPELINE_ROOT}/prediction_output/'
    )


if __name__ == '__main__':
    from kfp.v2 import compiler
    import google.cloud.aiplatform as aip
    pipeline_export_filepath = 'test-pipeline.json'
    compiler.Compiler().compile(pipeline_func=pipeline,
                                package_path=pipeline_export_filepath)
    # pipeline_params = {
    #     'gcp_project': PROJECT_ID,
    # }
    # job = aip.PipelineJob(
    #     display_name='test-pipeline',
    #     template_path=pipeline_export_filepath,
    #     pipeline_root=f'gs://{PIPELINE_ROOT}',
    #     project=PROJECT_ID,
    #     parameter_values=pipeline_params,
    # )

    # job.run()

When running the pipeline it throws this exception when running Batch prediction:
details = "List of found errors: 1.Field: batch_prediction_job.model; Message: Invalid Model resource name. so I'm not sure what could be wrong. I tried to load model in the notebook (outside of component) and it correctly returns.

The second issue I'm having is referencing the GCS URI output from a component as the batch job input.

   input_data = get_input_data2()
   gcc_aip.ModelBatchPredictOp(
        project=PROJECT_ID,
        job_display_name=f'test-prediction',
        model=ml_model.output,
        gcs_source_uris=[input_data.output],  # this doesn't work
        # gcs_source_uris=['gs://mybucket/output/'],  # hardcoded gs uri works
        gcs_destination_output_uri_prefix=f'gs://{PIPELINE_ROOT}/prediction_output/'
    )

During compilation, I get the following exception: TypeError: Object of type PipelineParam is not JSON serializable, though I think this could be an issue with the ModelBatchPredictOp component.

Again, any help/advice is appreciated; I've been dealing with this since yesterday, so maybe I missed something obvious.

libraries I'm using:

google-cloud-aiplatform==1.8.0
google-cloud-pipeline-components==0.2.0
kfp==1.8.10
kfp-pipeline-spec==0.1.13
kfp-server-api==1.7.1

UPDATE: After comments, some research and tuning, this works for referencing the model:

@component
def load_ml_model(project_id: str, model: Output[Artifact]):
    region = 'us-central1'
    model_id = '1234'
    model_uid = f'projects/{project_id}/locations/{region}/models/{model_id}'
    model.uri = model_uid
    model.metadata['resourceName'] = model_uid

and then I can use it as intended:

batch_predict_op = gcc_aip.ModelBatchPredictOp(
        project=gcp_project,
        job_display_name=f'batch-prediction-test',
        model=ml_model.outputs['model'],
        gcs_source_uris=[input_batch_gcs_path],
        gcs_destination_output_uri_prefix=f'gs://{BUCKET_NAME}/prediction_output/test'
    )

UPDATE 2: Regarding the GCS path, a workaround is to define the path outside of the component and pass it as an input parameter, for example (abbreviated):

@dsl.pipeline(
    name="my-pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline(
        gcp_project: str,
        region: str,
        bucket: str
):
    ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

    gcs_prediction_input_path = f'gs://{BUCKET_NAME}/prediction_input/video_batch_prediction_input_{ts}.jsonl'
    batch_input_data_op = get_input_data(gcs_prediction_input_path)  # this loads the input data to the GCS path

    batch_predict_op = gcc_aip.ModelBatchPredictOp(
        project=gcp_project,
        model=training_job_run_op.outputs["model"],
        job_display_name='batch-prediction',
        # gcs_source_uris=[batch_input_data_op.output],
        gcs_source_uris=[gcs_prediction_input_path],
        gcs_destination_output_uri_prefix=f'gs://{BUCKET_NAME}/prediction_output/',
    ).after(batch_input_data_op)  # 'after' is needed so this runs once the input data is prepared, since get_input_data doesn't return anything

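For completeness, below is a minimal sketch (not part of the original post) of what such a get_input_data component could look like when it takes the GCS path as an input parameter and writes the batch input there itself; the JSONL payload and the google-cloud-storage calls are placeholder assumptions for illustration:

from kfp.v2.dsl import component


@component(
    base_image="python:3.9",
    packages_to_install=['google-cloud-storage']
)
def get_input_data(gcs_output_path: str):
    """Fetch data and upload it to the GCS path passed in as a pipeline parameter."""
    from google.cloud import storage

    # Placeholder payload; in the real pipeline this would come from the API call.
    jsonl_payload = '{"content": "gs://somebucket/video.mp4"}\n'

    # Split 'gs://bucket/path/file.jsonl' into bucket and blob name.
    bucket_name, blob_name = gcs_output_path.replace('gs://', '').split('/', 1)
    storage.Client().bucket(bucket_name).blob(blob_name).upload_from_string(jsonl_payload)
    # Nothing is returned, which is why the pipeline wires the ordering with .after(batch_input_data_op).

Because the component writes to a path the pipeline already knows, ModelBatchPredictOp can take the plain string gcs_prediction_input_path instead of a task output.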
I'm still not sure why it doesn't work/compile when I return the GCS path from the get_input_data component.

ANSWER

Answered 2021-Dec-21 at 14:35

I'm glad you solved most of your main issues and found a workaround for the model declaration.

Regarding your observation about input.output and gcs_source_uris: the reason is the way the function/class returns its value. If you dig into the classes and methods of google_cloud_pipeline_components, you will find that they implement a structure that lets you use .outputs on the value returned by the function you call.

If you go to the implementation of one of the pipeline components, you will find that it returns an output array from the convert_method_to_component function. So, for your custom class/function to behave the same way, it should return a value that can be accessed as an attribute. Below is a basic implementation of the idea.

class CustomClass:
    def __init__(self):
        self.return_val = {'path': 'custompath', 'desc': 'a desc'}

    @property
    def output(self):
        return self.return_val


hello = CustomClass()
print(hello.output['path'])

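To tie the CustomClass idea back to KFP: a task created from a @component already exposes this attribute-style access. The sketch below is my own example, assuming the kfp v2 SDK versions listed in the question; the component and pipeline names are made up for illustration. It shows .output on a component that returns a string:

from kfp.v2 import dsl
from kfp.v2.dsl import component


@component
def make_gcs_path() -> str:
    # A single return value is exposed as the task's 'Output' parameter.
    return 'gs://somebucket/file.jsonl'


@component
def consume_path(path: str):
    print(f'Consuming {path}')


@dsl.pipeline(name='output-attribute-demo', pipeline_root='gs://mybucket/pipeline_root')
def demo_pipeline():
    path_task = make_gcs_path()
    # .output is the attribute-style accessor described in the answer;
    # path_task.outputs['Output'] is the equivalent dict-style accessor.
    consume_path(path=path_task.output)

The property in CustomClass mimics exactly this pattern: callers never touch return_val directly, they go through .output.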
If you want to dig more into it, you can go through the google_cloud_pipeline_components source code referenced above.

Source https://stackoverflow.com/questions/70356856

Community Discussions contain sources that include the Stack Exchange Network.

Tutorials and Learning Resources in GCP

Tutorials and Learning Resources are not available at this moment for GCP
