Explore all GCP open source software, libraries, packages, source code, cloud functions and APIs.

Popular New Releases in GCP

microservices-demo: v0.3.6
go-cloud: v0.24.0
infracost: v0.9.22
google-cloud-go: compute: v1.6.1
scio: 0.11.5

Popular Libraries in GCP

microservices-demo
by GoogleCloudPlatform · Python · 12028 stars · Apache-2.0
Sample cloud-native application with 10 microservices showcasing Kubernetes, Istio, gRPC and OpenCensus.

awesome-kubernetes
by ramitsurana · Shell · 11943 stars · NOASSERTION
A curated list for awesome kubernetes sources :ship::tada:

go-cloud
by google · Go · 8273 stars · Apache-2.0
The Go Cloud Development Kit (Go CDK): A library and tools for open cloud development in Go.

infracost
by infracost · Go · 6374 stars · Apache-2.0
Cloud cost estimates for Terraform in pull requests💰📉 Love your cloud bill!

python-docs-samples
by GoogleCloudPlatform · Python · 5530 stars · Apache-2.0
Code samples used on cloud.google.com

golang-samples
by GoogleCloudPlatform · Go · 3325 stars · Apache-2.0
Sample apps and code written for Google Cloud in the Go programming language.

google-cloud-go
by googleapis · Go · 2827 stars · Apache-2.0
Google Cloud Client Libraries for Go.

nodejs-docs-samples
by GoogleCloudPlatform · JavaScript · 2379 stars · Apache-2.0
Node.js samples for Google Cloud Platform products.

google-cloud-node
by googleapis · JavaScript · 2364 stars · Apache-2.0
Google Cloud Client Library for Node.js


Trending New Libraries in GCP

infracost
by infracost · Go · 6374 stars · Apache-2.0
Cloud cost estimates for Terraform in pull requests💰📉 Love your cloud bill!

serverless-vault-with-cloud-run
by kelseyhightower · Shell · 273 stars · Apache-2.0
Guide to running Vault on Cloud Run

Firebase-ESP-Client
by mobizt · C++ · 215 stars · MIT
🔥Firebase Arduino Client Library for ESP8266 and ESP32. The complete, fast, secured and reliable Firebase Arduino client library that supports RTDB, Cloud Firestore, Firebase and Google Cloud Storage, Cloud Messaging and Cloud Functions for Firebase.

collie-cli
by meshcloud · TypeScript · 131 stars · Apache-2.0
Collie CLI allows you to manage your AWS, Azure & GCP cloud landscape through a single view.

vscode-langservers-extracted
by hrsh7th · JavaScript · 117 stars · MIT
vscode-langservers bin collection.

healthcare-data-protection-suite
by GoogleCloudPlatform · Go · 111 stars · Apache-2.0
Deploy, monitor & audit on GCP simplified

cloudprober
by cloudprober · Go · 111 stars · Apache-2.0
An active monitoring software to detect failures before your customers do.

secrets-store-csi-driver-provider-gcp
by GoogleCloudPlatform · Go · 107 stars · Apache-2.0
Google Secret Manager provider for the Secret Store CSI Driver.

SQL-scripts
by HariSekhon · Shell · 97 stars · NOASSERTION
100+ SQL Scripts - PostgreSQL, MySQL, Google BigQuery, MariaDB, AWS Athena. DevOps / DBA / Analytics / performance engineering. Google BigQuery ML machine learning classification.


Top Authors in GCP

1. 108 Libraries · 37610 stars
2. 23 Libraries · 9097 stars
3. 13 Libraries · 186 stars
4. 13 Libraries · 10075 stars
5. 12 Libraries · 558 stars
6. 9 Libraries · 1575 stars
7. 8 Libraries · 80 stars
8. 7 Libraries · 743 stars
9. 7 Libraries · 2673 stars
10. 6 Libraries · 41 stars


Trending Kits in GCP

No Trending Kits are available at this moment for GCP

Trending Discussions on GCP

    Submit command line arguments to a pyspark job on airflow
    Skip first line in import statement using gc.open_by_url from gspread (i.e. add header=0)
    Automatically Grab Latest Google Cloud Platform Secret Version
    Programmatically Connecting a GitHub repo to a Google Cloud Project
    Unable to create a new Cloud Function - cloud-client-api-gae
    TypeScript project failing to deploy to App Engine targeting Node 12 or 14, but works with Node 10
    Dataproc Java client throws NoSuchMethodError setUseJwtAccessWithScope
    Apache Beam Cloud Dataflow Streaming Stuck Side Input
    BIG Query command using BAT file
    Vertex AI Model Batch prediction, issue with referencing existing model and input file on Cloud Storage

QUESTION

Submit command line arguments to a pyspark job on airflow

Asked 2022-Mar-29 at 10:37

I have a pyspark job on GCP Dataproc that needs to be triggered from Airflow, as shown below:

config = help.loadJSON("batch/config_file")

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        "args": <TO_BE_ADDED>
    },
}

I need to supply command line arguments to this pyspark job, as shown below [this is how I run the job from the command line]:

spark-submit gs://file/loc/my_spark_file.py --arg1 val1 --arg2 val2

I am providing the arguments to my pyspark job using "configparser". Therefore, arg1 is the key and val1 is the value from my spark-submit command above.

How do I define the "args" param in the "MY_PYSPARK_JOB" defined above [equivalent to my command line arguments]?

ANSWER

Answered 2022-Mar-28 at 08:18

You have to pass a Sequence[str]. If you check DataprocSubmitJobOperator, you will see that its job param expects a google.cloud.dataproc_v1.types.Job:


class DataprocSubmitJobOperator(BaseOperator):
...
    :param job: Required. The job resource. If a dict is provided, it must be of the same form as the protobuf message
        :class:`~google.cloud.dataproc_v1.types.Job`

So, for the pySpark job type, which is google.cloud.dataproc_v1.types.PySparkJob, the docs say:

args (Sequence[str]): Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
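In practice, "args" is just a flat list of strings: the exact tokens that would follow the script name on the command line. Here is a minimal sketch of the resulting operator call, assuming the question's help.loadJSON helper is importable and using illustrative values for task_id, region, and the arguments:

from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

config = help.loadJSON("batch/config_file")  # helper from the question, assumed available

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        # Sequence[str]: each flag and each value is its own element
        "args": ["--arg1", "val1", "--arg2", "val2"],
    },
}

submit_job = DataprocSubmitJobOperator(
    task_id="submit_pyspark_job",  # illustrative task id
    region="us-central1",          # illustrative region
    project_id="my_project_id",
    job=MY_PYSPARK_JOB,
)

The driver then receives --arg1 val1 --arg2 val2 in sys.argv, exactly as it does with the spark-submit invocation above.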

Source https://stackoverflow.com/questions/71616491

Community Discussions contain sources that include Stack Exchange Network


Tutorials and Learning Resources in GCP

Tutorials and Learning Resources are not available at this moment for GCP
