The Automated File Loader to BigQuery solution demonstrates the use of the Object Change Notification service on Google Cloud Storage. It shows how one can build a simple App Engine application that automatically picks up new data in Cloud Storage based on business logic and loads it directly into Google BigQuery.
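At its core, that load step is a BigQuery load job pointed at the newly arrived Cloud Storage object. A minimal sketch of that step with the google-cloud-bigquery client (the bucket, dataset, and table names below are hypothetical, and the repo itself triggers this from an App Engine notification handler rather than a standalone script):

from google.cloud import bigquery

# Hypothetical project; in the repo this runs inside the App Engine app.
client = bigquery.Client(project="my_project_id")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,  # infer the table schema from the file
)

# Kick off a load job for the newly arrived object and wait for it to finish.
load_job = client.load_table_from_uri(
    "gs://my-data-bucket/new_file.csv",          # hypothetical object
    "my_project_id.my_dataset.my_table",         # hypothetical destination
    job_config=job_config,
)
load_job.result()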
QUESTION
Submit command line arguments to a pyspark job on airflow
Asked 2022-Mar-29 at 10:37
I have a pyspark job available on GCP Dataproc to be triggered on airflow as shown below:
config = help.loadJSON("batch/config_file")

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        "args": <TO_BE_ADDED>,
    },
}
I need to supply command line arguments to this pyspark job as shown below [this is how I run my pyspark job from the command line]:
spark-submit gs://file/loc/my_spark_file.py --arg1 val1 --arg2 val2
I am providing the arguments to my pyspark job using "configparser". Therefore, arg1 is the key and val1 is the value from my spark-submit command above.
How do I define the "args" param in the "MY_PYSPARK_JOB" defined above [equivalent to my command line arguments]?
ANSWER
Answered 2022-Mar-28 at 08:18
You have to pass a Sequence[str]. If you check DataprocSubmitJobOperator, you will see that its job param takes a google.cloud.dataproc_v1.types.Job:
class DataprocSubmitJobOperator(BaseOperator):
    ...
    :param job: Required. The job resource. If a dict is provided, it must be of the same form as the protobuf message
        :class:`~google.cloud.dataproc_v1.types.Job`
So, in the section about the pySpark job type, which is google.cloud.dataproc_v1.types.PySparkJob:

args (Sequence[str]): Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
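Putting that together: "args" is just a flat list of strings, the same tokens you would type after the script name on the command line. A sketch of the filled-in job dict plus the operator that submits it (the task_id and region values below are illustrative, and config is the dict loaded at the top of the question):

from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

MY_PYSPARK_JOB = {
    "reference": {"project_id": "my_project_id"},
    "placement": {"cluster_name": "my_cluster_name"},
    "pyspark_job": {
        "main_python_file_uri": "gs://file/loc/my_spark_file.py",
        "properties": config["spark_properties"],
        # Equivalent to: spark-submit gs://file/loc/my_spark_file.py --arg1 val1 --arg2 val2
        "args": ["--arg1", "val1", "--arg2", "val2"],
    },
}

submit_job = DataprocSubmitJobOperator(
    task_id="submit_pyspark_job",  # illustrative task id
    job=MY_PYSPARK_JOB,
    region="us-central1",          # hypothetical region
    project_id="my_project_id",
)

Each flag and its value are separate list elements; Dataproc passes them through to the driver's sys.argv, so the same parsing you use with spark-submit applies unchanged.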