airflow-dbt | Apache Airflow integration for dbt | BPM library
kandi X-RAY | airflow-dbt Summary
Apache Airflow integration for dbt
Top functions reviewed by kandi - BETA
- Run the script
- Run the dbt command
- Create the DbtHook instance
- Dump the variables in JSON format
- Build the distribution
- Print a status message
airflow-dbt Key Features
airflow-dbt Examples and Code Snippets
Community Discussions
Trending Discussions on airflow-dbt
QUESTION
I have a dbt project running on Cloud Composer and all my models and snapshots are running successfully. I'm having trouble generating the documentation once all the processing is finished.
The integration between dbt and Cloud Composer is done via airflow-dbt, and I have set up a task for the DbtDocsGenerateOperator.
The DAG actually runs fine, and I can see in the log that the catalog.json file is being written to the target folder in the corresponding cloud bucket, but the file is not there. Doing some investigation in the GCP logging, I've noticed that there's a process called gcs-syncd that is apparently removing the file.
Has anyone had success with this integration before and been able to generate the dbt docs from Cloud Composer?
ANSWER
Answered 2022-Jan-03 at 00:50
The problem here is that you're writing your catalog file to a location on a worker node that is mounted to the dags folder in GCS, which Airflow and Cloud Composer manage. Per the documentation:
When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster.
Cloud Composer synchronizes the dags/ and plugins/ folders uni-directionally by copying locally. Unidirectional synching means that local changes in these folders are overwritten.
The data/ and logs/ folders synchronize bi-directionally by using Cloud Storage FUSE.
If you change the location of this file to /home/airflow/gcs/data/target/catalog.json, you should be fine, as that path syncs bi-directionally.
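As a rough illustration of that fix, here is a minimal sketch of the docs-generation task, assuming the dbt project itself is checked out under the bi-directionally synced data/ folder so that its target/ output (including catalog.json) lands there. The project path, DAG name, schedule, and the exact import path of DbtDocsGenerateOperator are assumptions, not values from the question.

from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtDocsGenerateOperator  # import path assumed

DBT_PROJECT_DIR = "/home/airflow/gcs/data/dbt_project"  # assumed project location under data/

with DAG(
    dag_id="dbt_docs",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Running dbt docs generate from a project under data/ means target/catalog.json
    # is written to a path that Cloud Composer syncs back to the bucket instead of
    # overwriting it on the next dags/ sync.
    generate_docs = DbtDocsGenerateOperator(
        task_id="dbt_docs_generate",
        dir=DBT_PROJECT_DIR,
        profiles_dir=DBT_PROJECT_DIR,  # assumes profiles.yml lives alongside the project
    )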
QUESTION
We use dbt with GCP and BigQuery for transformations in BigQuery, and the simplest approach to scheduling our daily dbt run seems to be a BashOperator in Airflow. Currently we have two separate directories / GitHub projects, one for dbt and another for Airflow. To schedule dbt to run with Airflow, it seems like our entire dbt project would need to be nested inside our Airflow project, so that we can point to it for our dbt run bash command?
Is it possible to trigger our dbt run and dbt test without moving our dbt directory inside our Airflow directory? With the airflow-dbt package, maybe it is possible to point the dir in the default_args to the GitHub link for the dbt project?
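For reference, this is roughly the pattern the question describes: passing dir through default_args so that the airflow-dbt operators run against a local checkout of the dbt project. The path, schedule, and import path below are assumptions, and dir expects a local filesystem path, so a GitHub URL would not work there.

from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator  # import path assumed

default_args = {
    "dir": "/opt/airflow/dbt_project",  # local path to the dbt project (placeholder)
    "start_date": datetime(2020, 12, 1),
}

with DAG("dbt_daily", default_args=default_args, schedule_interval="@daily", catchup=False) as dag:
    # default_args hands `dir` (and start_date) to every dbt operator in the DAG
    dbt_run = DbtRunOperator(task_id="dbt_run")
    dbt_test = DbtTestOperator(task_id="dbt_test")
    dbt_run >> dbt_test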
ANSWER
Answered 2020-Dec-23 at 11:51
My advice would be to leave your dbt and airflow codebases separated. There is indeed a better way:
- dockerise your dbt project in a simple python-based image where you COPY the codebase
- push that to DockerHub or ECR or any other docker registry that you are using
- use the DockerOperator in your airflow DAG to run that docker image with your dbt code (see the sketch below)
I'm assuming that you use the airflow LocalExecutor here and that you want to execute your dbt run workload on the server where airflow is running. If that's not the case and you have access to a Kubernetes cluster, I would suggest using the KubernetesPodOperator instead.
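A minimal sketch of that DockerOperator task, under the answer's assumptions. The image reference, DAG name, schedule, and dbt command below are placeholders; the import path is the one from the apache-airflow-providers-docker package rather than anything prescribed by the answer.

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="dbt_docker",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # The image is assumed to be a small python-based build that COPYs the dbt
    # project in and installs dbt, pushed to DockerHub/ECR as the answer suggests.
    dbt_run = DockerOperator(
        task_id="dbt_run",
        image="my-registry/my-dbt-project:latest",  # placeholder image reference
        command="dbt run",
    )

Swapping DockerOperator for KubernetesPodOperator keeps the same shape (an image plus a dbt command); only the operator and its cluster-related arguments change.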
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install airflow-dbt
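The package is published on PyPI, so it can be installed with pip install airflow-dbt in the same Python environment as your Airflow installation.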