Data-Pipelines-with-Apache-Airflow | data pipeline to automate data warehouse ETL | Data Migration library
kandi X-RAY | Data-Pipelines-with-Apache-Airflow Summary
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3.
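The repository's actual operator classes are not reproduced on this page, so the following is only a minimal sketch of the pattern the summary describes, assuming Airflow 2.x with the Postgres provider installed; the class name StageToRedshiftOperator and its parameters are hypothetical, not the project's API:

    from airflow.models import BaseOperator
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    class StageToRedshiftOperator(BaseOperator):
        """Hypothetical custom operator: COPY data from S3 into a Redshift table."""

        def __init__(self, redshift_conn_id, table, s3_path, iam_role, **kwargs):
            super().__init__(**kwargs)
            self.redshift_conn_id = redshift_conn_id
            self.table = table
            self.s3_path = s3_path
            self.iam_role = iam_role

        def execute(self, context):
            # PostgresHook works for Redshift because Redshift speaks the Postgres wire protocol
            redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
            self.log.info("Copying %s into %s", self.s3_path, self.table)
            redshift.run(
                f"COPY {self.table} FROM '{self.s3_path}' "
                f"IAM_ROLE '{self.iam_role}' FORMAT AS JSON 'auto'"
            )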
Data-Pipelines-with-Apache-Airflow Key Features
Data-Pipelines-with-Apache-Airflow Examples and Code Snippets
Community Discussions
Trending Discussions on Data-Pipelines-with-Apache-Airflow
QUESTION
I don't understand how to interpret the combination of schedule_interval=None and start_date=airflow.utils.dates.days_ago(3) in an Airflow DAG. If the schedule_interval was '@daily', then (I think) the following DAG would wait for the start of the next day and then run once a day, backfilling the three days from days_ago(3). I do know that because schedule_interval=None it will have to be started manually, but I don't understand the behavior beyond that. What is the point of the days_ago(3)?
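For reference, a minimal DAG matching the configuration the question describes might look like this (the dag_id and the placeholder task are illustrative, not from the original post):

    import airflow.utils.dates
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator

    dag = DAG(
        dag_id="manual_only_example",
        schedule_interval=None,  # never scheduled automatically; runs only when triggered manually
        start_date=airflow.utils.dates.days_ago(3),  # the dynamic start_date in question
    )

    noop = DummyOperator(task_id="noop", dag=dag)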
ANSWER
Answered 2021-Nov-02 at 18:28

Your confusion is understandable. This is also confusing for the Airflow scheduler, which is why using dynamic values for start_date is considered bad practice. To quote the Airflow FAQ:

We recommend against using dynamic values as start_date.

The reason is that Airflow calculates DAG scheduling using start_date as the base and schedule_interval as the period; when the end of a period is reached, the DAG is triggered. However, when the start_date is dynamic, there is a risk that a period will never end, because the base keeps moving.

To ease your confusion, just change the start_date to some static value and the behavior will make sense.
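As a sketch, the static-start_date version of a daily DAG would look like this; the specific date is arbitrary:

    import pendulum
    from airflow import DAG

    dag = DAG(
        dag_id="daily_example",
        schedule_interval="@daily",
        start_date=pendulum.datetime(2021, 11, 1, tz="UTC"),  # static: the scheduling base never moves
        catchup=True,  # backfill daily runs between start_date and now
    )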
Note also that the guide you referred to was written before AIP-39 (Richer scheduler_interval) was implemented. Starting with Airflow 2.2.0 it is much easier to schedule DAGs; you can read about Timetables in the documentation.
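For example, assuming Airflow 2.2.0 or later, the built-in CronDataIntervalTimetable expresses a cron-style schedule through the Timetables API (this is a sketch of the Airflow API, not code from the repository):

    import pendulum
    from airflow import DAG
    from airflow.timetables.interval import CronDataIntervalTimetable

    dag = DAG(
        dag_id="timetable_example",
        # equivalent to schedule_interval="@daily", but via the AIP-39 Timetables API
        timetable=CronDataIntervalTimetable("0 0 * * *", timezone=pendulum.timezone("UTC")),
        start_date=pendulum.datetime(2021, 11, 1, tz="UTC"),
    )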
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install Data-Pipelines-with-Apache-Airflow
You can use Data-Pipelines-with-Apache-Airflow like any standard Python library. You will need a development environment with a Python distribution (including header files), a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changing the system Python.
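A typical setup might look like the following; the clone URL is left as a placeholder because it is not shown on this page:

    # create and activate an isolated environment, per the recommendation above
    python -m venv .venv
    source .venv/bin/activate
    # keep the packaging tools current
    pip install --upgrade pip setuptools wheel
    # fetch the project (placeholder URL)
    git clone <repository-url>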