data-pipelines-with-apache-airflow | Code for Data Pipelines with Apache Airflow | BPM library
kandi X-RAY | data-pipelines-with-apache-airflow Summary
Code accompanying the Manning book Data Pipelines with Apache Airflow.
Top functions reviewed by kandi - BETA
- List of available ratings
- Convert a date string to a timestamp
- Generate the tasks for the given dataset
- Returns True if there are no ratings for the given time period
- Gets the list of ratings for a given date range
- Get data from the API endpoint
- Close the session
- Fetch the ratings for a given time range
- Fetch ratings from the API
- Get all ratings for a given month
- Return a session object
- Get a paginated list of ratings
- Generate a workflow for a given dataset
- Download the ml-25 ratings dataset
- Show all events
- Convert a string to a datetime object
- Calculate template stats
- Generate a pandas DataFrame containing events for the given end date
- Generate the events for a day
- Rank a list of movies by rating
- Fetch weather
- Read ratings from a csv file
- Write the ratings to the given directory
- Clean up old sales
- Deploy the model
- Fetch sales objects
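Several of the functions above (fetching ratings page by page from an API) follow a standard offset/limit pagination loop. The sketch below illustrates that pattern using only the standard library; it is not the book's actual code, and the function and parameter names are illustrative:

```python
def get_with_pagination(session_get, url, params, batch_size=100):
    """Yield records from a paginated endpoint, batch_size at a time.

    session_get is any callable returning a dict with "result" (the
    current page of records) and "total" (the overall record count),
    mimicking a JSON API response.
    """
    offset, total = 0, None
    while total is None or offset < total:
        response = session_get(
            url, params={**params, "offset": offset, "limit": batch_size}
        )
        yield from response["result"]
        offset += batch_size
        total = response["total"]


# A stand-in for an HTTP call, so the sketch runs without a live API.
RECORDS = [{"userId": i, "rating": i % 5} for i in range(250)]

def fake_get(url, params):
    start, limit = params["offset"], params["limit"]
    return {"result": RECORDS[start:start + limit], "total": len(RECORDS)}

ratings = list(get_with_pagination(fake_get, "/ratings", {}, batch_size=100))
print(len(ratings))  # 250
```

Fetching in fixed-size batches like this keeps memory use bounded and avoids hitting API limits when the ratings set is large.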
data-pipelines-with-apache-airflow Key Features
data-pipelines-with-apache-airflow Examples and Code Snippets
Community Discussions
Trending Discussions on data-pipelines-with-apache-airflow
QUESTION
I don't understand how to interpret the combination of schedule_interval=None and start_date=airflow.utils.dates.days_ago(3) in an Airflow DAG. If the schedule_interval was '@daily', then (I think) the following DAG would wait for the start of the next day, and then run once a day three times, backfilling the days_ago(3). I do know that because schedule_interval=None, it will have to be started manually, but I don't understand the behavior beyond that. What is the point of the days_ago(3)?
ANSWER
Answered 2021-Nov-02 at 18:28Your confusion is understandable. This is also confusing for the Airflow scheduler, which is why using dynamic values for start_date is considered a bad practice. To quote from the Airflow FAQ:
We recommend against using dynamic values as start_date
The reason is that Airflow calculates DAG scheduling using start_date as the base and schedule_interval as the period. When the end of a period is reached, the DAG is triggered. However, when start_date is dynamic, there is a risk that a period will never end, because the base is always moving.
To clear up your confusion, just change start_date to a static value and the behavior will make sense.
Note also that the guide you referred to was written before AIP-39 (Richer scheduler_interval) was implemented. Starting with Airflow 2.2.0 it is much easier to schedule DAGs. You can read about Timetables
in the documentation.
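The FAQ's point can be illustrated without Airflow itself. Airflow triggers a run at the end of each schedule interval; a dynamic start_date moves that interval's base every time the DAG file is parsed. The following is a minimal stdlib sketch of that rule, with illustrative function names that are not Airflow's API:

```python
from datetime import datetime, timedelta

def next_run(start_date, interval, now):
    """Return the trigger time of the first interval that has fully
    elapsed, or None if no interval has closed yet. This mimics
    Airflow's rule that a run fires at the *end* of its interval."""
    if now < start_date + interval:
        return None
    return start_date + interval

# Static start_date: the first interval eventually closes and triggers.
static_start = datetime(2021, 11, 1)
print(next_run(static_start, timedelta(days=1), now=datetime(2021, 11, 3)))
# -> 2021-11-02 00:00:00

# Dynamic start_date (days_ago-style): re-evaluated at every parse, so
# the base keeps moving with "now".
def days_ago_start(now, n=3):
    return now - timedelta(days=n)

# With an interval longer than the lookback, the first interval's end is
# always in the future, so the DAG never triggers no matter when we check.
for now in (datetime(2021, 11, 1), datetime(2021, 11, 2)):
    start = days_ago_start(now)
    print(next_run(start, timedelta(days=4), now))  # None every time
```

With schedule_interval=None, no interval is ever computed at all, so days_ago(3) only serves as the logical date that manual runs are anchored against.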
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install data-pipelines-with-apache-airflow
You can use data-pipelines-with-apache-airflow like any standard Python library. You will need a development environment with a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying the system Python.
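The setup described above can be sketched as the following shell commands; the environment directory name is just an example, and the exact project requirements to install afterwards depend on the chapter you are working through:

```shell
# Create an isolated virtual environment for the book's code
# (the directory name "airflow-env" is an arbitrary choice).
python3 -m venv airflow-env
. airflow-env/bin/activate

# Keep the packaging tools current before installing anything.
python -m pip install --upgrade pip setuptools wheel

# From here you would install the chapter's dependencies, e.g.
#   pip install -r requirements.txt
# (the requirements file name is an assumption; check the repository).

# Confirm the environment's interpreter is the one on PATH.
python -c "import sys; print(sys.prefix)"
```

Working inside the virtual environment keeps Airflow's sizeable dependency tree from interfering with system packages.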