airflow | Apache Airflow - A platform to programmatically author, schedule and monitor workflows | BPM library

by apache | Python | Version: 2.6.1 | License: Apache-2.0

kandi X-RAY | airflow Summary

airflow is a Python library typically used in Retail, Automation, and BPM applications. airflow has no bugs and no vulnerabilities, has a build file available, has a Permissive License, and has high support. You can install it with 'pip install apache-airflow' or download it from GitHub or PyPI.

Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
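For illustration, a minimal sketch of such a DAG defined as code follows; the DAG id, task ids, and bash commands below are arbitrary placeholders rather than anything from the Airflow project itself.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Two tasks wired into a simple pipeline: "extract" must finish before "load" starts.
with DAG(dag_id="example_pipeline", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load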

            kandi-support Support

              airflow has a highly active ecosystem.
              It has 30593 star(s) with 12459 fork(s). There are 762 watchers for this library.
It had no major release in the last 12 months.
There are 690 open issues and 6706 closed issues. On average, issues are closed in 133 days. There are 179 open pull requests and 0 closed pull requests.
It has a negative sentiment in the developer community.
The latest version of airflow is 2.6.1.

            kandi-Quality Quality

              airflow has 0 bugs and 0 code smells.

            kandi-Security Security

              airflow has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              airflow code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              airflow is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

airflow releases are available to install and integrate.
A deployable package is available on PyPI.
A build file is available, so you can build the component from source.
              Installation instructions, examples and code snippets are available.
              It has 143816 lines of code, 7445 functions and 1994 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

kandi has reviewed airflow and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality airflow implements and to help you decide if it suits your requirements.
• Create default connection objects.
• Creates a new training job.
• Returns the list of executables to be queued.
• Process backfill task instances.
• Create evaluation operations.
• The main entry point.
• Creates an AutoML training job.
• Get a template context.
• Authenticate the LDAP user.
• Evaluate the given trigger rule.
            Get all kandi verified functions for this library.

            airflow Key Features

            No Key Features are available at this moment for airflow.

            airflow Examples and Code Snippets

            Creating a simple Airflow DAG to run an Airbyte Sync Job
Python | Lines of Code: 18 | License: Non-SPDX (NOASSERTION)
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

# The snippet is truncated in the original; a typical completion (ids are placeholders):
with DAG(dag_id='trigger_airbyte_job_example',
         default_args={'owner': 'airflow'},
         schedule_interval='@daily',
         start_date=days_ago(1)) as dag:
    trigger_sync = AirbyteTriggerSyncOperator(
        task_id='airbyte_sync_example',
        airbyte_conn_id='airbyte_conn_example',
        connection_id='<your-airbyte-connection-id>',  # placeholder Airbyte connection id
    )
            Deploy Airbyte on Plural-Advanced Use Cases-Running with External Airflow
Lines of Code: 1 | License: Non-SPDX (NOASSERTION)
            https://username:password@airbytedomain
              
import datetime

from airflow import models
from airflow.operators import bash
from airflow.providers.google.cloud.transfers.sheets_to_gcs import GoogleSheetsToGCSOperator

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

# The snippet is truncated in the original; a typical continuation (placeholder names):
with models.DAG("sheets_to_gcs_example", start_date=YESTERDAY, schedule_interval=None) as dag:
    upload_sheet_to_gcs = GoogleSheetsToGCSOperator(
        task_id="upload_sheet_to_gcs",
        destination_bucket="<your-gcs-bucket>",
        spreadsheet_id="<your-spreadsheet-id>",
    )
            Get session parameter for airflow.models.dag get_last_dagrun()
Python | Lines of Code: 13 | License: Strong Copyleft (CC BY-SA 4.0)
from airflow.models import DagRun

def get_last_exec_date(dag_id):
    dag_runs = DagRun.find(dag_id=dag_id)
    dags = []
    for dag in dag_runs:
        if dag.state == 'success':
            dags.append(dag)

    # The sort is truncated in the original; it most likely orders runs so the
    # most recent successful run comes first:
    dags.sort(key=lambda x: x.execution_date, reverse=True)
    return dags[0].execution_date if dags else None
            Airflow Python Branch Operator not working in 1.10.15
Python | Lines of Code: 19 | License: Strong Copyleft (CC BY-SA 4.0)
# Branch callable: returns the task_id of the branch to follow.
def branch_test(**context: dict) -> str:
    return 'dummy_step_four'

# Dependencies between the intermediate steps:
dummy_step_two >> dummy_step_three >> dummy_step_four

# Set on the downstream task so it still runs when upstream branches are skipped:
trigger_rule="none_failed_or_skipped"
            
            Docker compose missing python package
Python | Lines of Code: 2 | License: Strong Copyleft (CC BY-SA 4.0)
            _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- openpyxl==3.0.9}
            
            How to disable Airflow DAGs with AWS Lambda
Python | Lines of Code: 34 | License: Strong Copyleft (CC BY-SA 4.0)
            https://airflow.apache.org/api/v1/dags/{dag_id}
            
            {
              "is_paused": true
            }
            
            import time
            import airflow_client.client
            from airflow_client.client.api import dag_api
from airflow_client.client.model.dag import DAG  # import truncated in the original; this is the likely completion
            Airflow DAG task dependency, breaking up long lines
Python | Lines of Code: 6 | License: Strong Copyleft (CC BY-SA 4.0)
            a >> b >> [c, d] >> f >> G
            
            from airflow.models.baseoperator import chain
            
            chain(a, b, [c, d], f, G)
            
            Table expiration in GCS to BQ Airflow task
Python | Lines of Code: 9 | License: Strong Copyleft (CC BY-SA 4.0)
BigQueryCreateEmptyTableOperator(
    ...
    table_resource={
        "tableReference": {"tableId": ""},  # table id left blank in the original
        "expirationTime": "",               # value elided in the original (epoch timestamp in ms)
    },
)
            
            
            Table expiration in GCS to BQ Airflow task
Python | Lines of Code: 59 | License: Strong Copyleft (CC BY-SA 4.0)
            import datetime
            
            from airflow import models
            from airflow.operators import python
            
            from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook  # import truncated in the original; likely BigQueryHook

            Community Discussions

            QUESTION

            Submit command line arguments to a pyspark job on airflow
            Asked 2022-Mar-29 at 10:37

I have a PySpark job on GCP Dataproc that should be triggered from Airflow, as shown below:

            ...

            ANSWER

            Answered 2022-Mar-28 at 08:18

You have to pass a Sequence[str]. If you check DataprocSubmitJobOperator, you will see that its job parameter expects a google.cloud.dataproc_v1.types.Job, so the command line arguments must be supplied as a sequence of strings inside that job definition.
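As a rough sketch of what that looks like in practice (the project, cluster, bucket, and argument values below are placeholders, not taken from the question):

from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

# The "args" field of pyspark_job must be a sequence of strings.
PYSPARK_JOB = {
    "reference": {"project_id": "my-project"},
    "placement": {"cluster_name": "my-cluster"},
    "pyspark_job": {
        "main_python_file_uri": "gs://my-bucket/jobs/job.py",
        "args": ["--input", "gs://my-bucket/in.csv", "--output", "gs://my-bucket/out/"],
    },
}

submit_job = DataprocSubmitJobOperator(
    task_id="pyspark_task",
    job=PYSPARK_JOB,
    region="us-central1",
    project_id="my-project",
)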

            Source https://stackoverflow.com/questions/71616491

            QUESTION

            How to dynamically build a resources (V1ResourceRequirements) object for a kubernetes pod in airflow
            Asked 2022-Mar-06 at 16:26

            I'm currently migrating a DAG from airflow version 1.10.10 to 2.0.0.

This DAG uses a custom Python operator where, depending on the complexity of the task, it assigns resources dynamically. The problem is that the import used in v1.10.10 (from airflow.contrib.kubernetes.pod import Resources) no longer works. I read that for v2.0.0 I should use kubernetes.client.models.V1ResourceRequirements, but I need to build this resource object dynamically. This might sound dumb, but I haven't been able to find the correct way to build this object.

            For example, I've tried with

            ...

            ANSWER

            Answered 2022-Mar-06 at 16:26

            The proper syntax is for example:
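The answer's code block is not reproduced above; as a rough sketch, a V1ResourceRequirements object from the kubernetes Python client can be built dynamically along these lines (the helper name and resource values are illustrative):

from kubernetes.client import models as k8s

# Build a resource spec from values chosen per task.
def make_resources(cpu: str, memory: str) -> k8s.V1ResourceRequirements:
    return k8s.V1ResourceRequirements(
        requests={"cpu": cpu, "memory": memory},
        limits={"cpu": cpu, "memory": memory},
    )

resources = make_resources("500m", "512Mi")  # e.g. a "light" task profile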

            Source https://stackoverflow.com/questions/71241180

            QUESTION

            Dataproc Cluster creation is failing with PIP error "Could not build wheels"
            Asked 2022-Jan-24 at 13:04

We used to spin up a cluster with the configuration below. It ran fine until last week, but now fails with the error: ERROR: Failed cleaning build dir for libcst / Failed to build libcst / ERROR: Could not build wheels for libcst which use PEP 517 and cannot be installed directly

            ...

            ANSWER

            Answered 2022-Jan-19 at 21:50

It seems you need to upgrade pip; see this question.

But there can be multiple pips in a Dataproc cluster, so you need to choose the right one.

1. For init actions, at cluster creation time, /opt/conda/default is a symbolic link to either /opt/conda/miniconda3 or /opt/conda/anaconda, depending on which Conda env you choose; the default is Miniconda3, but in your case it is Anaconda. So you can run either /opt/conda/default/bin/pip install --upgrade pip or /opt/conda/anaconda/bin/pip install --upgrade pip.

            2. For custom images, at image creation time, you want to use the explicit full path, /opt/conda/anaconda/bin/pip install --upgrade pip for Anaconda, or /opt/conda/miniconda3/bin/pip install --upgrade pip for Miniconda3.

            So, you can simply use /opt/conda/anaconda/bin/pip install --upgrade pip for both init actions and custom images.

            Source https://stackoverflow.com/questions/70743642

            QUESTION

            airflow health check
            Asked 2022-Jan-11 at 23:07

In the Airflow instance I'm using, the pipelines sometimes wait a long time to be scheduled. There have also been instances where a job ran for too long (presumably taking up the resources of other jobs).

I'm trying to work out how to programmatically identify the health of the scheduler and potentially monitor it in the future without any additional frameworks. I started to have a look at the metadata database tables. All I can think of now is to look at start_date and end_date from dag_run, and the duration of the tasks. What are the other metrics that I should be looking at? Many thanks for your help.

            ...

            ANSWER

            Answered 2022-Jan-04 at 12:37

            There is no need to go "deep" inside the database.

            Airflow provide you with metrics that you can utilize for the very purpose: https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html

            If you scroll down, you will see all the useful metrics and some of them are precisely what you are looking for (especially Timers).

            This can be done with the usual metrics integration. Airflow publishes the metrics via statsd, and Airflow Official Helm Chart (https://airflow.apache.org/docs/helm-chart/stable/index.html) even exposes those metrics for Prometheus via statsd exporter.

Regarding the spark job - yeah - the current implementation of the spark submit hook/operator works in "active poll" mode: the "worker" process of Airflow polls the status of the job. But Airflow can run multiple worker jobs in parallel. Also, if you want, you can implement your own task which behaves differently.

            In "classic" Airflow you'd need to implement a Submit Operator (to submit the job) and "poke_reschedule" sensor (to wait for the job to complete) and implement your DAG in the way that sensort task will be triggered after the operator. The "Poke reschedule" mode works in the way that the sensor is only taking the worker slot for the time of "polling" and then it frees the slot for some time (until it checks again).

As of Airflow 2.2 you can also write a Deferrable Operator (https://airflow.apache.org/docs/apache-airflow/stable/concepts/deferring.html?highlight=deferrable), where you write a single operator that does the submission first and then defers the status check - all in one operator. Deferrable operators efficiently handle (using asyncio) potentially many thousands of waiting/deferred operators without taking up slots or excessive resources.

Update: If you really cannot use statsd (Helm is not needed, statsd is enough), you should still never query the DB directly to get information about the DAGs. Use the stable Airflow REST API instead: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
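As a complementary sketch (not part of the original answer), the stable REST API also exposes a /health endpoint that reports the scheduler status and its latest heartbeat; the base URL below is a placeholder:

import requests

AIRFLOW_URL = "http://localhost:8080"  # placeholder Airflow webserver base URL

# /health reports the status of the metadatabase and the scheduler,
# including the latest scheduler heartbeat timestamp.
health = requests.get(f"{AIRFLOW_URL}/health").json()
print(health["scheduler"]["status"])
print(health["scheduler"]["latest_scheduler_heartbeat"])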

            Source https://stackoverflow.com/questions/70515361

            QUESTION

            How to invoke a cloud function from google cloud composer?
            Asked 2021-Nov-30 at 19:27

For a requirement I want to call/invoke a cloud function from inside a Cloud Composer pipeline, but I can't find much info on it. I tried using the SimpleHttp Airflow operator but I get this error:

            ...

            ANSWER

            Answered 2021-Sep-10 at 12:41

            I think you are looking for: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/operators/functions/index.html#airflow.providers.google.cloud.operators.functions.CloudFunctionInvokeFunctionOperator

Note that in order to use it in 1.10 you need to have the backport provider packages installed (but I believe they are installed by default), and the version of the operator might be slightly different because backport packages have not been released for quite some time.

            In Airflow 2
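The Airflow 2 code block that followed is not reproduced above; a minimal sketch of the usage, with placeholder project, region, and function names, might look like:

from airflow.providers.google.cloud.operators.functions import (
    CloudFunctionInvokeFunctionOperator,
)

# Invoke an existing Cloud Function from a DAG task (all identifiers are placeholders).
invoke_fn = CloudFunctionInvokeFunctionOperator(
    task_id="invoke_cloud_function",
    project_id="my-project",
    location="us-central1",
    function_id="my-function",
    input_data={"data": "hello"},
)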

            Source https://stackoverflow.com/questions/69131840

            QUESTION

            ImportError: cannot import name 'OP_NO_TICKET' from 'urllib3.util.ssl_'
            Asked 2021-Nov-08 at 22:41

I started running Airflow locally, and while running Docker (specifically: docker-compose run -rm web server initdb) I started seeing this error. I hadn't seen this issue prior to this afternoon; wondering if anyone else has come across it.

            cannot import name 'OP_NO_TICKET' from 'urllib3.util.ssl_'

            ...

            ANSWER

            Answered 2021-Nov-08 at 22:41

I have the same issue in my CI/CD using GitLab CI. awscli version 1.22.0 has this problem. I temporarily solved it by changing this line in my gitlab-ci file:

            pip install awscli --upgrade --user

            By:

            pip install awscli==1.21.12 --user

Because when you install the latest version, the version that comes is 1.22.0.

            Source https://stackoverflow.com/questions/69889936

            QUESTION

            Apache Airflow: No such file or directory: 'beeline' when trying to execute DAG with HiveOperator
            Asked 2021-Oct-29 at 06:41

            Receiving below error in task logs when running DAG:

            FileNotFoundError: [Errno 2] No such file or directory: 'beeline': 'beeline'

            This is my DAG:

            ...

            ANSWER

            Answered 2021-Oct-29 at 06:41

The 'run_as_user' feature uses 'sudo' to switch to the airflow user in non-interactive mode. The sudo command will never preserve the PATH variable (no matter what parameters you specify, including -E) unless you run sudo in --interactive mode (logging in as the user). Only in --interactive mode are the user's .profile, .bashrc and other startup scripts executed (and those are usually the scripts that set PATH for the user).

All non-interactive 'sudo' commands will have PATH set to the secure_path configured in the /etc/sudoers file.

            My case here:

            Source https://stackoverflow.com/questions/69761943

            QUESTION

            Snowflake LIKE not populating rows
            Asked 2021-Oct-21 at 08:46

            When I run the below query:

            ...

            ANSWER

            Answered 2021-Oct-21 at 08:46

            From the description, you want to match rows where the column load_fname begins with the following:

            Source https://stackoverflow.com/questions/69649703

            QUESTION

            Get the client_id of the IAM proxy on GCP Cloud composer
            Asked 2021-Oct-15 at 15:02

I'm trying to trigger an Airflow DAG inside a Cloud Composer environment with Cloud Functions. In order to do that I need to get the client id as described here. I've tried with a curl command but it doesn't return any value. With a Python script I keep getting this error:

            ...

            ANSWER

            Answered 2021-Sep-28 at 13:00

            Posting this Community Wiki for better visibility.

As mentioned in the comment section by @LEC, this configuration is compatible with Cloud Composer V1, which can be found in the GCP documentation under Triggering DAGs with Cloud Functions.

At the moment there are two tabs, Cloud Composer 1 Guides and Cloud Composer 2 Guides. Under Cloud Composer 1 is the code used by the OP, but if you check Cloud Composer 2 under Manage DAGs > Triggering DAGs with Cloud Functions, you will see that there is no proper documentation yet.

            This documentation page for Cloud Composer 2 is not yet available. Please use the page for Cloud Composer 1.

As a solution, please use Cloud Composer V1.
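For reference, a rough sketch of how the IAP client_id is typically retrieved for Cloud Composer 1, following the approach in the GCP documentation of inspecting the OAuth redirect (the webserver URL below is a placeholder):

import requests
from urllib.parse import urlparse, parse_qs

AIRFLOW_URL = "https://<your-tenant>.appspot.com"  # placeholder Composer 1 webserver URL

# An unauthenticated request is redirected to the IAP sign-in page;
# the OAuth client_id is carried in the redirect URL's query string.
redirect = requests.get(AIRFLOW_URL, allow_redirects=False)
query = parse_qs(urlparse(redirect.headers["Location"]).query)
print(query["client_id"][0])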

            Source https://stackoverflow.com/questions/69269929

            QUESTION

            What is context variable in Airflow operators
            Asked 2021-Oct-11 at 20:05

I'm trying to understand what the variable called context is in Airflow operators. For example:

            ...

            ANSWER

            Answered 2021-Oct-11 at 14:02

When Airflow runs a task, it collects several variables and passes these to the context argument on the execute() method. These variables hold information about the current task; you can find the list here: https://airflow.apache.org/docs/apache-airflow/stable/macros-ref.html#default-variables.

            Information from the context can be used in your task, for example to reference a folder yyyymmdd, where the date is fetched from the variable ds_nodash, a variable in the context:
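The answer's example code is not reproduced above; a minimal sketch of reading ds_nodash from the context inside a PythonOperator callable (the task id and folder path are placeholders):

from airflow.operators.python import PythonOperator

# The context is passed to the callable as keyword arguments; 'ds_nodash' is the
# logical date formatted as YYYYMMDD.
def _process(**context):
    folder = f"/data/{context['ds_nodash']}"  # e.g. /data/20220101
    print(f"processing {folder}")

process = PythonOperator(task_id="process", python_callable=_process)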

            Source https://stackoverflow.com/questions/69527239

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install airflow

Visit the official Airflow website documentation (latest stable release) for help with installing Airflow, getting started, or walking through a more complete tutorial. Note: if you're looking for documentation for the main branch (latest development branch), you can find it at s.apache.org/airflow-docs. For more information on Airflow Improvement Proposals (AIPs), visit the Airflow Wiki. Documentation for dependent projects like provider packages, the Docker image, and the Helm Chart can be found in the documentation index.

            Support

            As of Airflow 2.0, we agreed to certain rules we follow for Python and Kubernetes support. They are based on the official release schedule of Python and Kubernetes, nicely summarized in the Python Developer's Guide and Kubernetes version skew policy.
            Find more information at:

Try Top Libraries by apache

• echarts (TypeScript)
• superset (TypeScript)
• dubbo (Java)
• spark (Scala)
• incubator-superset (Python)