CatchUp | An app for catching up on things | REST library
kandi X-RAY | CatchUp Summary
An app for catching up on things.
Community Discussions
Trending Discussions on CatchUp
QUESTION
ANSWER
Answered 2022-Mar-30 at 11:46
If the task had executed, it would have a log. I think your issue is that the task you defined is not assigned to any DAG object, which is why you see the "No task found" error (empty DAG).
You should add dag=dag:
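For illustration only (the question's actual task and DAG are not shown, so the names below are placeholders), a minimal sketch of a task that is attached to its DAG:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

dag = DAG(
    "my_dag",                          # illustrative DAG id
    start_date=datetime(2022, 3, 1),
    schedule_interval=None,
    catchup=False,
)

def my_callable(**kwargs):
    print("running task")

# Without dag=dag (or a `with dag:` block) the task is never attached
# to the DAG, and the scheduler reports an empty DAG / "No task found".
my_task = PythonOperator(
    task_id="my_task",
    python_callable=my_callable,
    dag=dag,
)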
QUESTION
My airflow test_dag looks like:
...ANSWER
Answered 2022-Mar-14 at 20:40
I believe your code has a bug in the pipeline logic. BranchPythonOperator is expected to return the task_id of the task to follow. In your case you have:
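For illustration (the question's DAG is not shown, so the task ids and condition below are placeholders), a minimal sketch of a branch callable that returns a task_id instead of calling the downstream task:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator

with DAG(
    "branch_example",                  # illustrative DAG id
    start_date=datetime(2022, 3, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    def choose_branch(**kwargs):
        # The callable must RETURN the task_id of the branch to follow;
        # the branch that is not returned gets skipped.
        if datetime.now().hour < 12:   # illustrative condition
            return "morning_task"
        return "afternoon_task"

    branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
    morning = DummyOperator(task_id="morning_task")
    afternoon = DummyOperator(task_id="afternoon_task")

    branch >> [morning, afternoon]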
QUESTION
I have an Airflow DAG written with the Python operator. I need to use the PostgreSQL operator for the same DAG without changing its functionality. Here is the code with Python operators. How should I replace the Python operator with the PostgreSQL operator? Or can we use two different operators in a single DAG?
...ANSWER
Answered 2022-Feb-28 at 13:18
PostgresOperator runs SQL. Your code is querying an API, generating a CSV, and loading it into the DB. You cannot do that with PostgresOperator.
What you can do is replace the usage of psycopg2 with PostgresHook.
The hook is a wrapper around psycopg2 that exposes functions you can interact with. This means that, for example, you don't need to handle how to connect to Postgres on your own. Simply define the connection in Admin -> Connections and reference the connection name in the hook:
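For illustration (the connection id, table name, and file path below are placeholders, not from the original code), a rough sketch of how the hook can replace a manual psycopg2 connection:

from airflow.providers.postgres.hooks.postgres import PostgresHook

def load_csv_to_postgres(**kwargs):
    # "my_postgres_conn" is an illustrative connection id defined in Admin -> Connections
    hook = PostgresHook(postgres_conn_id="my_postgres_conn")
    conn = hook.get_conn()             # a psycopg2 connection under the hood
    cur = conn.cursor()
    with open("/tmp/data.csv") as f:   # illustrative CSV produced by the earlier steps
        cur.copy_expert("COPY my_table FROM STDIN WITH CSV HEADER", f)
    conn.commit()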
QUESTION
Problem: We're having difficulty getting a DAG to fire on a defined interval. It's also preventing manual DAG executions. We've added catchup=False to the DAG definition as well.
Context: We're planning to have a DAG execute on a 4-hour interval from Monday to Friday. We've defined this behavior using the following CRON expression:
...ANSWER
Answered 2022-Feb-23 at 12:31
I think what you are looking for is 0 0/4 * * 1-5, which will run at minute 0 of every 4th hour from 0 through 23, on every day-of-week from Monday through Friday.
Your DAG object can be:
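The original DAG definition is not shown; a minimal sketch with that schedule (the DAG id and start date are illustrative):

from datetime import datetime

from airflow import DAG

dag = DAG(
    "four_hourly_weekday_dag",             # illustrative DAG id
    start_date=datetime(2022, 2, 1),       # illustrative start date
    schedule_interval="0 0/4 * * 1-5",     # minute 0 of every 4th hour, Monday-Friday
    catchup=False,
)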
QUESTION
I am running Airflow in a Docker container on my local machine. I'm running a test DAG with 3 tasks. The three tasks run fine; however, the last task, with the bash operator, is stuck in a loop, as seen in the picture at the bottom. Looking in the log file, an entry is only generated for the first execution of the bash python script, then nothing, but the python file keeps getting executed. Any suggestions as to what could be the issue?
Thanks,
Richard
...ANSWER
Answered 2022-Feb-21 at 15:33
All right, I didn't research why this is the case, but it seems that if I create a scripts folder inside the dags folder, the python script inside it (test_dontputthescripthere.py) is executed even though the bash operator isn't telling it to execute. As you can see, the bash operator is executing the test.py file perfectly, and adds the following line to the csv:
2022-02-21 15:11:53.923284;adding entry with bash python script
The test_dontputthescripthere.py file is executed in a loop, without the bash operator executing it. These are all the "- and this is wrong" entries in the demo.csv file.
I suspect some kind of refresh is going on inside Airflow, forcing it to execute the python file.
QUESTION
I'm learning how to use Airflow to build a machine learning pipeline.
But I didn't find a way to pass a pandas dataframe generated in one task into another task... It seems I need to convert the data to JSON format or save the data in a database within each task?
In the end, I had to put everything in one task... Is there any way to pass a dataframe between Airflow tasks?
Here's my code:
...ANSWER
Answered 2021-Nov-08 at 09:59
Although it is used in many ETL tasks, Airflow is not the right choice for that kind of operation; it is intended for workflow, not dataflow. But there are ways to do this without passing the whole dataframe between tasks.
You can pass information about the data using xcom.push and xcom.pull:
a. Save the output of the first task somewhere (json, csv, etc.).
b. Push information about the saved file to xcom, e.g. the file name or path.
c. Read this file name using xcom.pull in the other task and perform the needed operation (see the sketch at the end of this answer).
Or do everything above using database tables:
a. In task_1 you can download data from table_1 into a dataframe, process it, and save it in another table_2 (df.to_sql()).
b. Pass the name of the table using xcom.push.
c. In the other task, get table_2 via xcom.pull and read it with pd.read_sql().
Information on how to use xcom is available in the Airflow examples, for example: https://github.com/apache/airflow/blob/main/airflow/example_dags/tutorial_etl_dag.py
IMHO there are many other, better ways; I have just written what I tried.
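A minimal sketch of the file-path approach (the file path, task ids, and data are placeholders, and it assumes a dag object like the one in the question already exists):

import pandas as pd
from airflow.operators.python import PythonOperator

def extract(**kwargs):
    df = pd.DataFrame({"a": [1, 2, 3]})              # illustrative data
    path = "/tmp/extract_output.csv"                  # illustrative path
    df.to_csv(path, index=False)
    # Push only the file path, not the dataframe itself.
    kwargs["ti"].xcom_push(key="csv_path", value=path)

def transform(**kwargs):
    path = kwargs["ti"].xcom_pull(task_ids="extract_task", key="csv_path")
    df = pd.read_csv(path)
    print(df.shape)

extract_task = PythonOperator(task_id="extract_task", python_callable=extract, dag=dag)
transform_task = PythonOperator(task_id="transform_task", python_callable=transform, dag=dag)
extract_task >> transform_task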
QUESTION
I am using S3ToRedshiftOperator to load a csv file into a Redshift database. Kindly help me pass an xcom variable to S3ToRedshiftOperator. How can we push an xcom without using a custom function?
Error:
NameError: name 'ti' is not defined
Using below code:
...ANSWER
Answered 2022-Jan-05 at 21:24
The error message tells the problem: ti is not defined.
When you set provide_context=True, Airflow makes the Context available to you in the python callable. One of its attributes is ti (see the source code). So you need to extract it from kwargs or set it in the function signature.
Your code should be:
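The original callable is not shown; a minimal sketch of the fix (the function name, key, and pushed value are illustrative):

def push_s3_key(**kwargs):
    ti = kwargs["ti"]   # extract the TaskInstance from the context
    ti.xcom_push(key="s3_key", value="s3://my-bucket/my-file.csv")   # illustrative value

or, equivalently, declare ti in the signature so Airflow passes it in directly:

def push_s3_key(ti, **kwargs):
    ti.xcom_push(key="s3_key", value="s3://my-bucket/my-file.csv")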
QUESTION
## Section 1 | Import Modules
## Section 2 | DAG Default Arguments
## Section 3 | Instantiate the DAG
## Section 4 | defining Utils
## Section 5 | Task defining
## Section 6 | Defining dependencies

## Section 1 | Import Modules
from airflow import DAG
from datetime import datetime
from airflow.operators.python_operator import PythonOperator

## Section 2 | DAG Default Arguments
default_args = {
    'owner': 'Sourav',
    'depends_on_past': False,
    'start_date': datetime(2021, 6, 11),
    'retries': 0,
}

## Section 3 | Instantiate the DAG
dag = DAG('basic_skeleton',
          description='basic skeleton of a DAG',
          default_args=default_args,
          schedule_interval=None,
          catchup=False,
          tags=['skeleton'],
          )

x = 0

## Section 4 | defining Utils
def print_context(**kwargs):
    print("hello world")
    return "hello world!!!"

def sum(**kwargs):
    c = 1 + 2
    return c

def diff(**kwargs):
    global c
    c = 2 - 1
    return c

## Doubts
x = c
y = dag.get_dagrun(execution_date=dag.get_latest_execution_date()).conf

## Section 5 | Task defining
with dag:
    t_printHello_prejob = PythonOperator(
        task_id='t_printHello_prejob',
        provide_context=True,
        python_callable=print_context,
        dag=dag,
    )

    t_sum_job = PythonOperator(
        task_id='t_sum_job',
        python_callable=sum,
        provide_context=True,
        dag=dag,
    )

## Section 6 | Defining dependencies
t_printHello_prejob >> t_sum_job
...ANSWER
Answered 2021-Dec-13 at 09:10
Sourav, tell me if this helps:
In an Airflow DAG we generally don't share data between tasks, even though it's technically possible. We're encouraged to keep every task idempotent, not unlike a "pure function" in functional programming. This means that given an input x, a given task will always produce the same result.
The DAG you're defining here is basically a blueprint for a data pipeline. When the DAG and tasks are evaluated by the Airflow scheduler, the functions that will be called by the tasks are... well, not yet called. Intuitively, therefore, I would expect x to always equal zero, and while it's an interesting mystery to unravel why it isn't always, mutating global variables during a DAG run isn't what Airflow is designed to do.
That said, one simple way to reliably mutate x or c and use it across tasks is to store it in an Airflow Variable:
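A minimal sketch of that idea, reusing the sum function from the question (the variable key and the downstream function are illustrative; Variable.get returns a string, so cast it back):

from airflow.models import Variable

def sum(**kwargs):
    c = 1 + 2
    Variable.set("c", c)            # persist the value instead of mutating a global
    return c

def use_sum(**kwargs):
    c = int(Variable.get("c"))      # read it back in a downstream task
    print(f"c is {c}")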
QUESTION
I have 2 BigQueryOperator tasks in a loop. The first task works perfectly; however, the second task (create_partition_table_agent_intensity_{v_list[i]}) throws an error:
ANSWER
Answered 2021-Dec-08 at 14:55
I do not have a playground to test it, but I think you should not use an f-string for the sql parameter. If you use {{something}} in an f-string, it returns the string {something}, so the query parameters are not inserted, and this results in a SQL syntax error because the query is run without parameters. Please try removing the f before the string for sql in the 2nd task.
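A small illustration of why the f prefix breaks the Jinja template (the query and parameter name below are placeholders, not from the original DAG):

# With the f prefix, Python consumes the doubled braces before Airflow ever renders the template:
sql = f"SELECT * FROM {{ params.table }}"    # becomes "SELECT * FROM { params.table }"

# Without the f prefix, the Jinja expression reaches Airflow intact and is rendered at runtime:
sql = "SELECT * FROM {{ params.table }}"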
QUESTION
Relating to this earlier question, suppose that we have an Apache Airflow DAG that comprises two tasks: first an HTTP request (i.e., SimpleHttpOperator) and then a PythonOperator that does something with the response of the first task.
Conveniently, using the Dog CEO API as an example, consider the following DAG:
...ANSWER
Answered 2021-Oct-31 at 10:37
Your problem is that you did not set a dependency between the tasks, so inspect_dog may run before or in parallel to get_dog. When this happens, inspect_dog will see no xcom value because get_dog didn't push it yet.
You just need to set the dependency as:
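Assuming the task variables are named get_dog and inspect_dog as in the question, a one-line sketch:

# get_dog must run (and push its XCom) before inspect_dog tries to read it
get_dog >> inspect_dog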
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported