dagster | An orchestration platform for the development, production, and observation of data assets | BPM library
kandi X-RAY | dagster Summary
An orchestration platform for the development, production, and observation of data assets. Dagster lets you define jobs in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of jobs and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke. Dagster is designed for data platform engineers, data engineers, and full-stack data scientists. Building a data platform with Dagster makes your stakeholders more independent and your systems more robust. Developing data pipelines with Dagster makes testing easier and deploying faster.
Top functions reviewed by kandi - BETA
- Config field for Spark.
- Define the Dataproc cluster config field.
- Define the Cloud Dataproc job config.
- Create a config field for an EMR run job flow.
- Create a task execution task.
- Create a default scaling policy.
- Create a Graphene EventRecord from a Dagster event record.
- Construct a Dagster K8s job.
- Define the instance fleet configuration.
- Return a sequence of DagsterEvent objects.
dagster Key Features
dagster Examples and Code Snippets
from dagster import job
from dagster_airbyte import airbyte_resource, airbyte_sync_op

my_airbyte_resource = airbyte_resource.configured(
    {
        "host": {"env": "AIRBYTE_HOST"},
        "port": {"env": "AIRBYTE_PORT"},
    }
)
# The original snippet was truncated here; "foobar" is a placeholder
# connection id, following the dagster_airbyte docs pattern.
sync_foobar = airbyte_sync_op.configured({"connection_id": "foobar"}, name="sync_foobar")

@job(resource_defs={"airbyte": my_airbyte_resource})
def my_airbyte_job():
    sync_foobar()
from dagster import OutputDefinition, Nothing
from dagster_meltano.tests import pipeline
from dagster_meltano.solids import meltano_elt_solid

@pipeline
def meltano_pipeline():
    meltano_elt_solid(
        # The original snippet was truncated here; completed with the
        # Nothing type imported above.
        output_defs=[OutputDefinition(dagster_type=Nothing)],
    )
git clone https://github.com/stephanBV/ML_with_DAGs.git
cd ML_with_DAGs
python3 -m virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m dagit -f script.py
import logging
from dagster import success_hook, failure_hook, solid, pipeline, execute_pipeline
from mypackage import function1, function2, function3, function4, function5

@solid
def dag_function1() -> bool:
    myvar1 = True
    # The original snippet was truncated here; presumably it went on to call
    # function1() from mypackage and return the flag.
    function1()
    return myvar1
@solid
def name():
    return "Marcus"

@solid
def age():
    return 20

@pipeline
def pipe2():
    # hello and hello2 are solids defined elsewhere in the question.
    hello2(hello(), name(), age())
from CompositeKey.modules.Gordian.Gordian import (add_to_non_key_set ...)
CompositeKey.modules.Gordian.Assisted_Gordian.add_to_non_key_set(curNonKey, NonKeySet)
import CompositeKey
parent_run_id = instance.get_runs()[0].run_id
result = reexecute_pipeline(inputs_pipeline, parent_run_id=parent_run_id,
                            step_keys_to_execute=['step2.compute', 'step3.compute'],
                            # truncated in the original; the instance holding the parent run must be passed
                            instance=instance)
import os

try:
    open('sentinel', 'x').close()
except FileExistsError:
    exit("someone else already has sentinel")
do_stuff()
os.remove('sentinel')
numba.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Type of #4 arg mismatch: i1 != i32
Community Discussions
Trending Discussions on dagster
QUESTION
Currently, my database has multiple departments. I need to apply a data pipeline to all of these departments, each with a different configuration.
I want to load the configuration for each department from a database, then use these configurations to generate a list of jobs in Dagster.
For example, I have 3 tenants:
Department1: Configuration1
Department2: Configuration2
Department3: Configuration3
This information is stored in my database.
How can I load this information and dynamically create 3 jobs (pipelines):
Pipeline1 for Department1 with Configuration1
Pipeline2 for Department2 with Configuration2
Pipeline3 for Department3 with Configuration3
Is it possible to do this in Dagster? I can do it with Airflow (dynamically generating DAGs) but am not sure how in Dagster, since I cannot load database configuration outside of an op/job.
...ANSWER
Answered 2022-Mar-28 at 20:31

In Dagster, your @repository function is just a regular function, so you can run arbitrary code in there to query your database and generate jobs dynamically:
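A minimal sketch of that approach; the department names, config shape, and load_department_configs helper are hypothetical stand-ins for a real database query:

from dagster import job, op, repository

def load_department_configs():
    # Hypothetical stand-in for querying your configuration database.
    return {
        "department1": {"source_table": "dept1_data"},
        "department2": {"source_table": "dept2_data"},
        "department3": {"source_table": "dept3_data"},
    }

def make_department_job(department, config):
    @op(name=f"{department}_extract")
    def extract(context):
        context.log.info(f"extracting {config['source_table']}")

    @job(name=f"{department}_pipeline")
    def department_job():
        extract()

    return department_job

@repository
def department_repository():
    # Arbitrary code can run here, including the database query above.
    return [
        make_department_job(department, config)
        for department, config in load_department_configs().items()
    ]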
QUESTION
When using an AuthorizationPolicy, I came across a 503 error trying to access the application. The error seems to be between the gateway and the service, but traffic never reaches the service. For instance, when using this policy to allow all traffic:
...ANSWER
Answered 2022-Mar-28 at 19:42

The error above came from an alternate (unsecured) service. The Istiod ads logs had the following:
QUESTION
I have this simple python script. How could I rewrite it in a way that works in dagster?
...ANSWER
Answered 2022-Feb-18 at 21:53

In order to pass outputs of solids to inputs of other solids, you'll need to create a pipeline that defines the dependencies between inputs and outputs.
From there, you'll be able to execute the pipeline:
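A minimal sketch with the solid/pipeline APIs the answer refers to (solid names are illustrative):

from dagster import execute_pipeline, pipeline, solid

@solid
def get_name():
    return "dagster"

@solid
def greet(context, name: str):
    context.log.info(f"Hello, {name}!")

@pipeline
def hello_pipeline():
    # Passing get_name() as an argument wires its output to greet's input.
    greet(get_name())

result = execute_pipeline(hello_pipeline)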
QUESTION
Currently, I am facing a dagster.core.errors.PartitionExecutionError, but the error logs from Dagster are not obvious to me.
ANSWER
Answered 2022-Jan-05 at 18:12

A function decorated with @daily_partitioned_config needs to be able to accept two arguments, one for the start of the time window and one for the end. daily_download_config doesn't actually make use of the end date value, but it still needs to show up in the signature, because Dagster will try to pass two arguments to this function regardless.
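A sketch of a signature that satisfies this (the start date and config payload are illustrative):

from datetime import datetime
from dagster import daily_partitioned_config

@daily_partitioned_config(start_date=datetime(2022, 1, 1))
def daily_download_config(start, _end):
    # _end must stay in the signature even though it is unused:
    # Dagster always passes both bounds of the partition's time window.
    return {"ops": {"download": {"config": {"date": start.strftime("%Y-%m-%d")}}}}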
QUESTION
I execute the below command in the same working directory as the file worker.py:
ANSWER
Answered 2021-Oct-21 at 10:12

poetry run means "run the following command in the venv managed by Poetry".
So the correct way of using it in your case is: poetry run python worker.py
QUESTION
Consider this example - you need to load table1 from a source database, do some generic transformations (like converting time zones for timestamped columns), and write the resulting data into Snowflake. This is an easy one and can be implemented using 3 Dagster ops.
Now, imagine you need to do the same thing but with 100s of tables. How would you do it with dagster? Do you literally need to create 100 jobs/graphs? Or can you create one job, that will be executed 100 times? Can you throttle how many of these jobs will run at the same time?
...ANSWER
Answered 2021-Nov-12 at 23:42

You have two main options for doing this:
- Use a single job with Dynamic Outputs:
With this setup, all of your ETLs would happen in a single job. You would have an initial op that would yield a DynamicOutput for each table name that you wanted to run this process for, and feed that into a set of ops (probably organized into a graph) that would be run on each individual DynamicOutput (see the sketch after this list).
Depending on what executor you're using, it's possible to limit the overall step concurrency (for example, the default multiprocess_executor supports this option).
- Create a configurable job (I think this is more likely what you want)
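A minimal sketch of the first option, Dynamic Outputs; the table names and the body of the per-table op are illustrative:

from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def table_names():
    # Illustrative; a real version might read the table list from metadata.
    for name in ["table1", "table2", "table3"]:
        yield DynamicOutput(name, mapping_key=name)

@op
def etl_table(context, table_name: str):
    context.log.info(f"extract/transform/load for {table_name}")

@job
def etl_all_tables():
    # .map fans etl_table out across every DynamicOutput yielded above.
    table_names().map(etl_table)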
QUESTION
I just started following the Dagster tutorial. I managed to get the hello_cereal job running with Dagit and the Python API, but for some reason it fails when run with the Dagster CLI.
ANSWER
Answered 2021-Oct-25 at 00:59

I had the same issue. I set DAGSTER_HOME and it fixed the problem. From the Dagster docs (https://docs.dagster.io/deployment/dagster-instance):
If DAGSTER_HOME is not set, the Dagster tools will use an ephemeral instance for execution. In this case, the run and event log storages will be in-memory rather than persisted to disk, and filesystem storage will use a temporary directory that is cleaned up when the process exits. This is useful for tests and is the default when invoking Python APIs such as JobDefinition.execute_in_process directly.
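A quick sketch of the behavior the docs describe (job and op names are illustrative): without DAGSTER_HOME set, execute_in_process runs against an ephemeral in-memory instance.

from dagster import job, op

@op
def hello(context):
    context.log.info("hello")

@job
def hello_cereal_job():
    hello()

# With DAGSTER_HOME unset this uses an ephemeral instance; set DAGSTER_HOME
# to a writable directory before using the dagster CLI so runs persist.
result = hello_cereal_job.execute_in_process()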
QUESTION
I am using Dagster to run local Node.js microservice pipelines in order to execute tests.
The idea is to execute n docker files and n Node.js microservices, as you can easily do with Dagster.
The problem is that when I execute the first task (a shell command that runs a Docker container), Dagster stays at that point and does not execute all the tasks at the same level.
The current DAG logs look like this:
...ANSWER
Answered 2021-Oct-14 at 17:07

If you're using the new job/op APIs, then Dagster will by default use a multiprocess executor, which will be able to run multiple tasks in parallel.
If you're using the pipeline/solid APIs, then you can pass in run configuration to tell Dagster to use the multiprocess executor instead of the default single-process executor. If you're launching a pipeline from Dagit, you'd pass in run config that looked like the sketch below:
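The run config itself was cut off in this excerpt; a sketch for the legacy pipeline API follows (the pipeline and max_concurrent value are illustrative, and DagsterInstance.get() assumes DAGSTER_HOME is set):

from dagster import DagsterInstance, execute_pipeline, pipeline, reconstructable, solid

@solid
def task(context):
    context.log.info("running in a subprocess")

@pipeline
def my_pipeline():
    task()

run_config = {"execution": {"multiprocess": {"config": {"max_concurrent": 4}}}}

if __name__ == "__main__":
    # The multiprocess executor needs a reconstructable pipeline and a
    # persistent instance rather than an in-process, in-memory one.
    result = execute_pipeline(
        reconstructable(my_pipeline),
        run_config=run_config,
        instance=DagsterInstance.get(),
    )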
QUESTION
I've recently picked up Dagster to evaluate as an alternative to Airflow.
I haven't been able to wrap my head around the concept of resources, and I'm looking to understand whether what I'm trying to do is possible or can be achieved better in a different way.
I have a helper class like the one below that helps keep the code DRY:
...ANSWER
Answered 2021-Oct-13 at 16:46

I posted the same question on the Dagster Slack channel and quickly had a reply from the helpful team. Posting it here in case it helps someone:
Keep your HelperAwsS3 class and write your own resource that uses the s3 resource; it could look something like this:
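The snippet from the Slack reply was cut off in this excerpt; a sketch of the shape it describes (the HelperAwsS3 methods are illustrative):

from dagster import resource

class HelperAwsS3:
    # Helper class from the question; the method shown is illustrative.
    def __init__(self, s3_client):
        self.s3_client = s3_client

    def put_string(self, bucket, key, body):
        self.s3_client.put_object(Bucket=bucket, Key=key, Body=body)

@resource(required_resource_keys={"s3"})
def helper_aws_s3_resource(init_context):
    # Wraps the stock s3 resource (e.g. from dagster_aws.s3) in the helper class.
    return HelperAwsS3(init_context.resources.s3)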
QUESTION
I cannot see what the error is here; I think these are only warnings.
What is the error message, and what is a probable cause?
Update: I ran linting in VS Code using Ctrl+Shift+P: >Python: Run Linting. Pushed the changes and ran the pipeline again.
test_ontology_tagger.py:
ANSWER
Answered 2021-Sep-28 at 14:14

The linter package I was using is flake8.
This was a series of code quality problems, the last one being E265: block comment should start with '# '.
This means that a space must appear immediately after the #, before any and all other text.
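For illustration, the comment style E265 complains about versus the one it accepts:

#This block comment triggers E265: no space immediately after the '#'.
# This block comment passes: a single space follows the '#' before the text.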
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install dagster
Dagster is available on PyPI, and officially supports Python 3.6+.
Dagster: the core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
Dagit: the UI for developing and operating Dagster pipelines, including a DAG browser, a type-aware config editor, and a live execution interface.