kedro | Python framework | Machine Learning library
kandi X-RAY | kedro Summary
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.
Top functions reviewed by kandi - BETA
- Creates a pipeline for the given parameters.
- Builds a data set from the given catalog.
- Formats an object.
- Replaces source code for a specific package.
- Creates a new node with decorators.
- Creates a node from the given functions.
- Finds the syntactic replacements for a given module.
- Replaces package_name with the original name and destination_name.
- Parses a dataset definition.
- Creates a Kedro session.
kedro Examples and Code Snippets
# nodes.py
import pandas as pd

from .lib import inscribe_triangles

def circles_to_triangles(circles: pd.DataFrame) -> pd.DataFrame:
    """
    Takes a collection of circles, inscribes a triangle in each circle,
    returns the collection of inscribed triangles.
    """
    # Body reconstructed from the import above: delegate to the library helper
    return inscribe_triangles(circles)
# my-proj/pipelines/data_engineering/pipeline.py
from kedro.pipeline import node

from .nodes import split_data

# Output names below are completed from the standard Kedro iris starter
pipeline = [
    node(
        split_data,
        ["example_iris_data", "params:example_test_data_ratio"],
        dict(
            train_x="example_train_x",
            train_y="example_train_y",
            test_x="example_test_x",
            test_y="example_test_y",
        ),
    )
]
Usage: find-kedro [OPTIONS]

Options:
  --file-patterns TEXT  glob-style file patterns for Python node module
                        discovery
  --patterns TEXT       prefixes or glob names for Python pipeline, node,
Community Discussions
Trending Discussions on kedro
QUESTION
I was looking at the iris project example provided by Kedro. Apart from logging the accuracy, I also wanted to save the predictions and test_y as a CSV.
This is the example node provided by kedro.
...ANSWER
Answered 2022-Jan-24 at 15:31
Kedro actually abstracts this part for you. You don't need to access the datasets via their Python API.
Your report_accuracy method does need to be tweaked to return the DataFrame instead of None.
Your node needs to be defined as such:
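A minimal sketch of the tweak plus the node definition (the dataset and column names here are assumptions, not the exact starter code):

# nodes.py
import pandas as pd

def report_accuracy(predictions: pd.Series, test_y: pd.Series) -> pd.DataFrame:
    accuracy = (predictions == test_y).mean()
    print(f"Model accuracy: {accuracy:.3f}")
    # Return the results instead of None so Kedro can persist them
    return pd.DataFrame({"prediction": predictions, "actual": test_y})

# pipeline.py
from kedro.pipeline import node
from .nodes import report_accuracy

report_node = node(
    report_accuracy,
    inputs=["example_predictions", "example_test_y"],
    outputs="prediction_report",  # hypothetical dataset name
)

Declaring prediction_report as a pandas.CSVDataSet in catalog.yml then lets Kedro write the CSV for you after the node runs.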
QUESTION
I want to save a Kedro memory dataset in Azure as a file and still keep it in memory, as my pipeline will be using it later. Is this possible in Kedro? I tried to look at transcoding datasets, but it looks like that's not possible. Is there any other way to achieve this?
...ANSWER
Answered 2022-Jan-18 at 15:22
I would try explicitly saving the dataset to Azure as part of your node logic, i.e. with catalog.save(). Then you can feed the dataset to downstream nodes in memory using the standard node inputs and outputs.
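One way to sketch that is to call a dataset object's save() directly inside the node and still return the data (the filepath and names are hypothetical, and adlfs must be installed for abfs:// paths):

import pandas as pd
from kedro.extras.datasets.pandas import CSVDataSet

def process_and_persist(df: pd.DataFrame) -> pd.DataFrame:
    # Explicitly persist a copy to Azure Blob Storage
    CSVDataSet(filepath="abfs://container@account.dfs.core.windows.net/out.csv").save(df)
    # Returning the DataFrame keeps it in memory for downstream nodes
    return df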
QUESTION
I'm quite new to Kedro, and after installing kedro in my conda environment I'm getting the following error when trying to list my catalog:
Command performed: kedro catalog list
Error:
kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet df_medinfo_raw: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.ParquetDataSet:
I installed kedro through conda-forge: conda install -c conda-forge "kedro[pandas]". As far as I understand, this way of installing kedro also installs the pandas dependencies.
I tried to read the kedro documentation for dependencies, but it's not really clear how to solve this kind of issue.
My kedro version is 0.17.6.
...ANSWER
Answered 2022-Jan-15 at 12:10
Try installing using pip.
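For example, pinning to the version from the question (the pandas extra pulls in the ParquetDataSet dependencies):

pip install "kedro[pandas]==0.17.6"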
QUESTION
I would like to create my own Kedro starter. I have tried to replicate the relevant portions of the pandas iris starter. I have a cookiecutter.json file with what I believe are appropriate mappings, and I have changed the repo and package directory names, as well as any references to the Kedro version, such that they work with Cookiecutter.
I am able to generate a new project from my starter with kedro new --starter=path/to/my/starter. However, the newly created project uses the default values for the project, package, and repo names, without prompting me for any input in the terminal.
Have I misconfigured something? How can I create a starter that will prompt users to override the defaults when creating new projects?
Here are the contents of cookiecutter.json in the top directory of my starter project:
...ANSWER
Answered 2022-Jan-06 at 16:55
I think you may be missing prompts.yml: https://github.com/quantumblacklabs/kedro/blob/main/kedro/templates/project/prompts.yml
Full instructions can be found here: https://kedro.readthedocs.io/en/stable/07_extend_kedro/05_create_kedro_starters.html
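For illustration, each prompts.yml entry mirrors a key in cookiecutter.json with a title and help text, roughly like this (a sketch based on the linked template, not your exact keys):

project_name:
  title: "Project Name"
  text: |
    Please enter a human readable name for your new project.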
QUESTION
Is there any way to access the kedro pipeline environment name? Below is my problem.
I am loading the config paths as below
...ANSWER
Answered 2021-Dec-15 at 20:27
You don't need to define config paths, a config loader, etc. unless you are trying to override something.
If you are using kedro 0.17.x, the hooks.py will look something like this.
Kedro will pass base, local and the env you specified at runtime in conf_paths into ConfigLoader.
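A sketch of that 0.17.x hooks.py (the hook and its signature are from the 0.17 API; adjust the class name to your project):

# hooks.py (kedro 0.17.x)
from typing import Iterable

from kedro.config import ConfigLoader
from kedro.framework.hooks import hook_impl

class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        # conf_paths already contains conf/base, conf/local and the --env you passed
        return ConfigLoader(conf_paths)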
QUESTION
I currently work with Kedro (from QuantumBlack: https://kedro.readthedocs.io/en/stable/01_introduction/01_introduction.html) as a deployment-oriented framework for collaborative coding. It is a great framework for developing machine learning in a team.
I am looking for an R equivalent.
My main issue is that I have teams of data scientists who develop in R, but each team develops in a different format.
I want to make them follow a common framework to develop deployment-ready R code that is easy to work on in 2- or 3-person teams.
Any suggestions are welcome.
...ANSWER
Answered 2021-Dec-15 at 15:10
Nothing on as prominent a scale as Kedro, but I can think of the following:
- projectr, a local project by an R expert: https://github.com/Jeniffen/projectr
- pipeliner, built on the tidyverse: https://cran.r-project.org/web/packages/pipeliner/index.html
QUESTION
I am using PartitionedDataSet to load multiple CSV files from Azure Blob Storage. I defined my dataset in the data catalog as below.
...ANSWER
Answered 2021-Dec-05 at 06:25
Move load_args inside dataset.
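In other words, pandas read options belong to the underlying dataset definition, not to the PartitionedDataSet itself. Sketched with the Python API (the path and options are placeholders):

from kedro.io import PartitionedDataSet

# load_args goes inside the "dataset" definition, not at the top level
partitioned = PartitionedDataSet(
    path="abfs://container/some/folder/",
    dataset={"type": "pandas.CSVDataSet", "load_args": {"sep": ","}},
)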
QUESTION
I launched an IPython session and am trying to load a dataset.
I am running
df = catalog.load("test_dataset")
I am facing the below error:
NameError: name 'catalog' is not defined
I also tried %reload_kedro but got the below error
UsageError: Line magic function `%reload_kedro` not found.
I am not able to load the context either. I am running the kedro environment from a Docker container. I am not sure where I am going wrong.
...ANSWER
Answered 2021-Nov-23 at 12:43
New in 0.17.5, there is a fallback option; please run the following commands in your Jupyter/IPython session:
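The commands in question are the IPython extension introduced in 0.17.5 (substitute your own project path for the placeholder):

%load_ext kedro.extras.extensions.ipython
%reload_kedro <project_root>

After this, catalog, context and session are available in the session, so catalog.load("test_dataset") should work.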
QUESTION
I would like to log the git_sha parameter on Mlflow as shown in the documentation. It appears to me that simply running the following portion of code should be enough to get git_sha logged in the Mlflow UI. Am I right?
...ANSWER
Answered 2021-Nov-17 at 14:26
Whilst it's heavily encouraged to use git with Kedro, it's not required, and as such no part of Kedro (except kedro-starters, if we're being pedantic) is 'aware' of git.
In your before_pipeline_run hook it is pretty easy to retrieve the info via the techniques documented here. It seems trivial for the whole codebase, a bit more involved if you want to, say, provide pipeline-specific hashes.
QUESTION
According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2?
I haven't tried Kedro yet, but before I invest some time in it, I wanted to make sure I could connect to ADLS Gen2.
Thank you in advance!
...ANSWER
Answered 2021-Nov-12 at 10:27
Yes, this works with Kedro. You're actually pointing at a really old version of the docs; nowadays all filesystem-based datasets in Kedro use fsspec under the hood, which means they work with S3, HDFS, local and many more filesystems seamlessly.
ADLS Gen2 is supported by fsspec via the underlying adlfs library, which is documented here.
From a Kedro point of view all you need to do is declare your catalog entry like so:
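Sketched with the Python dataset API (a catalog.yml entry of type pandas.CSVDataSet with an abfs:// filepath is equivalent; the account, container and credential values are placeholders, and adlfs must be installed):

from kedro.extras.datasets.pandas import CSVDataSet

adls_dataset = CSVDataSet(
    filepath="abfs://container@myaccount.dfs.core.windows.net/data/iris.csv",
    credentials={"account_name": "myaccount", "account_key": "..."},
)
df = adls_dataset.load()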
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install kedro
You can use kedro like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system.
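For example, a typical clean install into a fresh virtual environment (standard Python tooling; the commands are illustrative):

python -m venv .venv
source .venv/bin/activate      # on Windows: .venv\Scripts\activate
pip install --upgrade pip setuptools wheel
pip install kedro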