kedro | Python framework | Machine Learning library

 by quantumblacklabs | Python Version: 0.17.6 | License: Apache-2.0

kandi X-RAY | kedro Summary

kedro is a Python library typically used in Institutions, Learning, Education, Artificial Intelligence, Machine Learning, and Deep Learning applications. kedro has no reported bugs or vulnerabilities, has a build file available, has a permissive license, and has high support. You can install it with 'pip install kedro' or download it from GitHub or PyPI.

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning.

            Support

              kedro has a highly active ecosystem.
              It has 4748 stars, 537 forks, and 92 watchers.
              It had no major release in the last 12 months.
              There are 48 open issues and 541 closed issues; on average, issues are closed in 31 days. There are 10 open pull requests and 0 closed pull requests.
              It has a positive sentiment in the developer community.
              The latest version of kedro is 0.17.6.

            Quality

              kedro has 0 bugs and 0 code smells.

            Security

              kedro has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              kedro code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              kedro is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              kedro releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              It has 30585 lines of code, 2834 functions and 302 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed kedro and discovered the below as its top functions. This is intended to give you an instant insight into kedro implemented functionality, and help decide if they suit your requirements.
            • Creates a pipeline for the given parameters.
            • Builds a data set from the given catalog.
            • Formats an object.
            • Replaces source code for a specific package.
            • Creates a new node with decorators.
            • Creates a node from the given functions.
            • Finds the syntactic replacements for a given module.
            • Replaces package_name with the original name and destination name.
            • Parses a dataset definition.
            • Creates a kedro session.
            Get all kandi verified functions for this library.

            kedro Key Features

            No Key Features are available at this moment for kedro.

            kedro Examples and Code Snippets

            # nodes.py
            import pandas as pd
            
            from .lib import inscribe_triangles
            
            def circles_to_triangles(circles: pd.DataFrame) -> pd.DataFrame:
                """
                Takes a collection of circles, inscribes a triangle in each circle,
                and returns the collection of inscribed triangles.
                """
                return inscribe_triangles(circles)  # body truncated in the source snippet
            Creating nodes (kedro 0.17.x+)
            Python | Lines of Code: 65 | License: Permissive (MIT)
            # my-proj/pipelines/data_engineering/pipeline.py
            from kedro.pipeline import node
            
            from .nodes import split_data
            
            pipeline = [
                node(
                    split_data,
                    ["example_iris_data", "params:example_test_data_ratio"],
                    dict(
                        # output names truncated in the source; these follow
                        # the kedro pandas-iris starter
                        train_x="example_train_x",
                        train_y="example_train_y",
                        test_x="example_test_x",
                        test_y="example_test_y",
                    ),
                )
            ]
            find-kedro usage (kedro 0.17.x+)
            Python | Lines of Code: 30 | License: Permissive (MIT)
            Usage: find-kedro [OPTIONS]
            
            Options:
              --file-patterns TEXT       glob-style file patterns for Python node module
                                         discovery
            
              --patterns TEXT            prefixes or glob names for Python pipeline, node,
                                         ...

            Community Discussions

            QUESTION

            Saving data with DataCatalog
            Asked 2022-Jan-24 at 15:31

            I was looking at the iris project example provided by kedro. Apart from logging the accuracy, I also wanted to save the predictions and test_y as a CSV.

            This is the example node provided by kedro.

            ...

            ANSWER

            Answered 2022-Jan-24 at 15:31

            Kedro actually abstracts this part for you. You don't need to access the datasets via their Python API.

            Your report_accuracy method does need to be tweaked to return the DataFrame instead of None.

            Your node needs to be defined as such:
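The code block from the original answer was not captured in this excerpt. As a rough sketch of the idea (the dataset and column names here are hypothetical, loosely following the iris example): have the node function return the data you want persisted, then map that output to a CSV dataset in catalog.yml so Kedro writes the file for you.

```python
import pandas as pd

def report_accuracy(predictions: pd.Series, test_y: pd.DataFrame) -> pd.DataFrame:
    """Return the comparison frame instead of None so Kedro can save it."""
    result = pd.DataFrame({"predicted": predictions, "actual": test_y.squeeze()})
    # accuracy logging elided; the key change is returning the DataFrame
    return result

# The node then declares an output name, e.g.:
#   node(report_accuracy, ["example_predictions", "example_test_y"], "prediction_report")
# and a hypothetical catalog.yml entry maps that output to a file:
#   prediction_report:
#     type: pandas.CSVDataSet
#     filepath: data/08_reporting/prediction_report.csv
```

Because the node output is named, Kedro looks it up in the catalog and performs the save itself; no explicit I/O is needed inside the node.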

            Source https://stackoverflow.com/questions/68923747

            QUESTION

            How to save kedro dataset in azure and still have it in memory
            Asked 2022-Jan-18 at 15:45

            I want to save a Kedro memory dataset in Azure as a file and still have it in memory, as my pipeline will be using it later. Is this possible in Kedro? I looked at transcoding datasets, but that doesn't seem to work here. Is there any other way to achieve this?

            ...

            ANSWER

            Answered 2022-Jan-18 at 15:22

            I would try explicitly saving the dataset to Azure as part of your node logic, i.e. with catalog.save(). Then you can feed the dataset to downstream nodes in memory using the standard node inputs and outputs.
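The mechanics can be sketched independently of the Kedro specifics (the helper and entry names here are hypothetical): save through the catalog, then return the same object so downstream nodes still receive it in memory.

```python
def save_and_pass_through(catalog, name, data):
    """Persist `data` under catalog entry `name` (e.g. an Azure-backed
    dataset declared in catalog.yml), then return it unchanged so the
    rest of the pipeline keeps working with the in-memory object."""
    catalog.save(name, data)
    return data
```

How you get hold of the catalog object (a hook, the session context, etc.) depends on your Kedro version.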

            Source https://stackoverflow.com/questions/70757448

            QUESTION

            AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas
            Asked 2022-Jan-15 at 21:02

            I'm quite new to Kedro, and after installing kedro in my conda environment, I'm getting the following error when trying to list my catalog:

            Command performed: kedro catalog list

            Error:

            kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet df_medinfo_raw: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.ParquetDataSet:

            I installed kedro through conda-forge: conda install -c conda-forge "kedro[pandas]". As far as I understand, installing kedro this way also installs the pandas dependencies.

            I tried to read the kedro documentation for dependencies, but it's not really clear how to solve this kind of issue.

            My kedro version is 0.17.6.

            ...

            ANSWER

            Answered 2022-Jan-15 at 12:10

            Try installing using pip
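For context, the answer is alluding to Kedro's optional dependency extras: the conda-forge package does not necessarily pull in every dataset dependency, whereas pip lets you request them explicitly. A sketch of what that might look like for kedro 0.17.x (check the extras names against your version's docs):

```shell
# Install the optional dependencies that pandas.ParquetDataSet needs
# (for the asker's version, pin with ==0.17.6)
pip install "kedro[pandas.ParquetDataSet]"
```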

            Source https://stackoverflow.com/questions/70719080

            QUESTION

            Why doesn't my Kedro starter prompt for input?
            Asked 2022-Jan-06 at 16:55

            I would like to create my own Kedro starter. I have tried to replicate the relevant portions of the pandas iris starter. I have a cookiecutter.json file with what I believe are appropriate mappings, and I have changed the repo and package directory names, as well as any references to the Kedro version, so that they work with cookiecutter.

            I am able to generate a new project from my starter with kedro new --starter=path/to/my/starter. However, the newly created project uses the default values for the project, package, and repo names, without prompting me for any input in the terminal.

            Have I misconfigured something? How can I create a starter that will prompt users to override the defaults when creating new projects?

            Here are the contents of cookiecutter.json in the top directory of my starter project:

            ...

            ANSWER

            Answered 2022-Jan-06 at 16:55
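The answer body is missing from this excerpt. For what it's worth, a Kedro starter prompts for input only if it ships a prompts.yml next to cookiecutter.json; cookiecutter.json alone just supplies defaults. A hypothetical sketch:

```yaml
# prompts.yml, in the starter's top directory alongside cookiecutter.json
# (keys must match the variables in cookiecutter.json)
project_name:
  title: "Project Name"
  text: "Please enter a human readable name for your new project."
```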

            QUESTION

            How to access environment name in kedro pipeline
            Asked 2021-Dec-15 at 20:27

            Is there any way to access the kedro pipeline environment name? Below is my problem.

            I am loading the config paths as below

            ...

            ANSWER

            Answered 2021-Dec-15 at 20:27

            You don't need to define config paths, config loader etc unless you are trying to override something.

            If you are using kedro 0.17.x, the hooks.py will look something like this.

            Kedro will pass base, local, and the env you specified at runtime as conf_paths into ConfigLoader.
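As a rough illustration of that path composition (this mimics the behaviour rather than calling Kedro's API; names are illustrative):

```python
def conf_paths_for(project_path, env=None):
    """Compose config search paths the way Kedro 0.17.x does: conf/base and
    conf/local are always included, plus the environment chosen at runtime
    (e.g. `kedro run --env=prod`)."""
    envs = ["base", "local"]
    if env and env not in envs:
        envs.append(env)
    return [f"{project_path}/conf/{e}" for e in envs]
```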

            Source https://stackoverflow.com/questions/70355869

            QUESTION

            Is there a package in R that mimics KEDRO as a modular collaborative framework for development?
            Asked 2021-Dec-15 at 15:15

            I currently work with Kedro (from QuantumBlack, https://kedro.readthedocs.io/en/stable/01_introduction/01_introduction.html) as a deployment-oriented framework for coding collaboratively. It is a great framework for developing machine learning in a team.

            I am looking for an R equivalent.

            My main issue is that I have teams of data scientists that develop in R, but each team is developing in different formats.

            I wanted to make them follow a common framework to develop deployment ready R code, easy to work on in 2 or 3-people teams.

            Any suggestions are welcome

            ...

            ANSWER

            Answered 2021-Dec-15 at 15:10

            Nothing on as prominent a scale as Kedro, but I can think of the following:

            1. A local project by an R expert: https://github.com/Jeniffen/projectr
            2. pipeliner, on the tidyverse: https://cran.r-project.org/web/packages/pipeliner/index.html

            Source https://stackoverflow.com/questions/70365836

            QUESTION

            kedro DataSetError while loading PartitionedDataSet
            Asked 2021-Dec-05 at 06:25

            I am using PartitionedDataSet to load multiple CSV files from Azure Blob Storage. I defined my dataset in the data catalog as below.

            ...

            ANSWER

            Answered 2021-Dec-05 at 06:25

            Move load_args inside dataset
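Spelled out as a hypothetical catalog entry, that means nesting load_args under the dataset key rather than at the top level:

```yaml
# catalog.yml (entry names and paths are hypothetical)
my_partitioned_data:
  type: PartitionedDataSet
  path: "abfs://my-container/raw/"
  dataset:
    type: pandas.CSVDataSet
    load_args:        # belongs here, inside `dataset`,
      sep: ","        # not at the top level of the entry
```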

            Source https://stackoverflow.com/questions/70230262

            QUESTION

            kedro context and catalog missing from ipython session
            Asked 2021-Nov-23 at 12:43

            I launched an IPython session and am trying to load a dataset.
            I am running
            df = catalog.load("test_dataset")
            Facing the below error
            NameError: name 'catalog' is not defined

            I also tried %reload_kedro but got the below error

            UsageError: Line magic function `%reload_kedro` not found.

            I am not able to load the context either. I am running the kedro environment from a Docker container. I am not sure where I am going wrong.

            ...

            ANSWER

            Answered 2021-Nov-23 at 12:43

            New in 0.17.5 there is a fallback option; please run the following commands in your Jupyter/IPython session:
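The commands themselves did not make it into this excerpt. Based on the Kedro 0.17.5 release notes, the fallback is loading the IPython extension manually (the project path below is a placeholder):

```
%load_ext kedro.extras.extensions.ipython
%reload_kedro /path/to/your/kedro/project
```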

            Source https://stackoverflow.com/questions/70080915

            QUESTION

            Logging the git_sha as a parameter on Mlflow using Kedro hooks
            Asked 2021-Nov-17 at 14:26

            I would like to log the git_sha parameter on Mlflow as shown in the documentation. It appears to me that simply running the following portion of code should be enough to get git_sha logged in the Mlflow UI. Am I right?

            ...

            ANSWER

            Answered 2021-Nov-17 at 14:26

            Whilst it's heavily encouraged to use git with Kedro, it's not required, and as such no part of Kedro (except kedro-starters, if we're being pedantic) is 'aware' of git.

            In your before_pipeline_run hook it is pretty easy to retrieve the info via the techniques documented here. It's trivial for the whole codebase, a bit more involved if you want to, say, provide pipeline-specific hashes.
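A minimal sketch of that retrieval (the hook wiring and the mlflow call are assumptions, not Kedro API; the `run` parameter is injectable only to make the helper testable):

```python
import subprocess

def current_git_sha(run=subprocess.check_output):
    """Return the current commit's short SHA. Assumes git is on PATH and the
    working directory is inside a repository."""
    return run(["git", "rev-parse", "--short", "HEAD"]).decode().strip()

# Inside a hypothetical before_pipeline_run hook you could then log it:
#   mlflow.log_param("git_sha", current_git_sha())
```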

            Source https://stackoverflow.com/questions/70005957

            QUESTION

            Azure Data Lake Storage Gen2 (ADLS Gen2) as a data source for Kedro pipeline
            Asked 2021-Nov-12 at 10:27

            According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2?

            I haven't tried Kedro yet, but before I invest time in it, I wanted to make sure I could connect to ADLS Gen2.

            Thank you in advance!

            ...

            ANSWER

            Answered 2021-Nov-12 at 10:27

            Yes, this works with Kedro. You're actually pointing at a really old version of the docs; nowadays all filesystem-based datasets in Kedro use fsspec under the hood, which means they work with S3, HDFS, local and many more filesystems seamlessly.

            ADLS Gen2 is supported by fsspec via the underlying adlfs library, which is documented here.

            From a Kedro point of view all you need to do is declare your catalog entry like so:
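A hypothetical catalog entry for an ADLS Gen2 path (the account, container, and credential names are placeholders; adlfs registers the abfs:// protocol with fsspec):

```yaml
# catalog.yml
my_dataset:
  type: pandas.ParquetDataSet
  filepath: "abfs://container@account.dfs.core.windows.net/data/my_dataset.parquet"
  credentials: adls_creds   # defined in conf/local/credentials.yml
```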

            Source https://stackoverflow.com/questions/69940562

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install kedro

            You can install kedro with 'pip install kedro' or download it from GitHub or PyPI.
            You can use kedro like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            Yes! Want to help build Kedro? Check out our guide to contributing to Kedro.
            CLONE
          • HTTPS

            https://github.com/quantumblacklabs/kedro.git

          • CLI

            gh repo clone quantumblacklabs/kedro

          • sshUrl

            git@github.com:quantumblacklabs/kedro.git


            Consider Popular Machine Learning Libraries

            • tensorflow (by tensorflow)
            • youtube-dl (by ytdl-org)
            • models (by tensorflow)
            • pytorch (by pytorch)
            • keras (by keras-team)

            Try Top Libraries by quantumblacklabs

            • causalnex (by quantumblacklabs, Python)
            • qbstyles (by quantumblacklabs, Jupyter Notebook)
            • kedro-viz (by quantumblacklabs, JavaScript)
            • kedro-airflow (by quantumblacklabs, Python)
            • kedro-docker (by quantumblacklabs, Python)