dvc | 🦉 Data Version Control | Git for Data & Models | ML Experiments Management | Machine Learning library

by iterative | Python | Version: 3.51.2 | License: Apache-2.0

kandi X-RAY | dvc Summary

dvc is a Python library typically used in Artificial Intelligence, Machine Learning, Deep Learning, and TensorFlow applications. dvc has no reported bugs or vulnerabilities, a permissive license, and high support. However, no build file is available for dvc. You can install it with 'pip install dvc' or download it from GitHub or PyPI.

            Support

              dvc has a highly active ecosystem.
              It has 11637 star(s) with 1064 fork(s). There are 135 watchers for this library.
              There were 10 major release(s) in the last 12 months.
              There are 563 open issues and 3756 have been closed. On average issues are closed in 134 days. There are 9 open pull requests and 0 closed requests.
              It has a positive sentiment in the developer community.
              The latest version of dvc is 3.51.2

            Quality

              dvc has 0 bugs and 0 code smells.

            Security

              dvc has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              dvc code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              dvc is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              dvc releases are available to install and integrate.
              Deployable package is available in PyPI.
              dvc has no build file. You will need to create the build yourself to build the component from source.
              It has 59024 lines of code, 4179 functions and 516 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed dvc and identified the following as its top functions. This is intended to give you a quick view of the functionality dvc implements and help you decide whether it suits your requirements.
            • Adds the argument parser
            • Set required subparsers
            • Append documentation link to Dvc documentation
            • Format a link
            • Produce a reproducible experiment
            • Get the args and kwargs from the working directory
            • Convert environment variable to bool
            • Unpack the arguments from a pickle file
            • Run the command
            • Push the given targets to the remote server
            • Fetch the experiments from the remote repo
            • Reproduces the experiment
            • Removes experiments from the current workspace
            • Pull commits from git remote
            • Remove exp_names from repo
            • Load data from a pipeline
            • Replicate the repo
            • List the files under the given path
            • Return a dict of parameters matching the given parameters
            • Fetch dependencies
            • Generate y values for y values
            • Log standard exceptions
            • Commit repository
            • Run celery
            • Run the function
            • Configure logging

            dvc Key Features

            No Key Features are available at this moment for dvc.

            dvc Examples and Code Snippets

            DVC filesystem abstraction layer (0.8.0),Installation,Usage
            Python | Lines of Code: 49 | License: No License
            from fs import open_fs
            fs1 = open_fs("dvc://github.com/covid-genomics/data-artifacts") # Clone by https
            fs2 = open_fs("dvc://ssh@github.com/covid-genomics/data-artifacts") # Clone by ssh
            fs3 = open_fs("dvc://@github.com/covid-genomics/data-artifacts")
            DVC command
            Python | Lines of Code: 46 | License: Non-SPDX (NOASSERTION)
            :param str input_csv_file: Path to input file
            :param str output_csv_file_1: Path to output file 1
            :param str output_csv_file_2: Path to output file 2
            [...]
            
            [:dvc-[in|out][\s{related_param}]?:[\s{file_path}]?]*
            [:dvc-extra: {python_other_param}]?
            
            :d  
            Use parameters from additional configs in dvc 2.0
            Python | Lines of Code: 22 | License: Strong Copyleft (CC BY-SA 4.0)
            stages:
              train:
                cmd: python train.py
                deps:
                  - users.csv
                params:
                  - params.py:
                      - BOOL
                      - INT
                      - TrainConfig.EPOCHS
                      - TrainConfig.layers
                outs:
                  - model.pkl
            
            How to access DVC-controlled files from Oracle?
            Python | Lines of Code: 23 | License: Strong Copyleft (CC BY-SA 4.0)
            import re

            import dvc.api

            with dvc.api.open(
                    'activity.log',
                    repo='location/of/dvc/project',
                    remote='my-s3-bucket'
                    ) as fd:
                for line in fd:
                    match = re.search(r'user=(\w+)', line)
                    # ... Process users activity log
            Pip install multiple extra dependencies of a single package via requirement file
            Python | Lines of Code: 3 | License: Strong Copyleft (CC BY-SA 4.0)
            # requirements.txt
            dvc[s3,gs]
            
            Pyinstaller: AttributeError: module 'enum' has no attribute 'IntFlag'
            Python | Lines of Code: 2 | License: Strong Copyleft (CC BY-SA 4.0)
            pip uninstall enum34
            
            By how much can i approx. reduce disk volume by using dvc?
            Python | Lines of Code: 33 | License: Strong Copyleft (CC BY-SA 4.0)
            $ dvc add data.xml
            
            $ echo "" >> data.xml
            $ dvc add data.xml
            
            (.env) [ivan@ivan ~/Projects/test]$ md5 data.xml
            0c12dce03223117e423606e92650192c
            
            (.env) [ivan@ivan ~/Projects/test]$ tree
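The commands above append one byte to a tracked file and add it again. DVC addresses cached files by content hash (MD5 by default), so even a tiny change produces a new hash and a second full copy in the cache. The sketch below illustrates that idea with the standard library only; the sample bytes and the in-memory `cache` dict are hypothetical stand-ins, not DVC's own cache layout.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex MD5 digest used here as the cache key."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file contents standing in for data.xml before and after
# the `echo "" >> data.xml` step (which appends a single newline).
original = b"<root><item/></root>\n"
modified = original + b"\n"

# A content-addressed store keyed by hash: each distinct version is stored whole.
cache = {
    content_hash(original): original,
    content_hash(modified): modified,
}

print(len(cache), "cached objects,", sum(len(v) for v in cache.values()), "bytes total")
```

Under this model, disk savings come from identical files being stored once, not from near-identical files sharing storage.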
            Parsing XML by element in Python using ElementTree
            Python | Lines of Code: 5 | License: Strong Copyleft (CC BY-SA 4.0)
            elems_to_delete = [child for child in root if child.tag != 'DVC']
            
            for elem in elems_to_delete:
                root.remove(elem)
            
            wxPython DataViewCtrl Child Text Item Editor Appears Incorrectly
            Python | Lines of Code: 22 | License: Strong Copyleft (CC BY-SA 4.0)
            def GetParent(self, item):
                obj = self.ItemToObject(item)
            
                if obj.parent is None or obj.parent == self.data:  # if the parent is the invisible root node, return null
                    return dv.NullDataViewItem
            
                return self.ObjectToItem(o

            Community Discussions

            QUESTION

            Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo
            Asked 2022-Mar-11 at 18:08

            I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B) where repository A is for my machine learning codes and repository B is for my dataset.

            I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.

            The problem comes when I use colab to run my project. So what I did was the following:

            1. Created a new notebook on colab
            2. Successfully git-cloned my machine learning project (repository A)
            3. Ran "!pip install dvc"
            4. Ran "!dvc pull -v" (This is what causes the error)

            On step 4, I got the error (this is the full stack trace)

            ...

            ANSWER

            Answered 2022-Mar-11 at 18:08

            To summarize the discussion in the comments thread.

            Most likely it's happening since DVC can't get access to a private repo on GitLab. (The error message is obscure and should be fixed.)

            The same way you would not be able to run:

            Source https://stackoverflow.com/questions/71378280

            QUESTION

            DVC Experiment management workflow
            Asked 2022-Mar-03 at 15:05

            I'm struggling with the DVC experiment management. Suppose the following scenario:

            I have params.yaml file:

            ...

            ANSWER

            Answered 2022-Mar-03 at 15:05

            You did everything right. After pulling, dvc exp show will list your experiments. To restore an experiment from that list into your workspace, run dvc exp apply exp_66. DVC will check out the changes corresponding to that experiment.

            Your workflow seems correct so far. One addition: once you make sure one of the experiments is what you want to "keep" in git history, you can use dvc exp branch {exp_id} {branch_name} to create a separate branch for this experiment. Then you can use git commands to save the changes.
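The commands named in this answer can be collected into a small helper, which may be convenient when scripting the workflow. This is a minimal sketch: the function name `exp_commands` and the branch name are hypothetical, and only the three CLI invocations mentioned above are used.

```python
import shlex


def exp_commands(exp_name: str, branch_name: str) -> list[list[str]]:
    """Return the DVC CLI invocations from the answer, in the order described."""
    return [
        ["dvc", "exp", "show"],                            # list available experiments
        ["dvc", "exp", "apply", exp_name],                 # restore one into the workspace
        ["dvc", "exp", "branch", exp_name, branch_name],   # keep it as a Git branch
    ]


# Print the commands as shell lines; "keep-exp-66" is an illustrative branch name.
for argv in exp_commands("exp_66", "keep-exp-66"):
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` inside a DVC repository to execute it.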

            Source https://stackoverflow.com/questions/71338160

            QUESTION

            How to merge data (CSV) files from multiple branches (Git and DVC)?
            Asked 2022-Feb-17 at 21:13

            Background: In my projects I'm using GIT and DVC to keep track of versions:

            • GIT - only for source codes
            • DVC - for dataset, model objects and outputs

            I'm testing different approaches in separate branches, i.e:

            • random_forest
            • neural_network_1
            • ...

            Typically, as an output, I keep predictions in a csv file with a standardized name (i.e. pred_test.csv). As a consequence, different branches have different pred_test.csv files. The structure of the file is very simple; it contains two columns:

            • ID
            • Prediction

            Question: What is the best way to merge those prediction files into single big file?

            I would like to obtain a file with structure:

            • ID
            • Prediction_random_forest
            • Prediction_neural_network_1
            • Prediction_...

            My main issue is how to access files with predictions which are in different branches?

            ...

            ANSWER

            Answered 2022-Feb-17 at 15:51

            I would try to use dvc get in this case:
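One possible shape for the merge step, as a sketch: each branch's file could first be fetched with something like `dvc get . pred_test.csv --rev random_forest -o pred_random_forest.csv` (the output file name is hypothetical), and the fetched files then joined on ID. The helper `merge_predictions` and the sample values below are illustrative assumptions; only the column names ID and Prediction come from the question.

```python
import csv
import io


def merge_predictions(csv_by_branch: dict[str, str]) -> list[dict[str, str]]:
    """Join per-branch prediction CSVs on ID, one Prediction_<branch> column each."""
    merged: dict[str, dict[str, str]] = {}
    for branch, text in csv_by_branch.items():
        for row in csv.DictReader(io.StringIO(text)):
            record = merged.setdefault(row["ID"], {"ID": row["ID"]})
            record[f"Prediction_{branch}"] = row["Prediction"]
    return list(merged.values())


# Hypothetical file contents; in practice each string would be read from the
# copy of pred_test.csv fetched for that branch.
sample = {
    "random_forest": "ID,Prediction\n1,0.2\n2,0.9\n",
    "neural_network_1": "ID,Prediction\n1,0.3\n2,0.8\n",
}

for row in merge_predictions(sample):
    print(row)
```

The result can be written back out with `csv.DictWriter` if a single merged file is needed.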

            Source https://stackoverflow.com/questions/71155959

            QUESTION

            Multiple users in DVC
            Asked 2022-Jan-31 at 17:45

            I would like to ask if it is possible to use DVC with several accounts on the same machine. At the moment, all commands (dvc pull, dvc push, ...) are executed under my name. But after several people joined this project too, I do not want them to execute commands under my name.

            When I was alone on this project I generated ssh key:

            ...

            ANSWER

            Answered 2022-Jan-31 at 17:45

            You need to make the "username" part of the config personalized based on who is running the command. There are a few options to do this (based on this document, see the SSH part):

            Basic options are:
            • User defined in the SSH config file (e.g. ~/.ssh/config) for this host (URL);
            • Current system user;

            So, the simplest option could be to just remove it from the URL and rely on the current system user?

            Local (git-ignored or per-project DVC config) config

            You could remove the username part from the url and run something like this:

            Source https://stackoverflow.com/questions/70928144

            QUESTION

            Run `rlang::last_error()` to see where the error occurred
            Asked 2022-Jan-23 at 23:54

            I'm hoping someone can help me figure out the issue with my code. I've been trying for about 5 hours and checked these links (1, 2) but couldn't work it out. I'm trying to build a Shiny app, but when I run my code, I keep getting the error message:

            ...

            ANSWER

            Answered 2022-Jan-20 at 15:53

            You put every column name to lowercase with rename_all(tolower). Therefore, column Date doesn't exist but column date does. Replacing Date by date works.

            You have to fix that for all other column names in mutate(). Also, you modify the variable City but it is not in the data you provide (perhaps you just forgot to include it).

            I didn't run the app but this should fix your dplyr error.

            Source https://stackoverflow.com/questions/70781242

            QUESTION

            CSS code problem that shows the navigation bar incorrectly
            Asked 2022-Jan-15 at 23:01

            There is something wrong with my CSS code, once I added the CSS code of the products (starting from #lap) the navigation bar showed incorrectly, and when I remove the "}" that close "@keyframe slide" the navigation bar shows correctly but ofc the products CSS code doesn't render cuz "@keyframe slide" remains unclosed.

            ...

            ANSWER

            Answered 2022-Jan-15 at 23:01

            I think your issue is with:

            Source https://stackoverflow.com/questions/70725937

            QUESTION

            DVC Shared Windows Directory Setup
            Asked 2022-Jan-03 at 08:44

            I have one Linux machine and one Windows machine for developments. For data sharing, we have set up a shared Windows directory in another Windows machine, which both my Linux and Windows can access.

            I am now using DVC for version control of the shared data. To make it easy, I mount the shared Windows folder both in Windows and in Linux development machine. In Windows, it looks like

            ...

            ANSWER

            Answered 2022-Jan-03 at 03:08

            If you are using a local remote this way, you won't be able to use the same url on both platforms, since the mount points are different (as you already realized).

            The simplest way to configure this would be to pick one (Linux or Windows) url to use as your default case that gets git-committed into .dvc/config. On the other platform you (or your users) can override that url in the local configuration file: .dvc/config.local.

            (Note that .dvc/config.local is a git-ignored file and will not be included in any commits)

            So if you wanted Windows to be the default case, in .dvc/config you would have:
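The override behaviour described here can be illustrated with a short sketch. It uses Python's standard `configparser` as a stand-in, not DVC's own config loader, and the remote name `shared` and both paths are hypothetical. The point is only that a value in .dvc/config.local takes precedence over the same key in .dvc/config.

```python
import configparser

# Hypothetical contents of the git-committed .dvc/config (Windows as the default).
PROJECT_CONFIG = r"""
[core]
remote = shared

['remote "shared"']
url = Z:\dvc-storage
"""

# Hypothetical contents of the git-ignored .dvc/config.local on the Linux machine.
LOCAL_CONFIG = r"""
['remote "shared"']
url = /mnt/shared/dvc-storage
"""


def effective_config(project_text: str, local_text: str) -> dict[str, dict[str, str]]:
    """Merge two config texts; values read later (local) override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(project_text)
    parser.read_string(local_text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = effective_config(PROJECT_CONFIG, LOCAL_CONFIG)
print(config["'remote \"shared\"'"]["url"])
```

On the Windows machine there would be no local override, so the committed url would apply unchanged.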

            Source https://stackoverflow.com/questions/70560288

            QUESTION

            Not able to run linux command in background from dockerfile?
            Asked 2021-Dec-15 at 09:25

            Here's my docker file,

            ...

            ANSWER

            Answered 2021-Dec-15 at 09:25

            Why is that?

            Because your docker container is configured to run /usr/local/bin/gunicorn, as defined by the ENTRYPOINT instruction.

            how can I run that above command in background and go to entrypoint in docker file.

            The standard way to do this is to write a wrapper script which executes all programs you need. So for this example, something like run.sh:

            Source https://stackoverflow.com/questions/70355410

            QUESTION

            Why My cells in collectionView are formed faster than the method of receiving data, for their formation?
            Asked 2021-Nov-23 at 00:18

            In my application, after authorization, the user is taken to a screen that displays news according to the specified parameters, news is transmitted through the API.

            in the viewWillAppear method, the getUserSettings method is triggered, in which the fetchNewsData method is triggered, which fills an array with news, based on this array, collection cells are formed. The array is filled with actual data from the database, which contains user settings. My code is below:

            ...

            ANSWER

            Answered 2021-Nov-23 at 00:18

            Based on the code you posted, I'm guessing that you will need to clear the newsArray before you load content into it again. As your code is written now, you append new news to it. This would lead to you continually adding to it instead of replacing what was there.

            Source https://stackoverflow.com/questions/70073875

            QUESTION

            Is the default DVC behavior to store connection data in git?
            Asked 2021-Oct-27 at 11:06

            I've recently started to play with DVC, and I was a bit surprised to see the getting started docs are suggesting to store .dvc/config in git.

            This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure username) is also stored in .dvc/config, which means it would end up in git. Making it not ideal for team collaboration scenarios.

            What's even less ideal (read: really scary) is that connection strings entered using dvc remote modify blah connection_string ... also end up in .dvc/config, making them end up in git and, in the case of open source projects, making them end up in very interesting places.

            Am I doing something obviously wrong? I wouldn't expect the getting started docs to go very deep into security issues, but I wouldn't expect them to store connection strings in source control either.

            My base assumption is that I'm misunderstanding/misconfiguring something, I'd be curious to know what.

            ...

            ANSWER

            Answered 2021-Oct-27 at 11:06

            DVC has a few config "levels", which can be selected with the appropriate flag:

            • --local - repository level, ignored by git by default - designated for project-scope, sensitive data
            • project - same as above, not ignored - designated to specify non-sensitive data (it is the default)
            • --global / --system - for common config for more repositories.

            More information can be found in the docs.
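As a sketch of how the levels above might be applied in a script, the helper below builds a `dvc remote modify` command and adds `--local` whenever the option looks sensitive, so the value lands in the git-ignored config. The helper name and the contents of `SENSITIVE_OPTIONS` are assumptions for illustration; the remote name `blah` and the `connection_string` option come from the question.

```python
# Assumption: which option names to treat as secrets is a project decision;
# this set is illustrative, not an exhaustive list.
SENSITIVE_OPTIONS = {"connection_string", "password", "secret_access_key"}


def remote_modify_argv(remote: str, option: str, value: str) -> list[str]:
    """Build a `dvc remote modify` command, routing secrets to the local config."""
    argv = ["dvc", "remote", "modify"]
    if option in SENSITIVE_OPTIONS:
        argv.append("--local")  # written to .dvc/config.local, ignored by git
    argv += [remote, option, value]
    return argv


# Placeholder values; a real connection string should never be hard-coded.
print(remote_modify_argv("blah", "connection_string", "<secret>"))
print(remote_modify_argv("blah", "url", "azure://container/path"))
```

Non-sensitive options fall through to the default project level and stay in the committed .dvc/config.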

            Source https://stackoverflow.com/questions/69725612

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install dvc

            You can install using 'pip install dvc' or download it from GitHub, PyPI.
            You can use dvc like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install dvc

          • CLONE
          • HTTPS

            https://github.com/iterative/dvc.git

          • CLI

            gh repo clone iterative/dvc

          • sshUrl

            git@github.com:iterative/dvc.git
