dvc | 🦉 Data Version Control | Git for Data & Models | ML Experiments Management | Machine Learning library

by iterative | Language: Python | Version: 3.51.2 | License: Apache-2.0

kandi X-RAY | dvc Summary

dvc is a Python library typically used in artificial intelligence, machine learning, and deep learning applications, including TensorFlow projects. kandi reports no bugs and no vulnerabilities for dvc; it has a permissive license and high support. However, no build file is available for dvc. You can install it using 'pip install dvc' or download it from GitHub or PyPI.

Support

dvc has a highly active ecosystem.

It has 11637 star(s) with 1064 fork(s). There are 135 watchers for this library.

There were 4 major release(s) in the last 12 months.

There are 563 open issues and 3756 have been closed. On average issues are closed in 134 days. There are 9 open pull requests and 0 closed requests.

It has a positive sentiment in the developer community.

The latest version of dvc is 3.51.2

Quality

dvc has 0 bugs and 0 code smells.

Security

dvc has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

dvc code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

dvc is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

dvc releases are available to install and integrate.

Deployable package is available in PyPI.

dvc has no build file. You will need to create the build yourself to build the component from source.

It has 59024 lines of code, 4179 functions and 516 files.

It has medium code complexity. Code complexity directly impacts maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed dvc and identified the functions below as its top functions. This is intended to give you instant insight into the functionality dvc implements and to help you decide whether it suits your requirements.

Adds the argument parser
Set required subparsers
Append documentation link to Dvc documentation
Format a link
Produce a reproducible experiment
Get the args and kwargs from the working directory
Convert environment variable to bool
Unpack the arguments from a pickle file
Run the command
Push the given targets to the remote server
Fetch the experiments from the remote repo
Reproduces the experiment
Removes experiments from the current workspace
Pull commits from git remote
Remove exp_names from repo
Load data from a pipeline
Replicate the repo
List the files under the given path
Return a dict of parameters matching the given parameters
Fetch dependencies
Generate y values for y values
Log standard exceptions
Commit repository
Run celery
Run the function
Configure logging

dvc Key Features

No Key Features are available at this moment for dvc.

dvc Examples and Code Snippets

DVC Recorder,Walkthrough

Jupyter Notebook

Lines of Code : 56

License : Permissive (Apache-2.0)

# 

# =============================
#load your params, input, and output files
params = yaml.safe_load(open('params.yaml'))['prepare']

if len(sys.argv) != 2:
    sys.stderr.write("Arguments error. Usage:\n")
    sys.stderr.write("\

DVC filesystem abstraction layer (0.8.0),Installation,Usage

Python

Lines of Code : 49

License : No License

from fs import open_fs
fs1 = open_fs("dvc://github.com/covid-genomics/data-artifacts") # Clone by https
fs2 = open_fs("dvc://ssh@github.com/covid-genomics/data-artifacts") # Clone by ssh
fs3 = open_fs("dvc://@github.com/covid-genomics/data-artifacts"

DVC command

Python

Lines of Code : 46

License : Non-SPDX (NOASSERTION)

:param str input_csv_file: Path to input file
:param str output_csv_file_1: Path to output file 1
:param str output_csv_file_2: Path to output file 2
[...]

[:dvc-[in|out][\s{related_param}]?:[\s{file_path}]?]*
[:dvc-extra: {python_other_param}]?

:d

Use parameters from additional configs in dvc 2.0

Python

Lines of Code : 22

License : Strong Copyleft (CC BY-SA 4.0)

stages:
  train:
    cmd: python train.py
    deps:
      - users.csv
    params:
      - params.py:
          - BOOL
          - INT
          - TrainConfig.EPOCHS
          - TrainConfig.layers
    outs:
      - model.pkl

How to access DVC-controlled files from Oracle?

Python

Lines of Code : 23

License : Strong Copyleft (CC BY-SA 4.0)

import re

import dvc.api

# Stream a DVC-tracked file from the project's remote storage
with dvc.api.open(
        'activity.log',
        repo='location/of/dvc/project',
        remote='my-s3-bucket'
        ) as fd:
    for line in fd:
        match = re.search(r'user=(\w+)', line)
        # ... Process users activity log

Pip install multiple extra dependencies of a single package via requirement file

Python

Lines of Code : 3

License : Strong Copyleft (CC BY-SA 4.0)

# requirements.txt
dvc[s3,gs]

Pyinstaller: AttributeError: module 'enum' has no attribute 'IntFlag'

Python

Lines of Code : 2

License : Strong Copyleft (CC BY-SA 4.0)

pip uninstall enum34

By how much can i approx. reduce disk volume by using dvc?

Python

Lines of Code : 33

License : Strong Copyleft (CC BY-SA 4.0)

$ dvc add data.xml

$ echo "" >> data.xml
$ dvc add data.xml

(.env) [ivan@ivan ~/Projects/test]$ md5 data.xml
0c12dce03223117e423606e92650192c

(.env) [ivan@ivan ~/Projects/test]$ tree

Parsing XML by element in Python using ElementTree

Python

Lines of Code : 5

License : Strong Copyleft (CC BY-SA 4.0)

elems_to_delete = [child for child in root if child.tag != 'DVC']

for elem in elems_to_delete:
    root.remove(elem)

wxPython DataViewCtrl Child Text Item Editor Appears Incorrectly

Python

Lines of Code : 22

License : Strong Copyleft (CC BY-SA 4.0)

def GetParent(self, item):
    obj = self.ItemToObject(item)

    if obj.parent is None or obj.parent == self.data:  # if the parent is the invisible root node, return null
        return dv.NullDataViewItem

    return self.ObjectToItem(o

Community Discussions

Trending Discussions on dvc

Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo

DVC Experiment management workflow

How to merge data (CSV) files from multiple branches (Git and DVC)?

Multiple users in DVC

Run `rlang::last_error()` to see where the error occurred

CSS code problem that shows the navigation bar incorrectly

DVC Shared Windows Directory Setup

Not able to run linux command in background from dockerfile?

Why My cells in collectionView are formed faster than the method of receiving data, for their formation?

Is the default DVC behavior to store connection data in git?

QUESTION

Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo

Asked 2022-Mar-11 at 18:08

I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B) where repository A is for my machine learning codes and repository B is for my dataset.

I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.

The problem comes when I use colab to run my project. So what I did was the following:

Created a new notebook on colab
Successfully git-cloned my machine learning project (repository A)
Ran "!pip install dvc"
Ran "!dvc pull -v" (This is what causes the error)

On step 4, I got the error (this is the full stack trace)

...

ANSWER

Answered 2022-Mar-11 at 18:08

To summarize the discussion in the comments thread.

Most likely it's happening since DVC can't get access to a private repo on GitLab. (The error message is obscure and should be fixed.)

The same way you would not be able to run:

Source https://stackoverflow.com/questions/71378280

QUESTION

DVC Experiment management workflow

Asked 2022-Mar-03 at 15:05

I'm struggling with the DVC experiment management. Suppose the following scenario:

I have params.yaml file:

...

ANSWER

Answered 2022-Mar-03 at 15:05

You did everything right. After pulling, dvc exp show will list your experiments. To restore one of the listed experiments into your workspace, run dvc exp apply exp_66; DVC will check out the changes corresponding to that experiment.

Your workflow seems correct so far. One addition: once you make sure one of the experiments is what you want to "keep" in git history, you can use dvc exp branch {exp_id} {branch_name} to create a separate branch for this experiment. Then you can use git commands to save the changes.
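
If these two steps need to be scripted, one possible approach is to build the command lines in Python and pass them to subprocess. The sketch below is only an illustration: the helper names and the branch name "keep-exp-66" are hypothetical, and it assumes the dvc executable is on PATH and that the script is run from inside the repository.

```python
import subprocess


def exp_apply_command(exp_name):
    """Build the argument list for `dvc exp apply <exp_name>`."""
    return ["dvc", "exp", "apply", exp_name]


def exp_branch_command(exp_name, branch_name):
    """Build the argument list for `dvc exp branch <exp_name> <branch_name>`."""
    return ["dvc", "exp", "branch", exp_name, branch_name]


def run(command):
    """Run a command in the current directory; raises if it exits non-zero."""
    return subprocess.run(command, check=True)


# Example usage (requires dvc to be installed, so it is left commented out):
# run(exp_apply_command("exp_66"))
# run(exp_branch_command("exp_66", "keep-exp-66"))  # hypothetical branch name
```

Keeping the command construction separate from the call to subprocess makes it possible to inspect or log the exact command before anything is executed.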

Source https://stackoverflow.com/questions/71338160

QUESTION

How to merge data (CSV) files from multiple branches (Git and DVC)?

Asked 2022-Feb-17 at 21:13

Background: In my projects I'm using GIT and DVC to keep track of versions:

GIT - only for source codes
DVC - for dataset, model objects and outputs

I'm testing different approaches in separate branches, e.g.:

random_forest
neural_network_1
...

Typically, as an output I keep predictions in a CSV file with a standardized name (e.g. pred_test.csv). As a consequence, different branches contain different pred_test.csv files. The structure of the file is very simple; it contains two columns:

ID
Prediction

Question: What is the best way to merge those prediction files into single big file?

I would like to obtain a file with structure:

ID
Prediction_random_forest
Prediction_neural_network_1
Prediction_...

My main issue is how to access files with predictions which are in different branches?

...

ANSWER

Answered 2022-Feb-17 at 15:51

I would try to use dvc get in this case:
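
The command from the original answer is not reproduced on this page. As a rough Python sketch of the same idea (not the answer's own code), the per-branch files could be read with dvc.api.read, which accepts a rev argument, and then joined on ID. The helper names merge_predictions and read_from_dvc are hypothetical, and read_from_dvc assumes the dvc package is installed and the code runs inside the repository.

```python
import csv
import io


def merge_predictions(branches, read_csv_text):
    """Join per-branch prediction CSVs (columns: ID, Prediction) on ID.

    `read_csv_text` is any callable that returns the CSV text for a branch.
    """
    merged = {}  # ID -> merged row, in first-seen order
    for branch in branches:
        reader = csv.DictReader(io.StringIO(read_csv_text(branch)))
        for row in reader:
            merged_row = merged.setdefault(row["ID"], {"ID": row["ID"]})
            merged_row[f"Prediction_{branch}"] = row["Prediction"]
    return list(merged.values())


def read_from_dvc(branch, path="pred_test.csv"):
    """Read a DVC-tracked file as text from the given Git revision."""
    import dvc.api  # imported lazily so the merge logic works without dvc

    return dvc.api.read(path, rev=branch)


# Example usage inside a DVC repository:
# rows = merge_predictions(["random_forest", "neural_network_1"], read_from_dvc)
```

Passing the reader as a parameter keeps the merge step independent of DVC, so it can be checked with in-memory CSV text before it is pointed at real branches.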

Source https://stackoverflow.com/questions/71155959

QUESTION

Multiple users in DVC

Asked 2022-Jan-31 at 17:45

I would like to ask if it is possible to use DVC with several accounts on the same machine. At the moment, all commands (dvc pull, dvc push, ...) are executed under my name. But after several people joined this project too, I do not want them to execute commands under my name.

When I was alone on this project I generated ssh key:

...

ANSWER

Answered 2022-Jan-31 at 17:45

You need to make the "username" part of the config personalized based on who is running the command. There are a few options to do this (based on this document, see the SSH part):

Basic options are:

User defined in the SSH config file (e.g. ~/.ssh/config) for this host (URL);
Current system user;

So the simplest option could be to just remove it from the URL and rely on the current system user.

Local (git-ignored or per-project DVC config) config

You could remove the username part from the URL and run something like this:
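
The command from the original answer is not reproduced on this page. To illustrate only the "remove the username part" step, here is a small sketch using Python's standard urllib.parse. The function name strip_username and the example URL are hypothetical; the resulting URL would still need to be written to the DVC remote configuration separately.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_username(url):
    """Return the URL without its user (and password) component."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Hypothetical remote URL, used only for illustration
print(strip_username("ssh://alice@example.com/data/dvc-storage"))
```

With the username gone from the shared URL, SSH can fall back to the user defined in the SSH config file or to the current system user, as described in the options above.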

Source https://stackoverflow.com/questions/70928144

QUESTION

Run `rlang::last_error()` to see where the error occurred

Asked 2022-Jan-23 at 23:54

I'm hoping someone can help me figure out the issue with my code. I've spent about 5 hours on it and checked these links (1, 2) without success. I'm trying to build a Shiny app, but when I run my code I keep getting the error message:

...

ANSWER

Answered 2022-Jan-20 at 15:53

You put every column name to lowercase with rename_all(tolower). Therefore, column Date doesn't exist but column date does. Replacing Date by date works.

You have to fix that for all other column names in mutate(). Also, you modify the variable City but it is not in the data you provide (perhaps you just forgot to include it).

I didn't run the app but this should fix your dplyr error.

Source https://stackoverflow.com/questions/70781242

QUESTION

CSS code problem that shows the navigation bar incorrectly

Asked 2022-Jan-15 at 23:01

There is something wrong with my CSS code, once I added the CSS code of the products (starting from #lap) the navigation bar showed incorrectly, and when I remove the "}" that close "@keyframe slide" the navigation bar shows correctly but ofc the products CSS code doesn't render cuz "@keyframe slide" remains unclosed.

...

ANSWER

Answered 2022-Jan-15 at 23:01

I think your issue is with:

Source https://stackoverflow.com/questions/70725937

QUESTION

DVC Shared Windows Directory Setup

Asked 2022-Jan-03 at 08:44

I have one Linux machine and one Windows machine for developments. For data sharing, we have set up a shared Windows directory in another Windows machine, which both my Linux and Windows can access.

I am now using DVC for version control of the shared data. To make it easy, I mount the shared Windows folder both in Windows and in Linux development machine. In Windows, it looks like

...

ANSWER

Answered 2022-Jan-03 at 03:08

If you are using a local remote this way, you won't be able to have the same url on both platforms since the mount points are different (as you already realized).

The simplest way to configure this would be to pick one (Linux or Windows) url to use as your default case that gets git-committed into .dvc/config. On the other platform you (or your users) can override that url in the local configuration file: .dvc/config.local.

(Note that .dvc/config.local is a git-ignored file and will not be included in any commits)

So if you wanted Windows to be the default case, in .dvc/config you would have:
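
The configuration from the original answer is not reproduced on this page. The sketch below only mimics the override behaviour with Python's standard configparser; it is not DVC's own parser. The remote name "shared" and both paths are hypothetical placeholders.

```python
import configparser

# Hypothetical .dvc/config (committed to Git): Windows is the default case.
PROJECT_CONFIG = r"""
[core]
remote = shared
['remote "shared"']
url = Z:\dvc-storage
"""

# Hypothetical .dvc/config.local (git-ignored) on the Linux machine.
LOCAL_CONFIG = r"""
['remote "shared"']
url = /mnt/shared/dvc-storage
"""


def effective_remote_url(project_text, local_text, remote="shared"):
    """Return the remote url after the local file overrides the project file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(project_text)
    parser.read_string(local_text)  # values read later replace earlier ones
    section = "'remote \"{}\"'".format(remote)
    return parser.get(section, "url")


print(effective_remote_url(PROJECT_CONFIG, LOCAL_CONFIG))
```

On the Windows machine there would be no local override, so the committed default applies; on Linux the value from the git-ignored file takes effect without touching the committed file.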

Source https://stackoverflow.com/questions/70560288

QUESTION

Not able to run linux command in background from dockerfile?

Asked 2021-Dec-15 at 09:25

Here's my docker file,

...

ANSWER

Answered 2021-Dec-15 at 09:25

Why is that?

Because your docker container is configured to run /usr/local/bin/gunicorn, as defined by the ENTRYPOINT instruction.

how can I run that above command in background and go to entrypoint in docker file.

The standard way to do this is to write a wrapper script which executes all programs you need. So for this example, something like run.sh:

Source https://stackoverflow.com/questions/70355410

QUESTION

Why My cells in collectionView are formed faster than the method of receiving data, for their formation?

Asked 2021-Nov-23 at 00:18

In my application, after authorization, the user is taken to a screen that displays news according to the specified parameters, news is transmitted through the API.

in the viewWillAppear method, the getUserSettings method is triggered, in which the fetchNewsData method is triggered, which fills an array with news, based on this array, collection cells are formed. The array is filled with actual data from the database, which contains user settings. My code is below:

...

ANSWER

Answered 2021-Nov-23 at 00:18

Based on the code you posted, I'm guessing that you will need to clear the newsArray before you load content into it again. As your code is written now, you append new news to it. This would lead to you continually adding to it instead of replacing what was there.

Source https://stackoverflow.com/questions/70073875

QUESTION

Is the default DVC behavior to store connection data in git?

Asked 2021-Oct-27 at 11:06

I've recently started to play with DVC, and I was a bit surprised to see the getting started docs are suggesting to store .dvc/config in git.

This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure username) is also stored in .dvc/config, which means it would end up in git. Making it not ideal for team collaboration scenarios.

What's even less ideal (read: really scary) is that connection strings entered using dvc remote modify blah connection_string ... also end up in .dvc/config, making them end up in git and, in the case of open source projects, making them end up in very interesting places.

Am I doing something obviously wrong? I wouldn't expect the getting started docs to go very deep into security issues, but I wouldn't expect them to store connection strings in source control either.

My base assumption is that I'm misunderstanding/misconfiguring something, I'd be curious to know what.

...

ANSWER

Answered 2021-Oct-27 at 11:06

DVC has a few "levels" of config, which can be selected with the proper flag:

--local - repository level, ignored by git by default - designated for project-scope, sensitive data
project - same as above, not ignored - designated to specify non-sensitive data (it is the default)
--global / --system - for common config for more repositories.

More information can be found in the docs.
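
As a complement to the levels above, a team could check the committed .dvc/config for options that belong in the --local level before pushing. The following is a minimal sketch using Python's standard configparser, not a DVC feature: the function name is hypothetical, and the list of option names is an illustrative, non-exhaustive assumption (only connection_string comes from the question).

```python
import configparser

# Illustrative, non-exhaustive set of option names treated as sensitive here.
SENSITIVE_KEYS = {"connection_string", "password", "secret_access_key", "sas_token"}


def find_sensitive_options(config_text):
    """Return (section, option) pairs whose option name looks sensitive."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    hits = []
    for section in parser.sections():
        for key in parser[section]:
            if key in SENSITIVE_KEYS:
                hits.append((section, key))
    return hits


# Example usage against the project-level file:
# with open(".dvc/config") as fh:
#     print(find_sensitive_options(fh.read()))
```

Any option reported this way is a candidate for being set again with the --local flag so that it lands in the git-ignored file instead.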

Source https://stackoverflow.com/questions/69725612

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install dvc

You can install using 'pip install dvc' or download it from GitHub, PyPI.
You can use dvc like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
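
After installing, one way to confirm that the package is visible in the active environment is to query its metadata from Python. This is a small sketch using the standard importlib.metadata module; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("dvc"))
```

A result of None suggests that the install went into a different environment than the one running the script, which is a common issue when a virtual environment has not been activated.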

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
