dvc | 🦉 Data Version Control | Git for Data & Models | ML Experiments Management | Machine Learning library
kandi X-RAY | dvc Summary
Top functions reviewed by kandi - BETA
- Adds the argument parser
- Set required subparsers
- Append documentation link to Dvc documentation
- Format a link
- Produce a reproducible experiment
- Get the args and kwargs from the working directory
- Convert environment variable to bool
- Unpack the arguments from a pickle file
- Run the command
- Push the given targets to the remote server
- Fetch the experiments from the remote repo
- Reproduces the experiment
- Removes experiments from the current workspace
- Pull commits from git remote
- Remove exp_names from repo
- Load data from a pipeline
- Replicate the repo
- List the files under the given path
- Return a dict of parameters matching the given parameters
- Fetch dependencies
- Generate y values for y values
- Log standard exceptions
- Commit repository
- Run celery
- Run the function
- Configure logging
dvc Key Features
dvc Examples and Code Snippets
from fs import open_fs
fs1 = open_fs("dvc://github.com/covid-genomics/data-artifacts") # Clone by https
fs2 = open_fs("dvc://ssh@github.com/covid-genomics/data-artifacts") # Clone by ssh
fs3 = open_fs("dvc://@github.com/covid-genomics/data-artifacts"
:param str input_csv_file: Path to input file
:param str output_csv_file_1: Path to output file 1
:param str output_csv_file_2: Path to output file 2
[...]
[:dvc-[in|out][\s{related_param}]?:[\s{file_path}]?]*
[:dvc-extra: {python_other_param}]?
:d
stages:
  train:
    cmd: python train.py
    deps:
      - users.csv
    params:
      - params.py:
          - BOOL
          - INT
          - TrainConfig.EPOCHS
          - TrainConfig.layers
    outs:
      - model.pkl
import re

import dvc.api

# Stream a DVC-tracked file from the project's remote storage
with dvc.api.open(
    'activity.log',
    repo='location/of/dvc/project',
    remote='my-s3-bucket'
) as fd:
    for line in fd:
        match = re.search(r'user=(\w+)', line)
        # ... Process users activity log
# requirements.txt
dvc[s3,gs]
$ dvc add data.xml
$ echo "" >> data.xml
$ dvc add data.xml
(.env) [ivan@ivan ~/Projects/test]$ md5 data.xml
0c12dce03223117e423606e92650192c
(.env) [ivan@ivan ~/Projects/test]$ tree
elems_to_delete = [child for child in root if child.tag != 'DVC']
for elem in elems_to_delete:
    root.remove(elem)
def GetParent(self, item):
    obj = self.ItemToObject(item)
    # If the parent is the invisible root node, return null
    if obj.parent is None or obj.parent == self.data:
        return dv.NullDataViewItem
    return self.ObjectToItem(o
Community Discussions
Trending Discussions on dvc
QUESTION
I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B) where repository A is for my machine learning codes and repository B is for my dataset.
I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.
The problem comes when I use colab to run my project. So what I did was the following:
1. Created a new notebook on colab
2. Successfully git-cloned my machine learning project (repository A)
3. Ran "!pip install dvc"
4. Ran "!dvc pull -v" (This is what causes the error)
On step 4, I got the error (this is the full stack trace)
...ANSWER
Answered 2022-Mar-11 at 18:08
To summarize the discussion in the comments thread:
Most likely it's happening since DVC can't get access to a private repo on GitLab. (The error message is obscure and should be fixed.)
The same way you would not be able to run:
QUESTION
I'm struggling with the DVC experiment management. Suppose the following scenario:
I have a params.yaml file:
ANSWER
Answered 2022-Mar-03 at 15:05
You did everything alright. In the end, after pulling, you can see that when using dvc exp show your experiments will be there. To restore the experiment available from your experiment list into your workspace, you simply need to run dvc exp apply exp_66. DVC will make sure that the changes corresponding to this experiment will be checked out.
Your workflow seems correct so far. One addition: once you make sure one of the experiments is what you want to "keep" in git history, you can use dvc exp branch {exp_id} {branch_name} to create a separate branch for this experiment. Then you can use git commands to save the changes.
QUESTION
Background: In my projects I'm using GIT and DVC to keep track of versions:
- GIT - only for source codes
- DVC - for dataset, model objects and outputs
I'm testing different approaches in separate branches, i.e:
- random_forest
- neural_network_1
- ...
Typically, as an output I keep predictions in a csv file with a standardised name (i.e. pred_test.csv). As a consequence, different branches contain different pred_test.csv files. The structure of the file is very simple; it contains two columns:
- ID
- Prediction
Question: What is the best way to merge those prediction files into single big file?
I would like to obtain a file with structure:
- ID
- Prediction_random_forest
- Prediction_neural_network_1
- Prediction_...
My main issue is how to access files with predictions which are in different branches?
...ANSWER
Answered 2022-Feb-17 at 15:51
I would try to use dvc get in this case:
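The command from the original answer is not reproduced here. As a rough sketch of the merging step only, assuming each branch's pred_test.csv has already been retrieved (for instance with dvc get) and read into memory, the files could be combined into one wide table keyed by ID. The function names `merge_predictions` and `to_csv`, and the sample values, are hypothetical and not part of the original answer.

```python
import csv
import io
from typing import Dict, List


def merge_predictions(files: Dict[str, str]) -> List[dict]:
    """Merge per-branch prediction CSVs (columns: ID, Prediction) into wide rows.

    `files` maps a branch name to the text of that branch's pred_test.csv.
    """
    merged: Dict[str, dict] = {}
    for branch, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            # One output row per ID; each branch contributes its own column.
            wide = merged.setdefault(row["ID"], {"ID": row["ID"]})
            wide[f"Prediction_{branch}"] = row["Prediction"]
    return list(merged.values())


def to_csv(rows: List[dict], branches: List[str]) -> str:
    """Serialize merged rows; IDs missing from a branch get an empty cell."""
    fieldnames = ["ID"] + [f"Prediction_{b}" for b in branches]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data standing in for the files fetched from each branch.
files = {
    "random_forest": "ID,Prediction\n1,0.9\n2,0.1\n",
    "neural_network_1": "ID,Prediction\n1,0.8\n2,0.3\n",
}
rows = merge_predictions(files)
merged_csv = to_csv(rows, list(files))
print(merged_csv)
```

Keying on ID rather than on row position means the branches do not need to list predictions in the same order.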
QUESTION
I would like to ask if it is possible to use DVC with several accounts on the same machine. At the moment, all commands (dvc pull, dvc push, ...) are executed under my name. But after several people joined this project too, I do not want them to execute commands under my name.
When I was alone on this project I generated ssh key:
...ANSWER
Answered 2022-Jan-31 at 17:45
You need to make the "username" part of the config personalized based on who is running the command. There are a few options to do this (based on this document, see the SSH part):
Basic options are:
- User defined in the SSH config file (e.g. ~/.ssh/config) for this host (URL);
- Current system user;
So the simplest option could be to just remove it from the URL and rely on the current system user?
Local (git-ignored or per-project DVC config) config
You could remove the username part from the url and run something like this:
QUESTION
I'm hoping someone can help me figure out the issue with my code. I've been trying to figure it out for about 5 hours and checked these links 1,2 without success. I'm trying to build a Shiny app, but when I run my code I keep getting the error message:
...ANSWER
Answered 2022-Jan-20 at 15:53
You put every column name to lowercase with rename_all(tolower). Therefore, column Date doesn't exist but column date does. Replacing Date by date works.
You have to fix that for all other column names in mutate(). Also, you modify the variable City but it is not in the data you provide (perhaps you just forgot to include it).
I didn't run the app but this should fix your dplyr error.
QUESTION
There is something wrong with my CSS code, once I added the CSS code of the products (starting from #lap) the navigation bar showed incorrectly, and when I remove the "}" that close "@keyframe slide" the navigation bar shows correctly but ofc the products CSS code doesn't render cuz "@keyframe slide" remains unclosed.
...ANSWER
Answered 2022-Jan-15 at 23:01
I think your issue is with:
QUESTION
I have one Linux machine and one Windows machine for developments. For data sharing, we have set up a shared Windows directory in another Windows machine, which both my Linux and Windows can access.
I am now using DVC for version control of the shared data. To make it easy, I mount the shared Windows folder both in Windows and in Linux development machine. In Windows, it looks like
...ANSWER
Answered 2022-Jan-03 at 03:08
If you are using a local remote this way, you won't be able to have the same url on both platforms since the mount points are different (as you already realized).
The simplest way to configure this would be to pick one (Linux or Windows) url to use as your default case that gets git-committed into .dvc/config. On the other platform you (or your users) can override that url in the local configuration file: .dvc/config.local.
(Note that .dvc/config.local is a git-ignored file and will not be included in any commits.)
So if you wanted Windows to be the default case, in .dvc/config you would have:
QUESTION
Here's my docker file,
...ANSWER
Answered 2021-Dec-15 at 09:25
Why is that?
Because your docker container is configured to run /usr/local/bin/gunicorn, as defined by the ENTRYPOINT instruction.
how can I run that above command in background and go to entrypoint in docker file.
The standard way to do this is to write a wrapper script which executes all programs you need. So for this example, something like run.sh:
QUESTION
In my application, after authorization, the user is taken to a screen that displays news according to the specified parameters. News is transmitted through the API.
The viewWillAppear method triggers the getUserSettings method, which in turn triggers the fetchNewsData method. That method fills an array with news, and the collection cells are built from this array. The array is filled with current data from the database, which contains the user settings. My code is below:
...ANSWER
Answered 2021-Nov-23 at 00:18
Based on the code you posted, I'm guessing that you will need to clear the newsArray before you load content into it again. As your code is written now, you append new news to it. This would lead to you continually adding to it instead of replacing what was there.
QUESTION
I've recently started to play with DVC, and I was a bit surprised to see the getting started docs are suggesting to store .dvc/config in git.
This seemed like a fine idea at first, but then I noticed that my Azure Blob Storage account (i.e. my Azure username) is also stored in .dvc/config, which means it would end up in git. Making it not ideal for team collaboration scenarios.
What's even less ideal (read: really scary) is that connection strings entered using dvc remote modify blah connection_string ... also end up in .dvc/config, making them end up in git and, in the case of open source projects, making them end up in very interesting places.
Am I doing something obviously wrong? I wouldn't expect the getting started docs to go very deep into security issues, but I wouldn't expect them to store connection strings in source control either.
My base assumption is that I'm misunderstanding/misconfiguring something, I'd be curious to know what.
...ANSWER
Answered 2021-Oct-27 at 11:06
DVC has a few "levels" of config, which can be selected with the proper flag:
- --local - repository level, ignored by git by default - designated for project-scope, sensitive data
- project - same as above, not ignored - designated to specify non-sensitive data (it is the default)
- --global / --system - for common config for more repositories.
More information can be found in the docs.
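To illustrate how such levels could interact, here is a minimal sketch that models each level as a plain dictionary and resolves settings with the more specific level winning (local over project over global over system). This is an assumption-labeled model, not DVC's own implementation; the flattened keys, remote name, and values are hypothetical.

```python
from collections import ChainMap

# Hypothetical, flattened settings for each config level (not real DVC internals).
system_cfg = {"core.remote": "fallback"}
global_cfg = {"core.remote": "storage"}
project_cfg = {"remote.storage.url": "azure://container/path"}    # committed to git
local_cfg = {"remote.storage.connection_string": "<secret>"}      # git-ignored

# ChainMap looks keys up left to right, so the earlier (more specific) level wins.
effective = ChainMap(local_cfg, project_cfg, global_cfg, system_cfg)

# Only the project-level settings would end up in the repository history.
committed = dict(project_cfg)

print(effective["core.remote"])
print(sorted(committed))
```

In this model the secret is available when commands run locally but never appears in the settings that get committed, which is the separation the answer describes.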
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install dvc
You can use dvc like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.