split-folders | Split folders with files | Machine Learning library

by jfilter | Python | Version: 0.5.1 | License: MIT

kandi X-RAY | split-folders Summary

split-folders is a Python library typically used in Artificial Intelligence, Machine Learning, and Deep Learning applications. split-folders has no bugs, no vulnerabilities, a Permissive License, and low support. However, no build file is available for split-folders. You can install it using 'pip install split-folders' or download it from GitHub or PyPI.

Split folders with files (e.g. images) into train, validation and test (dataset) folders.

Support

split-folders has a low active ecosystem.

It has 265 stars and 46 forks. There are 7 watchers for this library.

It had no major release in the last 12 months.

There are 0 open issues and 28 have been closed. On average issues are closed in 149 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of split-folders is 0.5.1

Quality

split-folders has 0 bugs and 10 code smells.

Security

split-folders has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

split-folders code analysis shows 0 unresolved vulnerabilities.

There is 1 security hotspot that needs review.

License

split-folders is licensed under the MIT License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

split-folders releases are not available. You will need to build from source code and install.

Deployable package is available in PyPI.

split-folders has no build file. You will need to create the build yourself to build the component from source.

Installation instructions, examples and code snippets are available.

split-folders saves you 126 person hours of effort in developing the same functionality from scratch.

It has 448 lines of code, 31 functions and 7 files.

It has low code complexity. Code complexity directly impacts maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed split-folders and discovered the below as its top functions. This is intended to give you an instant insight into split-folders implemented functionality, and help decide if they suit your requirements.

Parse command line arguments
Copy files from a fixed directory
Group files by prefix
Copy the files contained in the input directory
Split the class directory with fixed parameters
Copy files_type to output directory
Return a list of training and validation files
Checks input folder
Return a list of the files in the class directory
Splits the class directory with the given ratio
List all files in a directory
List all directories in a directory
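
The functions listed above suggest a flow of listing class directories, listing the files in each, shuffling them, and copying them into split folders by ratio. The following is a minimal standard-library sketch of that flow, not the library's actual implementation. The function name `split_class_dirs`, the output folder names `train`, `val` and `test`, and the demo file names are all assumptions made for illustration.

```python
import random
import shutil
import tempfile
from pathlib import Path

def split_class_dirs(input_dir, output_dir, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Copy input_dir/<class>/<file> into output_dir/{train,val,test}/<class>/.

    Hypothetical helper for illustration; it is not part of split-folders.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    counts = {"train": 0, "val": 0, "test": 0}
    # List all class directories in a fixed (sorted) order.
    for class_dir in sorted(p for p in input_dir.iterdir() if p.is_dir()):
        # List the files of this class, then shuffle them reproducibly.
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        random.Random(seed).shuffle(files)
        n_train = int(len(files) * ratio[0])
        n_val = int(len(files) * ratio[1])
        parts = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],  # whatever remains
        }
        for split_name, members in parts.items():
            target = output_dir / split_name / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for src in members:
                shutil.copy2(src, target / src.name)
            counts[split_name] += len(members)
    return counts

# Demo on a throwaway directory: two classes with ten empty files each.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input"
    for class_name in ("cats", "dogs"):
        (source / class_name).mkdir(parents=True)
        for i in range(10):
            (source / class_name / f"{class_name}_{i}.jpg").touch()
    demo_counts = split_class_dirs(source, Path(tmp) / "output")

print(demo_counts)
```

Splitting each class separately keeps the class proportions roughly the same in every split, which is why the sketch loops over class directories rather than pooling all files.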

split-folders Key Features

No Key Features are available at this moment for split-folders.

split-folders Examples and Code Snippets

No Code Snippets are available at this moment for split-folders.

Community Discussions

Trending Discussions on split-folders

What is "seed" for in splitting test-val data in Python and how to come up with a correct number?

QUESTION

Asked 2021-May-19 at 05:35

I'm trying to split my image dataset into a training set and a validation set. I found a Python library called split-folders. The syntax is easy to understand:

splitfolders.ratio("input_folder", output="output", seed=1337, ratio=(.8, .1, .1), group_prefix=None)

But I don't know what this seed parameter is or what it does. The description on the page only says that "a seed makes splits reproducible" and that "it shuffles the items", which doesn't really explain anything to me. I have googled it, and nothing I found gave me a clear answer. Can anyone give me a brief explanation?

The default number is 1337, but why? What does it mean to have the seed set to 1337? How did they come up with that number? How do I find the correct seed for my dataset?

...

ANSWER

Answered 2021-May-19 at 05:34

When you split your corpus into training, validation, and test sets, each data point is randomly assigned to one of the three sets. A seed makes that randomness reproducible.

Imagine you have a random number generator, a black box that gives you a series of random numbers. For any given seed, the sequence it generates is always identical. For example, with seed=1337, a random generator will always produce the same sequence of numbers, such as 12, 901, 110, 1, ..., on the same computer.
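
As a small illustration of that black box, the sketch below uses Python's standard `random` module. The helper name `first_numbers` and the value range are assumptions chosen for the example, and the actual numbers produced will differ from the ones quoted above.

```python
import random

def first_numbers(seed, count=5):
    # Use a private generator so the global random state is left untouched.
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(count)]

run_a = first_numbers(1337)
run_b = first_numbers(1337)
run_c = first_numbers(42)

print(run_a == run_b)  # True: same seed, same sequence
print(run_a)
print(run_c)           # a different seed starts a different sequence
```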

Why do we care about reproducing the randomness, especially when dividing a corpus for training? Because most of the time you want to repeat the same experiment with the same data. If you do not fix the seed, each run of the same experiment will end up with a different split of the data.

The seed value itself is not important, as long as it stays fixed throughout your experiments. I personally set it to a prime number.
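
To make this concrete, here is a hedged sketch of a seeded 80/10/10 split over a list of file names, written with the standard library only. The function `split_items` and the file names are hypothetical. It shows that any fixed seed yields a repeatable split, so there is no "correct" seed to search for.

```python
import random

def split_items(items, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Shuffle a copy of items with a seeded generator, then cut it by ratio."""
    shuffled = sorted(items)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # the seed decides the shuffle
    n_train = int(len(shuffled) * ratio[0])
    n_val = int(len(shuffled) * ratio[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains
    return train, val, test

files = [f"img_{i:03d}.jpg" for i in range(20)]
first = split_items(files, seed=1337)
again = split_items(files, seed=1337)
other = split_items(files, seed=7)

print(first == again)               # True: the same seed repeats the split
print([len(part) for part in other])  # [16, 2, 2]: another seed, same sizes
```

Changing the seed only changes which files land in which set. The set sizes, and the validity of the split, stay the same.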

Source https://stackoverflow.com/questions/67597367

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install split-folders

This package is pure Python and has no external dependencies. Optionally, you may install tqdm to get a progress bar when moving files.
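
One common way to treat a package such as tqdm as optional is to fall back to a pass-through when the import fails. The sketch below illustrates that general pattern only. How split-folders itself handles the optional import is not shown here, and `process_all` is a hypothetical stand-in for the real file handling.

```python
try:
    from tqdm import tqdm  # optional third-party dependency
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: no progress bar, just pass the iterable through.
        return iterable

def process_all(paths):
    done = []
    for path in tqdm(paths, desc="Copying files"):
        done.append(path)  # placeholder for the real per-file work
    return done

result = process_all(["a.jpg", "b.jpg", "c.jpg"])
print(result)
```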

Support

If you have a question, found a bug or want to propose a new feature, have a look at the issues page. Pull requests are especially welcomed when they fix bugs or improve the code quality.
