Nested-Cross-Validation | Nested cross-validation for unbiased predictions | Machine Learning library
kandi X-RAY | Nested-Cross-Validation Summary
Nested cross-validation for unbiased predictions. Can be used with Scikit-Learn, XGBoost, Keras and LightGBM, or any other estimator that implements the scikit-learn interface.
Top functions reviewed by kandi - BETA
- Fit the model.
- Initialize the model.
- Predict and return the metric for the given test set.
- Return the best score from the results.
- Plot the score vs. variance.
- Given a list of best_inner_params_list.
- Fit the feature elimination transformation.
- Transform the score value.
Nested-Cross-Validation Key Features
Nested-Cross-Validation Examples and Code Snippets
from nested_cv import NestedCV
from sklearn.ensemble import RandomForestRegressor
# Define a parameters grid
param_grid = {
'max_depth': [3, None],
'n_estimators': [10]
}
NCV = NestedCV(model=RandomForestRegressor(), params_grid=param_grid)
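For comparison, the same nested scheme can be sketched using scikit-learn alone (a minimal sketch on synthetic data, not the NestedCV library's own API): GridSearchCV plays the inner loop that tunes the parameters, and cross_val_score plays the outer loop that scores the tuned model on held-out folds.

```python
# Nested CV with plain scikit-learn: inner loop = GridSearchCV,
# outer loop = cross_val_score. Synthetic regression data for illustration.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=120, n_features=10, random_state=0)

param_grid = {
    'max_depth': [3, None],
    'n_estimators': [10],
}

# Inner loop: 3-fold grid search over the parameter grid
inner = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)

# Outer loop: 5-fold CV around the whole grid search -> one unbiased
# score per outer fold
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```

Each outer fold runs its own full grid search on the outer-training portion only, so the outer score never sees data used for tuning.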
Community Discussions
Trending Discussions on Nested-Cross-Validation
QUESTION
I have run the example code in the link https://github.com/casperbh96/Nested-Cross-Validation/blob/master/Example%20Notebook%20-%20NestedCV.ipynb, but I got an error: `__init__() got an unexpected keyword argument 'outer_cv'`. I have checked the source code, and 'outer_cv' is included in `__init__()`. How can I solve this? The example code is also pasted below:
...ANSWER
Answered 2019-Dec-12 at 03:23
This should get you closer, though I can't get NCV.fit to finish in any reasonable time:
QUESTION
The following code combines cross_validate with GridSearchCV to perform a nested cross-validation for an SVC on the iris dataset. (Modified example of the following documentation page: https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py.)
...ANSWER
Answered 2019-Mar-07 at 22:26
If clf is used as the estimator for cross_validate, does it split the above-mentioned training set into a subtraining set and a validation set in order to determine the best hyperparameter combination?
Yes, as you can see here at Line 230, the training set is again split into a subtraining and validation set (specifically at line 240).
Update: Yes, when you pass the GridSearchCV classifier into cross_validate, it will again split the training set into a test and train set. Here is a link describing this in more detail. Your diagram and assumption are correct.
Out of all models tested via GridSearchCV, does cross_validate train and validate only the model stored in the variable best_estimator_?
Yes, as you can see from the answers here and here, GridSearchCV returns the best_estimator_ in your case (since the refit parameter is True by default). However, this best estimator will have to be trained again.
Does cross_validate train a model at all (if so, why?), or is the model stored in best_estimator_ validated directly via the test set?
As per your third and final question: yes, it trains an estimator and returns it if return_estimator is set to True. See this line. This makes sense, since how else would it return the scores without training an estimator in the first place?
Update
The reason the model is trained again is that the default use case for cross_validate does not assume that you pass in the best classifier with the optimum parameters. In this case specifically, you are passing in a classifier from GridSearchCV, but if you pass in any untrained classifier, it is supposed to be trained. What I mean is: yes, in your case it shouldn't train it again, since you are already doing cross-validation with GridSearchCV and using the best estimator. However, there is no way for cross_validate to know this, so it assumes you are passing in an un-optimized, or rather untrained, estimator; thus it has to train it again and return the scores for the same.
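The cloning behaviour described above can be checked directly (a small sketch on the iris dataset): cross_validate refits a fresh clone of whatever estimator it receives on each outer fold, and the object you pass in is never fitted itself.

```python
# cross_validate clones the estimator for every outer fold; the original
# GridSearchCV object passed in stays unfitted.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3)
result = cross_validate(clf, X, y, cv=4, return_estimator=True)

# One fitted clone per outer fold; none of them is the object we passed in
print(result['test_score'])
print(all(est is not clf for est in result['estimator']))
```

After the call, clf itself still has no best_estimator_ attribute: only the per-fold clones were trained.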
QUESTION
From what I've found, there is one other question like this (Speed-up nested cross-validation); however, installing MPI does not work for me after trying several fixes also suggested on this site and by Microsoft, so I am hoping there is another package or an answer to this question.
I am looking to compare multiple algorithms and grid-search a wide range of parameters (maybe too many parameters?). What ways are there, besides mpi4py, to speed up running my code? As I understand it, I cannot use n_jobs=-1, as that is then not nested?
Also to note, I have not been able to run this on the many parameters I am trying to look at below (it runs longer than I have time for). I only get results after 2 hours if I give each model just 2 parameters to compare. I run this code on a dataset of 252 rows and 25 feature columns, predicting one of 4 categories ('certain', 'likely', 'possible', or 'unknown') for whether a gene (of 252 genes) affects a disease. Using SMOTE increases the sample size to 420, which is then what goes into use.
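On the n_jobs concern raised above: n_jobs only controls how many fits run in parallel; it does not change which data splits are made, so setting it does not "un-nest" the cross-validation. A minimal sketch (on iris, with illustrative parameter values) parallelizes the outer folds while keeping the inner grid search sequential to avoid oversubscribing cores:

```python
# n_jobs parallelizes fold fitting only; the nested split structure is
# unchanged. Outer folds run in parallel, inner grid search stays sequential.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

inner = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': ['scale', 0.1]},
                     cv=3, n_jobs=1)           # inner loop: sequential
scores = cross_val_score(inner, X, y, cv=5, n_jobs=2)  # outer folds: parallel
print(scores)
```

Putting n_jobs=-1 on the outer call instead would use all cores for the outer folds; parallelizing both loops at once tends to oversubscribe and is usually slower.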
...ANSWER
Answered 2019-Apr-30 at 15:07
IIUC, you are trying to parallelize this example from the sklearn docs. If this is the case, then here is one possible approach to address
"why dask is not working"
and
"Any kind of constructive guidance or further knowledge on this problem"
General imports
QUESTION
I'm trying to find the best Neural Network model for the classification of breast cancer samples on the well-known Wisconsin Cancer dataset (569 samples, 31 features + target). I'm using sklearn 0.18.1. I'm not using normalization so far; I'll add it once I solve this question.
...ANSWER
Answered 2017-Jun-12 at 09:12
The grid.best_score_ is the average over all CV folds for a single combination of the parameters you specify in tuned_params.
In order to access other relevant details about the grid-searching process, you can look at the grid.cv_results_ attribute.
From the documentation of GridSearchCV:
cv_results_ : dict of numpy (masked) ndarrays
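A short sketch of inspecting that dict (parameter values here are illustrative): cv_results_ holds one entry per parameter combination, with per-fold and averaged test scores, and best_score_ is simply the best of those averages.

```python
# Inspect cv_results_ after a grid search: one row per parameter
# combination, with the mean test score across CV folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
grid.fit(X, y)

for params, mean in zip(grid.cv_results_['params'],
                        grid.cv_results_['mean_test_score']):
    print(params, round(mean, 3))

# best_score_ is just the largest of those per-combination means
print(grid.best_score_)
```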
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Nested-Cross-Validation
You can use Nested-Cross-Validation like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.