Nested-Cross-Validation | Nested cross-validation for unbiased predictions | Machine Learning library
kandi X-RAY | Nested-Cross-Validation Summary
Nested cross-validation for unbiased predictions. Can be used with Scikit-Learn, XGBoost, Keras and LightGBM, or any other estimator that implements the scikit-learn interface.
Top functions reviewed by kandi - BETA
- Fit the model.
- Initialize the model.
- Predict and return the metric for the given test set.
- Return the best score from the results.
- Plot the score vs. variance.
- Given a list of best_inner_params_list.
- Fit the feature elimination transformation.
- Transform the score value.
Nested-Cross-Validation Key Features
Nested-Cross-Validation Examples and Code Snippets
from nested_cv import NestedCV
from sklearn.ensemble import RandomForestRegressor
# Define a parameters grid
param_grid = {
'max_depth': [3, None],
'n_estimators': [10]
}
NCV = NestedCV(model=RandomForestRegressor(), params_grid=param_grid)
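For comparison, the same nested scheme can be sketched using scikit-learn alone (a minimal sketch on synthetic data, not the NestedCV library's own API): GridSearchCV plays the inner loop that tunes the parameters, and cross_val_score plays the outer loop that scores the tuned model on held-out folds.

```python
# Nested CV with plain scikit-learn: inner loop = GridSearchCV,
# outer loop = cross_val_score. Synthetic regression data for illustration.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=120, n_features=10, random_state=0)

param_grid = {
    'max_depth': [3, None],
    'n_estimators': [10],
}

# Inner loop: 3-fold grid search over the parameter grid
inner = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)

# Outer loop: 5-fold CV around the whole grid search -> one unbiased
# score per outer fold
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```

Each outer fold runs its own full grid search on the outer-training portion only, so the outer score never sees data used for tuning.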
Community Discussions
Trending Discussions on Nested-Cross-Validation
QUESTION
I have run the example code in the link https://github.com/casperbh96/Nested-Cross-Validation/blob/master/Example%20Notebook%20-%20NestedCV.ipynb, but I got an error: `__init__() got an unexpected keyword argument 'outer_cv'`. I have checked the source code, and 'outer_cv' is included in `__init__()`. How can I solve this? The example code is also pasted below:
...ANSWER
Answered 2019-Dec-12 at 03:23
This should get you closer, though I can't get NCV.fit to finish in any reasonable time:
QUESTION
The following code combines cross_validate with GridSearchCV to perform a nested cross-validation for an SVC on the iris dataset. (Modified example of the following documentation page: https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py.)
...ANSWER
Answered 2019-Mar-07 at 22:26
If clf is used as the estimator for cross_validate, does it split the above-mentioned training set into a subtraining set and a validation set in order to determine the best hyperparameter combination?
Yes, as you can see here at Line 230, the training set is again split into a subtraining and validation set (specifically at line 240).
Update: Yes, when you pass the GridSearchCV classifier into cross_validate, it will again split the training set into a test and train set. Here is a link describing this in more detail. Your diagram and assumption are correct.
Out of all models tested via GridSearchCV, does cross_validate train and validate only the model stored in the variable best_estimator_?
Yes, as you can see from the answers here and here, GridSearchCV returns the best_estimator_ in your case (since the refit parameter is True by default). However, this best estimator will have to be trained again.
Does cross_validate train a model at all (if so, why?), or is the model stored in best_estimator_ validated directly via the test set?
As per your third and final question: yes, it trains an estimator and returns it if return_estimator is set to True. See this line. This makes sense, since how else would it return the scores without training an estimator in the first place?
Update
The reason the model is trained again is that the default use case for cross_validate does not assume that you pass in the best classifier with the optimum parameters. In this case specifically, you are passing in a classifier from GridSearchCV, but if you pass in any untrained classifier, it is supposed to be trained. What I mean is: yes, in your case it shouldn't train it again, since you are already doing cross-validation with GridSearchCV and using the best estimator. However, there is no way for cross_validate to know this, so it assumes you are passing in an un-optimized, or rather untrained, estimator; thus it has to train it again and return the scores for the same.
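The cloning behaviour described above can be checked directly (a small sketch on the iris dataset): cross_validate refits a fresh clone of whatever estimator it receives on each outer fold, and the object you pass in is never fitted itself.

```python
# cross_validate clones the estimator for every outer fold; the original
# GridSearchCV object passed in stays unfitted.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3)
result = cross_validate(clf, X, y, cv=4, return_estimator=True)

# One fitted clone per outer fold; none of them is the object we passed in
print(result['test_score'])
print(all(est is not clf for est in result['estimator']))
```

After the call, clf itself still has no best_estimator_ attribute: only the per-fold clones were trained.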
QUESTION
From what I've found, there is one other question like this (Speed-up nested cross-validation); however, installing MPI does not work for me after trying several fixes also suggested on this site and by Microsoft, so I am hoping there is another package or an answer to this question.
I am looking to compare multiple algorithms and grid-search a wide range of parameters (maybe too many parameters?). What ways are there, besides mpi4py, to speed up running my code? As I understand it, I cannot use n_jobs=-1, as that is then not nested?
Also to note, I have not been able to run this on the many parameters I am trying to look at below (it runs longer than I have time for). I only get results after 2 hours if I give each model just 2 parameters to compare. I run this code on a dataset of 252 rows and 25 feature columns, predicting one of 4 categories ('certain', 'likely', 'possible', or 'unknown') for whether a gene (of 252 genes) affects a disease. Using SMOTE increases the sample size to 420, which is then what goes into use.
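On the n_jobs concern raised above: n_jobs only controls how many fits run in parallel; it does not change which data splits are made, so setting it does not "un-nest" the cross-validation. A minimal sketch (on iris, with illustrative parameter values) parallelizes the outer folds while keeping the inner grid search sequential to avoid oversubscribing cores:

```python
# n_jobs parallelizes fold fitting only; the nested split structure is
# unchanged. Outer folds run in parallel, inner grid search stays sequential.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

inner = GridSearchCV(SVC(), {'C': [1, 10], 'gamma': ['scale', 0.1]},
                     cv=3, n_jobs=1)           # inner loop: sequential
scores = cross_val_score(inner, X, y, cv=5, n_jobs=2)  # outer folds: parallel
print(scores)
```

Putting n_jobs=-1 on the outer call instead would use all cores for the outer folds; parallelizing both loops at once tends to oversubscribe and is usually slower.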
...ANSWER
Answered 2019-Apr-30 at 15:07
IIUC, you are trying to parallelize this example from the sklearn docs. If this is the case, then here is one possible approach to address
"why dask is not working"
and
"Any kind of constructive guidance or further knowledge on this problem"
General imports
QUESTION
I'm trying to find the best Neural Network model for the classification of breast cancer samples on the well-known Wisconsin Cancer dataset (569 samples, 31 features + target). I'm using sklearn 0.18.1. I'm not using normalization so far; I'll add it once I solve this question.
...ANSWER
Answered 2017-Jun-12 at 09:12
The grid.best_score_ is the average over all CV folds for a single combination of the parameters you specify in tuned_params.
In order to access other relevant details about the grid-searching process, you can look at the grid.cv_results_ attribute.
From the documentation of GridSearchCV:
cv_results_ : dict of numpy (masked) ndarrays
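A short sketch of inspecting that dict (parameter values here are illustrative): cv_results_ holds one entry per parameter combination, with per-fold and averaged test scores, and best_score_ is simply the best of those averages.

```python
# Inspect cv_results_ after a grid search: one row per parameter
# combination, with the mean test score across CV folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
grid.fit(X, y)

for params, mean in zip(grid.cv_results_['params'],
                        grid.cv_results_['mean_test_score']):
    print(params, round(mean, 3))

# best_score_ is just the largest of those per-combination means
print(grid.best_score_)
```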
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Nested-Cross-Validation
You can use Nested-Cross-Validation like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.