SuperLearner | Current version of the SuperLearner R package | Machine Learning library
kandi X-RAY | SuperLearner Summary
This is the current version of the SuperLearner R package (version 2.*).
Community Discussions
QUESTION
In the documentation for glm, there is a method option whose description says:

User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as glm.fit. If specified as a character string it is looked up from within the stats namespace.

I would like to use a binomial non-negative least squares optimizer so that the coefficients are non-negative and sum to 1. An example of this optimizer being used is in the SuperLearner package with the option method = "method.NNLS". Below is a reproducible example:
ANSWER
Answered 2021-Aug-15 at 18:54

A way (not the only way, possibly not the most efficient or most compact) to solve this problem is to do general maximum-likelihood estimation on the parameters in a transformed space; probably the most common such transformation is the additive log ratio transformation (e.g. as in compositions::alr()). This solution uses the bbmle package, which is a wrapper around optim(); in this case it doesn't actually offer that much advantage over using optim() directly.
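A minimal sketch of this approach, using optim() directly and the logistic (inverse additive-log-ratio) transform for the two-weight case. Here y is assumed to be a 0/1 response and p1, p2 are two hypothetical candidate predicted probabilities; all names are illustrative, not from the original post:

```r
# negative binomial log-likelihood of a convex combination of two
# candidate probability vectors, parameterized on the real line
nll <- function(theta, y, p1, p2) {
  w1 <- plogis(theta)               # inverse alr: w1 in (0, 1)
  p  <- w1 * p1 + (1 - w1) * p2     # weights are non-negative, sum to 1
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# one-dimensional optimization; Brent requires finite bounds
opt <- optim(par = 0, fn = nll, y = y, p1 = p1, p2 = p2,
             method = "Brent", lower = -10, upper = 10)

# recover the weights on the simplex
w <- c(plogis(opt$par), 1 - plogis(opt$par))
```

With more than two candidate learners, the same idea applies with a multivariate logit/alr transform and a derivative-free method such as Nelder-Mead.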
QUESTION
I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a SuperLearner. My measure of interest is classif.auc.

The mlr3 help file tells me (?mlr_learners_avg):

Predictions are averaged using weights (in order of appearance in the data) which are optimized using nonlinear optimization from the package "nloptr" for a measure provided in measure (defaults to classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg). Learned weights can be obtained from $model. Using non-linear optimization is implemented in the SuperLearner R package. For a more detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:

When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce", is that true? I think I could set it to classif.auc with param_set$values <- list(measure = "classif.auc", optimizer = "nloptr", log_level = "warn").

The help file refers to the SuperLearner package and LeDell (2015). If I understand it correctly, the "AUC-Maximizing Ensembles through Metalearning" solution proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
ANSWER
Answered 2021-Apr-20 at 10:07

As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015): it optimizes the weights for any metric using non-linear optimization. LeDell (2015) focuses on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The optimization routine used by default differs between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, while SuperLearner "... uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
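Concretely, switching the measure to AUC when constructing the averaging learner could look like the following sketch (assuming mlr3 and mlr3pipelines are installed; this restates the parameter settings from the question rather than adding new ones):

```r
library(mlr3)
library(mlr3pipelines)

# build the averaging meta-learner and tell it to optimize AUC
# instead of the default classification error
learner_avg <- LearnerClassifAvg$new()
learner_avg$param_set$values <- list(measure   = "classif.auc",
                                     optimizer = "nloptr",
                                     log_level = "warn")
```

The learned weights are then available from learner_avg$model after training.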
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer similar to here that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree, that the description is a little unclear and I will try to improve this!
QUESTION
I am trying to scale my data within the cross-validation folds of an mlens SuperLearner pipeline. When I use StandardScaler in the pipeline (as demonstrated below), I receive the following warning:

/miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets") (name, inst_name, exc), MetricWarning)

Of note, when I omit the StandardScaler() the warning disappears, but the data is not scaled.
ANSWER
Answered 2021-Apr-06 at 21:50

You are currently passing your preprocessing steps as two separate arguments when calling the add method. You can instead combine them as follows:
QUESTION
My overall goal is to determine variable importance from a SuperLearner fit on the Boston dataset. However, when I attempt to determine the variable importance using the vip package in R, I receive the error below. My suspicion is that the prediction wrapper containing the SuperLearner object is the cause of the error, but I am by no means sure.

ANSWER
Answered 2020-Sep-04 at 19:51

For the SuperLearner object, you can see that it returns a list of probabilities.
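Building on that observation, a prediction wrapper for vip has to return a plain numeric vector rather than the list that predict.SuperLearner() produces. A hypothetical sketch (the vi() call and its column names are illustrative, not from the original post):

```r
# extract the ensemble predictions ($pred) from the list that
# predict.SuperLearner() returns, and flatten to a numeric vector
pred_wrapper <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata)$pred)
}

# usage sketch with vip's permutation importance:
# vip::vi(sl_fit, method = "permute", train = X, target = "medv",
#         metric = "rmse", pred_wrapper = pred_wrapper)
```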
QUESTION
library(SuperLearner)
library(MASS)

set.seed(23432)

## training set
n <- 500
p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep = "")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)

sl_cv <- SuperLearner(Y = Y, X = X, family = gaussian(),
                      SL.library = c("SL.mean", "SL.ranger"),
                      verbose = TRUE, cvControl = list(V = 5))
ANSWER
Answered 2020-Jul-01 at 11:15

There are some control parameters for the cross-validation procedure. You could use the validRows parameter. You will need a list with 5 elements, each element containing a vector of all rows that correspond to one of the clusters you have predefined. Assuming you added a column that shows which cluster an observation belongs to, you could write something like:
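A sketch of what that could look like, assuming a hypothetical cluster column in X with values 1 to 5 (the column name is illustrative):

```r
# one list element per fold, holding the row indices of each
# predefined cluster
valid_rows <- lapply(1:5, function(k) which(X$cluster == k))

# drop the cluster id from the predictors and pass the folds
# to SuperLearner via cvControl
sl_cv <- SuperLearner(Y = Y, X = X[, setdiff(names(X), "cluster")],
                      family = gaussian(),
                      SL.library = c("SL.mean", "SL.ranger"),
                      cvControl = list(V = 5, validRows = valid_rows))
```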
Community Discussions, Code Snippets contain sources that include Stack Exchange Network