mlr | Machine Learning in R | Machine Learning library
kandi X-RAY | mlr Summary
R does not define a standardized interface for its machine-learning algorithms. Therefore, for any non-trivial experiments, you need to write lengthy, tedious and error-prone wrappers to call the different algorithms and unify their respective outputs. Additionally, you need to implement infrastructure to resample your models, optimize hyperparameters, select features, cope with pre- and post-processing of data and compare models in a statistically meaningful way. As this becomes computationally expensive, you might want to parallelize your experiments as well. This often forces users to make crummy trade-offs in their experiments due to time constraints or lacking expert programming skills. {mlr} provides this infrastructure so that you can focus on your experiments!

The framework provides supervised methods like classification, regression and survival analysis along with their corresponding evaluation and optimization methods, as well as unsupervised methods like clustering. It is written in a way that lets you extend it yourself or deviate from the implemented convenience methods and construct your own complex experiments or algorithms. Furthermore, the package connects nicely to the OpenML R package and its online platform, which aims at supporting collaborative machine learning online and makes it easy to share datasets as well as machine learning tasks, algorithms and experiments in order to support reproducible research.
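As a taste of that unified interface, here is a minimal sketch: define a task, pick a learner, resample with cross-validation, and read off a performance measure. The rpart learner and iris data are illustrative choices, not part of the package description.

```r
library(mlr)

# Define a task, choose a learner, and evaluate it with 3-fold CV;
# the same pattern works for any of mlr's integrated learners.
task  <- makeClassifTask(data = iris, target = "Species")
lrn   <- makeLearner("classif.rpart")
rdesc <- makeResampleDesc("CV", iters = 3)
res   <- resample(lrn, task, rdesc, measures = acc)
res$aggr  # aggregated accuracy over the folds
```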
Trending Discussions on mlr
QUESTION
I am trying to tune the hyperparameters in mlr using the tuneParams function. However, I can't make sense of the results it is giving me (or else I'm using it incorrectly).

For example, if I create some data with a binary response, build an mlr h2o classification model and then check the accuracy and AUC, I get some values. If I then use tuneParams on some parameters, find a better accuracy and AUC, and plug those values into my model, the resulting accuracy and AUC (for the model) do not match those found by tuneParams.

Hopefully the code below will illustrate my issue:
...ANSWER
Answered 2021-May-27 at 15:33

You're getting different results because you're evaluating the learner using different train and test data. If I use the same 3-fold CV, I get the same results:
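A hedged sketch of the fix: create one resampling instance and pass the same instance to both tuneParams() and the final resample(), so both evaluations use identical folds. The built-in sonar.task and an rpart learner stand in for the asker's data and h2o model.

```r
library(mlr)

# Probability predictions are needed for AUC.
lrn <- makeLearner("classif.rpart", predict.type = "prob")

# Fix the folds once and reuse them, so tuning and the final
# evaluation see identical train/test splits.
rin <- makeResampleInstance(makeResampleDesc("CV", iters = 3), sonar.task)

ps  <- makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
res <- tuneParams(lrn, sonar.task, resampling = rin, par.set = ps,
                  control = makeTuneControlGrid(),
                  measures = list(acc, auc))

# Plugging the tuned values back in and resampling on the same
# instance reproduces the accuracy/AUC reported by tuneParams().
lrn2 <- setHyperPars(lrn, par.vals = res$x)
resample(lrn2, sonar.task, rin, measures = list(acc, auc))
```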
QUESTION
I've been stuck on optimizing this query. I'm hoping I haven't over-simplified my example.
I have the following tables:
Athletes:
id, name, team_id
Measurements:
id, type, value, athlete_id, measured_at
I need to find the latest, oldest and penultimate measurements and so I wrote this query:
...ANSWER
Answered 2021-May-11 at 16:13

I would suggest conditional aggregation:
QUESTION
I'm a beginner in PowerShell and have some scripts working, but now I'm stuck on this little problem. I have a folder with some files, $ORIGEN. I need to get all the file names that match the values in the variable $ARCHIVOS and put them in a new variable, $DATA. Can anyone tell me how I can match the names in $ARCHIVOS against the files in the folder? If I use only one value in the variable $ARCHIVOS, it works fine, but when I have an array it doesn't match anything. I've tried many solutions, with no luck. Thanks in advance for the help, and sorry for my English.
...ANSWER
Answered 2021-May-03 at 17:50

Combine Get-ChildItem (or, since you're not recursing, just Get-Item) with * and -Include, which (unlike -Filter) accepts an array of wildcard patterns:
QUESTION
I am using the mlr3proba package for machine-learning survival analysis. My dataset contains factor, numeric and integer features. I used the 'scale' and 'encode' pipeops to preprocess the dataset for the deephit and deepsurv neural network methods, as in the following code:
ANSWER
Answered 2021-Apr-26 at 07:15

Hi, thanks for using mlr3proba! The reason is that the parameter names change when the learner is wrapped in the pipeline; you can see this in the example below. There are a few options to solve this: you could change the parameter ids to match the new names after wrapping in PipeOps (Option 1 below), you could specify the tuning ranges for the learner first and then wrap it in the PipeOp (Option 2 below), or you could use an AutoTuner and wrap that in the PipeOps. I use the final option in this tutorial.
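A small sketch of the renaming the answer describes, with classif.rpart standing in for the asker's surv.deephit; the prefixing mechanics are the same for any learner wrapped in a pipeline.

```r
library(mlr3)
library(mlr3pipelines)
library(paradox)

learner <- lrn("classif.rpart")
head(learner$param_set$ids())   # e.g. "cp", "minsplit", ... (unprefixed)

# Wrapping in scale/encode PipeOps prefixes every parameter id.
glrn <- as_learner(po("scale") %>>% po("encode") %>>% learner)
head(glrn$param_set$ids())      # now e.g. "classif.rpart.cp"

# Option 2 from the answer: declare the tuning range on the learner
# *before* wrapping, so the to_tune() token survives the renaming.
learner2 <- lrn("classif.rpart", cp = to_tune(0.001, 0.1))
glrn2 <- as_learner(po("scale") %>>% po("encode") %>>% learner2)
```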
QUESTION
I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a superlearner. My measure of interest is classif.auc. The mlr3 help file tells me (?mlr_learners_avg):
Predictions are averaged using weights (in order of appearance in the data) which are optimized using nonlinear optimization from the package "nloptr" for a measure provided in measure (defaults to classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg). Learned weights can be obtained from $model. Using non-linear optimization is implemented in the SuperLearner R package. For a more detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
1. When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce"; is that true? I think I could set it to classif.auc with param_set$values <- list(measure="classif.auc", optimizer="nloptr", log_level="warn").

2. The help file refers to the SuperLearner package and LeDell (2015). If I understand correctly, the "AUC-Maximizing Ensembles through Metalearning" approach proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
ANSWER
Answered 2021-Apr-20 at 10:07

As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015), namely: it optimizes the weights for any metric using non-linear optimization, with LeDell (2015) focusing on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine deviates between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, while SuperLearner "... uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer, similar to the one here, that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree that the description is a little unclear and I will try to improve it!
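Putting the confirmed part of the answer into code, a minimal sketch of setting the ensemble-weight objective to AUC. The parameter names and string values mirror the asker's own snippet; that the measure is accepted as the string "classif.auc" is an assumption taken from the question rather than verified against the current mlr3pipelines API.

```r
library(mlr3)
library(mlr3pipelines)

# Construct the averaging meta-learner and switch its objective from
# the default classification error to AUC (names as in the question).
avg <- LearnerClassifAvg$new()
avg$param_set$values <- list(measure   = "classif.auc",
                             optimizer = "nloptr",
                             log_level = "warn")
```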
QUESTION
For survival analysis, I am using the mlr3proba package of R. My dataset consists of 39 features (both continuous and factor, all of which I converted to integer and numeric) and the target (time & status). I want to tune the hyperparameter num_nodes in param_set. This is a ParamUty-class parameter with default value 32,32, so I decided to transform it. I wrote the following code for hyperparameter optimization of the surv.deephit learner using nested cross-validation (with 10 inner and 3 outer folds):
ANSWER
Answered 2021-Apr-17 at 08:46

Hi, thanks for using mlr3proba! I have actually just finished writing a tutorial that answers exactly this question. It covers training, tuning, and evaluating the neural networks in mlr3proba. For your specific question, the relevant part of the tutorial is this:
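A hedged sketch of one common way to tune a ParamUty such as num_nodes: tune two plain integer stand-ins and assemble the vector in a transformation. The names `nodes` and `k` are illustrative, not taken from the tutorial.

```r
library(paradox)

# Tune layer width and depth separately, then build the num_nodes
# vector (e.g. nodes = 32, k = 2 -> c(32, 32)) in the trafo.
search_space <- ps(
  nodes = p_int(4, 64),   # width of each layer
  k     = p_int(1, 4),    # number of layers
  .extra_trafo = function(x, param_set) {
    x$num_nodes <- rep(x$nodes, x$k)
    x$nodes <- x$k <- NULL
    x
  }
)
```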
QUESTION
I'm trying to perform a multiple linear regression with TensorFlow and compare the results with the statsmodels library. I generated two random variables, X1 and X2 (so that anyone can reproduce it), that will explain the Y variable. The X2 variable is completely useless for this regression; it's just noise with a big scale, so that its coefficient will come out not significant (p-value close to 1). At the end I should obtain a model that is basically: y_data = alpha + (0.25)x1 + (0.00)x2 + error. I tried to adapt this code to my randomly generated data, but unfortunately it is not working at all. Below is my attempt:
...ANSWER
Answered 2021-Apr-14 at 21:44

The key issues with your code are the following:

- While it is necessary to add a column of ones to the features matrix x_data before running the regression with statsmodels, this is not necessary when running the regression with tensorflow. This means that you are passing 3 features to tensorflow instead of 2, where the additional feature (the first column of x_data) is constant.
- You are normalizing x_data after the first column of ones has already been added with x_data = sm.add_constant(x_data). As a column of ones has zero variance, after normalization you get a column of nan (as you are dividing by zero). This means that the first of the 3 features that you are passing to tensorflow is completely missing (i.e. it's always nan).
- While statsmodels takes as inputs first y and then X, tensorflow takes as inputs first X and then y. This means that you have switched the features and the target when running the regression in tensorflow.
I included a complete example below.
QUESTION
I am new to jq. I have a JSON file (call it all.2):
...ANSWER
Answered 2021-Apr-13 at 15:05

Try
QUESTION
In mlr there was a function to draw calibration plots:
ANSWER
Answered 2021-Apr-08 at 21:23

According to jakob-r's comment, this function is not yet available in mlr3, but it might be added in the future.
QUESTION
I am trying to understand more about the AUC filter of mlr3filters. I use a classification task (task) and the following code:
ANSWER
Answered 2021-Mar-23 at 18:54

While the documentation of the AUC filter states that it is analogous to mlr3measures::auc, it does not actually use that function. Instead, it uses its own implementation, which calculates the area under the receiver operating characteristic (ROC) curve and does not require a probability value, only a continuous value that can be used to divide samples with a cutoff.
The Filters, as I understand them, are primarily used to calculate filter scores (see the documentation of the Filter base class). They are mostly useful in combination with the PipeOpFilter from the mlr3pipelines package, which is also what the example from the book that you cite is doing. PipeOps are generally useful for integrating a filtering step (or any kind of preprocessing) into a learning process (the book chapter on mlr3pipelines is probably a good place to learn about that). However, they can also be used to perform individual steps, e.g. to filter columns out of a Task:
Community Discussions and Code Snippets contain sources from the Stack Exchange Network.