mlr3pipelines | Dataflow Programming for Machine Learning in R | Machine Learning library
kandi X-RAY | mlr3pipelines Summary
Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing.
Community Discussions
Trending Discussions on mlr3pipelines
QUESTION
I am using the mlr3 package for autotuning ML models (an mlr3pipelines graph, to be more correct).
It is very hard to reproduce the problem because the error occurs occasionally. The same code sometimes returns an error and sometimes doesn't.
Here is the code snippet
...ANSWER
Answered 2022-Mar-14 at 22:09
There is a fixed buffer for loading RNG functions in uuid, which will fail if too many DLLs are already loaded. A simple work-around is to run
QUESTION
I am a beginner with mlr3 and am facing problems while running an AutoFSelector learner associated with glmnet on a classification task containing >2000 numeric variables. I can reproduce this error using the simpler mlr3 predefined task Sonar. For note, I am using R version 4.1.2 (2021-11-01) on macOS Monterey 12.1. All required packages have been installed from CRAN.
...ANSWER
Answered 2022-Jan-24 at 18:05
This is a problem specific to glmnet. glmnet requires at least two features to fit a model, but in at least one configuration (the first ones in a sequential forward search) you only have one feature.
There are two possibilities to solve this:
- Open an issue in mlr3fselect and request a new argument min_features (there already is max_features) to be able to start the search with 2 or more features.
- Augment the base learner with a fallback which gets fitted if the base learner fails. Here is a fallback to a simple logistic regression:
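A minimal sketch of such a fallback, assuming the mlr3 encapsulation API current at the time of the answer and the classif.glmnet / classif.log_reg learners from mlr3learners:

```r
library(mlr3)
library(mlr3learners)

learner <- lrn("classif.glmnet")
# Encapsulate train/predict so errors are caught instead of aborting the run
learner$encapsulate <- c(train = "evaluate", predict = "evaluate")
# If glmnet fails (e.g. with a single feature), fit a logistic regression instead
learner$fallback <- lrn("classif.log_reg")
```

The fallback is only trained and used for those resampling iterations or feature-selection configurations where the base learner errors out.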
QUESTION
I have the following code. I get an error when saving the trained model. The error only occurs when I use lightgbm.
ANSWER
Answered 2021-Nov-01 at 08:17
QUESTION
My code is as follows
...ANSWER
Answered 2021-Jun-08 at 09:22
To be able to fiddle with the models after resampling, it's best to call resample() with store_models = TRUE.
Using your example:
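The poster's example is not shown here, so the snippet below is an illustrative sketch using a built-in task and learner:

```r
library(mlr3)

task    <- tsk("sonar")
learner <- lrn("classif.rpart")

# store_models = TRUE keeps the fitted model of every resampling iteration
rr <- resample(task, learner, rsmp("cv", folds = 3), store_models = TRUE)

# Inspect the model fitted in the first fold:
rr$learners[[1]]$model
```

Without store_models = TRUE, the `$model` slots are discarded after scoring to save memory.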
QUESTION
I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a superlearner. My measure of interest is classif.auc.
The mlr3 help file tells me (?mlr_learners_avg):
Predictions are averaged using weights (in order of appearance in the data) which are optimized using nonlinear optimization from the package "nloptr" for a measure provided in measure (defaults to classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg). Learned weights can be obtained from $model. Using non-linear optimization is implemented in the SuperLearner R package. For a more detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
- When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce", is that true? I think I could set it to classif.auc with param_set$values <- list(measure = "classif.auc", optimizer = "nloptr", log_level = "warn").
- The help file refers to the SuperLearner package and LeDell (2015). If I understand correctly, the proposed "AUC-Maximizing Ensembles through Metalearning" solution from that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
ANSWER
Answered 2021-Apr-20 at 10:07
As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015), namely, it optimizes the weights for any metric using non-linear optimization. LeDell (2015) focuses on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine deviates between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, while SuperLearner ... uses the Nelder-Mead method via the optim function to minimize rank loss (from the documentation).
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer similar to here that simply wraps stats::optim with method "Nelder-Mead" and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
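Putting the parameter settings quoted in the question together, a minimal sketch (assuming LearnerClassifAvg is exported by mlr3pipelines, as in the question):

```r
library(mlr3)
library(mlr3pipelines)

# Meta-learner that averages base-learner predictions with learned weights
lrn_avg <- LearnerClassifAvg$new()

# Optimize the ensemble weights with respect to AUC via nloptr (NLOPT_LN_COBYLA)
lrn_avg$param_set$values <- list(
  measure   = "classif.auc",
  optimizer = "nloptr",
  log_level = "warn"
)
```

After training, the learned weights can be inspected via `lrn_avg$model`, as the help file notes.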
Thanks for spotting the error in the documentation. I agree that the description is a little unclear and I will try to improve it!
QUESTION
I am trying to solve a common problem in medicine: the combination of a prediction model with other sources, e.g., an expert opinion [sometimes heavily emphasized in medicine], called the superdoc predictor in this post.
This could be solved by stacking a model with a logistic regression (that enters the expert opinion), as described on page 26 of this paper:
Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H. From Handcrafted to Deep-Learning-Based Cancer Radiomics: Challenges and Opportunities. IEEE Signal Process Mag 2019; 36: 132–60. Available here
I've tried this here without considering overfitting (I did not apply out-of-fold predictions of the lower learner):
Example data
...ANSWER
Answered 2021-Mar-17 at 00:10
I think mlr3 / mlr3pipelines is well suited for your task. It appears that what you are missing is mainly PipeOpSelect / po("select"), which lets you extract features based on their name or other properties and makes use of Selector objects. Your code should probably look something like
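A sketch of the two PipeOpSelect pieces involved, assuming the expert-opinion column is named "superdoc" as in the post:

```r
library(mlr3pipelines)

# Keep only the expert-opinion column for the stacking branch:
po_keep <- po("select", selector = selector_name("superdoc"))

# Conversely, drop it from the branch that trains the base model,
# so the lower learner never sees the expert opinion directly:
po_drop <- po("select", selector = selector_invert(selector_name("superdoc")))
```

These PipeOps can then be placed in front of the respective learners and the two branches united before the final logistic regression.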
QUESTION
I have a mlr3 task
...ANSWER
Answered 2020-Nov-05 at 19:38
This is quite straightforward to do using PipeOpColApply.
We need to define a function that will take the provided input and perform the requested operation (the applicator).
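The original task and applicator are not shown, so the following is an illustrative sketch using a built-in task and a log transform as the applicator:

```r
library(mlr3)
library(mlr3pipelines)

# The applicator is applied column-wise to the columns matched by affect_columns
po_ca <- po("colapply",
            applicator     = function(x) log1p(x),
            affect_columns = selector_type("numeric"))

# Train the PipeOp on a task to obtain the transformed task
task        <- tsk("mtcars")
transformed <- po_ca$train(list(task))[[1]]
transformed$head()
```

The same PipeOp then applies the identical transformation to new data during prediction.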
QUESTION
I want to train models on different subsets of data using mlr3, and I was wondering whether there is a way to do this within a pipeline.
What I want to do is similar to the example from R for Data Science, Chapter 25: Many models. Say we use the same data set, gapminder, which contains different variables for countries around the world, such as GDP and life expectancy. If I wanted to train models of life expectancy for each country, is there an easy way to create such a pipeline using mlr3?
Ideally, I want to use mlr3pipelines to create a branch in the graph for each subset (e.g. a separate branch for each country) with a model at the end. The final graph would then start at a single node and have n trained learners at the end nodes, one for each group (i.e. country) in the data set, or a final node that aggregates the results. I would also expect it to work for new data; for example, if we obtain new data for 2020 in the future, I would want it to be able to create predictions for each country using the model trained for that specific country.
All the mlr3 examples I have found seem to deal with models for the entire data set, or have models trained with all the groups in the training set.
Currently, I am just manually creating a separate task for each group of data, but it would be nice to have the data-subsetting step incorporated into the modelling pipeline.
...ANSWER
Answered 2020-Oct-01 at 02:53
It would help if you had functions from these two packages: dplyr and tidyr. The following code shows you how to train multiple models by country:
QUESTION
# Packages
library(dplyr)
library(recipes)
# toy dataset, with A being multicolored
df <- tibble(name = c("A", "A", "A", "B", "C"), color = c("green", "yellow", "purple", "green", "blue"))
#> # A tibble: 5 x 2
#> name color
#>   <chr> <chr>
#> 1 A green
#> 2 A yellow
#> 3 A purple
#> 4 B green
#> 5 C blue
...ANSWER
Answered 2020-Aug-17 at 08:52
I wrote the following custom step for the recipes package.
QUESTION
R version used: 3.6.3; mlr3 version: 0.4.0-9000; mlr3proba version: 0.1.6.9000; mlr3pipelines version: 0.1.2; xgboost version: 0.90.0.2 (as stated on the RStudio package manager).
I have deployed the following graph pipeline:
...ANSWER
Answered 2020-Jul-29 at 08:51
The problem lies in distr6 here; please install the latest versions of distr6 (1.4.2) and mlr3proba (0.2.0) from CRAN and then try again.
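The suggested fix amounts to updating both packages from CRAN:

```r
# Update to distr6 >= 1.4.2 and mlr3proba >= 0.2.0, then restart R
install.packages(c("distr6", "mlr3proba"))
```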
Community Discussions, Code Snippets contain sources that include Stack Exchange Network