estimators | Machine Learning Versioning made Simple | Machine Learning library
kandi X-RAY | estimators Summary
Machine Learning Versioning made Simple
Top functions reviewed by kandi - BETA
- Save a NumPy array to disk
- Save an object
- Set the data to be displayed
- Get the shape of a DataFrame
- Set the object
- Load the object
- Get an object property
- Compute the hash of an object
- Set the X-test data
- Return the proxy object
- Return the estimator object
- Set the proxy for the prediction
- Set the y-test data
- Set the estimator's estimator
estimators Key Features
estimators Examples and Code Snippets
Community Discussions
Trending Discussions on estimators
QUESTION
Starting from a Kaggle tutorial, I try to run the code below with the data (available to download from here):
Code:
...ANSWER
Answered 2022-Mar-17 at 10:58
The reason behind this is that StandardScaler returns a numpy.ndarray of your feature values (the same shape as pandas.DataFrame.values, but standardized), and you need to convert it back to a pandas.DataFrame with the same column names.
Here's the part of your code that needs changing.
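The fix can be sketched as follows (the small frame below is a hypothetical stand-in for the tutorial's data, not the original code):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical feature frame standing in for the tutorial's dataset
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

scaler = StandardScaler()
scaled = scaler.fit_transform(df)  # returns a plain numpy.ndarray

# Wrap the array back into a DataFrame, reusing the original column names
df_scaled = pd.DataFrame(scaled, columns=df.columns, index=df.index)
```

Reusing `df.columns` and `df.index` keeps the result drop-in compatible with any later code that selects columns by name.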
QUESTION
The documentation on how to use SageMaker estimators is scattered and sometimes obsolete or incorrect. Is there a one-stop location that gives a comprehensive view of how to use the SageMaker SDK Estimator to train and save models?
...ANSWER
Answered 2022-Mar-12 at 19:39
There is no single resource from AWS that provides a comprehensive view of how to use the SageMaker SDK Estimator to train and save models.
Alternative: Overview Diagram
I put together a diagram and a brief explanation to give an overview of how a SageMaker Estimator runs training. SageMaker sets up a Docker container for the training job in which:
- Environment variables are set as described in SageMaker Docker Container Environment Variables.
- Training data is set up under /opt/ml/input/data.
- Training script code is set up under /opt/ml/code.
- The /opt/ml/model and /opt/ml/output directories are set up to store training outputs.
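A minimal training script can locate those directories through the environment variables SageMaker injects into the container (a sketch; the variable names come from the SageMaker docs, and the fallbacks match the in-container layout above so the script also runs outside SageMaker):

```python
import os

# SageMaker exposes the container layout through environment variables;
# the fallback values below are the documented in-container paths.
train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
output_dir = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")

print("training data:", train_dir)
print("model artifacts:", model_dir)
print("other outputs:", output_dir)
```

Anything the script writes under `model_dir` is packaged by SageMaker into the model.tar.gz artifact at the end of the job.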
QUESTION
In the docs it is said that the metaclassifier is trained through cross_val_predict. From my perspective this means that the data is split into folds, and all base estimators predict values on one fold while being trained on all the other folds, and that procedure repeats for every fold. Then the metaclassifier is trained on the predictions the base estimators made on those folds. Is that correct? If so, doesn't it contradict the note that estimators_ "are fitted on the full X", given that the base estimators are trained on several folds, not the full X?
ANSWER
Answered 2022-Mar-02 at 15:08
There is no contradiction, because estimators_ is not used when training the metaclassifier. After the cross-val predictions are made, you don't actually have fitted base estimators (or rather, you have multiple copies of each, depending on your cv parameter). For predicting on new data, you need a single fitted copy of each base estimator; those are obtained by fitting on the full X, and are stored in the attribute estimators_.
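To see this concretely, here is a small sketch (toy data and arbitrarily chosen base estimators, not the question's setup) showing that a fitted StackingClassifier ends up with exactly one refitted copy of each base estimator in estimators_:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the question's dataset
X, y = make_classification(n_samples=200, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)

# One copy of each base estimator, refitted on the full X; these (not the
# per-fold copies used by cross_val_predict) are what predict() relies on.
print(len(stack.estimators_))
```

The per-fold copies used to build the metaclassifier's training data are discarded after fitting; only the full-X copies survive in estimators_.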
QUESTION
Here is a high-level picture of what I am trying to achieve: I want to train a LightGBM model with spark as a compute backend, all in SageMaker using their Training Job api. To clarify:
- I have to use LightGBM in general, there is no option here.
- The reason I need to use spark compute backend is because the training with the current dataset does not fit in memory anymore.
- I want to use SageMaker Training job setting so I could use SM Hyperparameter optimisation job to find the best hyperparameters for LightGBM. While LightGBM spark interface itself does offer some hyperparameter tuning capabilities, it does not offer Bayesian HP tuning.
Now, I know the general approach to running custom training in SM: build a container in a certain way, then pull it from ECR and kick off a training job/hyperparameter tuning job through the sagemaker.Estimator API. In this case SM handles resource provisioning for you, creates an instance, and so on. What I am confused about is that, to use the Spark compute backend, I would need an EMR cluster running, so the SDK would have to handle that as well. However, I do not see how this is possible with the API above.
There is also the SageMaker PySpark SDK. However, the SageMakerEstimator API from that package does not support on-the-fly cluster configuration either.
Does anyone know a way how to run a Sagemaker training job that would use an EMR cluster so that later the same job could be used for hyperparameter tuning activities?
One way I see is to run an EMR cluster in the background, and then just create a regular SM estimator job that would connect to the EMR cluster and do the training, essentially running a spark driver program in SM Estimator job.
Has anyone done anything similar in the past?
Thanks
...ANSWER
Answered 2022-Feb-25 at 12:57
Thanks for your questions. Here are answers:
SageMaker PySpark SDK https://sagemaker-pyspark.readthedocs.io/en/latest/ does the opposite of what you want: being able to call a non-spark (or spark) SageMaker job from a Spark environment. Not sure that's what you need here.
Running Spark in SageMaker jobs. While you can use SageMaker Notebooks to connect to a remote EMR cluster for interactive coding, you do not need EMR to run Spark in SageMaker jobs (Training and Processing). You have 2 options:
SageMaker Processing has a built-in Spark container, which is easy to use but unfortunately not connected to SageMaker Model Tuning (which works with Training only). If you use this, you will have to find and use a third-party, external parameter search library; for example Syne Tune from AWS itself (which supports Bayesian optimization).
SageMaker Training can run custom docker-based jobs, on one or multiple machines. If you can fit your Spark code within SageMaker Training spec, then you will be able to use SageMaker Model Tuning to tune your Spark code. However there is no framework container for Spark on SageMaker Training, so you would have to build your own, and I am not aware of any examples. Maybe you could get inspiration from the Processing container code here to build a custom Training container
Your idea of using the Training job as a client to launch an EMR cluster is good and should work (if SM has the right permissions), and will indeed allow you to use SM Model Tuning. I'd recommend:
- having each SM job create a new transient cluster (auto-terminating after the step) to keep costs low and to avoid tuning results being polluted by inter-job contention, which could arise if everything ran on the same cluster;
- using the cheapest possible instance type for the SM estimator, because it needs to stay up for the whole duration of your EMR experiment to collect and print your final metric (accuracy, duration, cost...).
In the same spirit, I once used SageMaker Training myself to launch Batch Transform jobs for the sole purpose of leveraging the Bayesian search API to find an inference configuration that minimizes cost.
QUESTION
I tried to create a stacking regressor to predict multiple outputs with SVR and a neural network as estimators, and linear regression as the final estimator.
...ANSWER
Answered 2022-Feb-25 at 00:19
Imo the point here is the following. On one side, NN models do support multi-output regression tasks on their own, which might be solved by defining an output layer similar to the one you built, namely with a number of nodes equal to the number of outputs (though, with respect to your construction, I would specify a linear activation with activation=None rather than a sigmoid activation).
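For illustration, the same point can be shown with scikit-learn's MLPRegressor, whose output layer uses a linear ('identity') activation and which accepts a multi-column y directly (a sketch on synthetic data, not the question's model):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Two regression targets -> two nodes in the output layer
y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])

mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)

print(mlp.predict(X[:3]).shape)  # one column per output
print(mlp.out_activation_)       # 'identity': a linear output activation
```

This mirrors the advice above: a multi-output regressor's final layer should have one node per target and no squashing activation.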
QUESTION
I am learning about multiclass classification using scikit-learn. My goal is to develop code that tries to include all the metrics needed to evaluate the classification. This is my code:
...ANSWER
Answered 2022-Feb-12 at 22:05
The point of refit is that the model will be refitted using the best parameter set found before and the entire dataset. To find the best parameters, cross-validation is used, which means that the dataset is always split into a training and a validation set; i.e., not the entire dataset is used for training here.
When you define multiple metrics, you have to tell scikit-learn how it should determine what best means for you. For convenience, you can just specify any of your scorers to be used as the decider so to say. In that case, the parameter set that maximizes this metric will be used for refitting.
If you want something more sophisticated, like taking the parameter set that returned the highest mean of all scorers, you have to pass a function to refit that given all the created metrics returns the index of the corresponding best parameter set. This parameter set will then be used to refit the model.
Those metrics will be passed as a dictionary with strings as keys and NumPy arrays as values. Those NumPy arrays have as many entries as parameter sets that have been evaluated. You will find a lot of things in there; probably the most relevant is mean_test_*scorer-name*. Those arrays contain, for each tested parameter set, the mean of that scorer computed across the CV splits.
In code, to get the index of the parameter set that returns the highest mean across all scorers, you can do the following:
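The original snippet was not preserved here, but a refit callable along those lines might look like this (a sketch; the function name is mine, and it assumes every scorer is greater-is-better):

```python
import numpy as np

def refit_best_mean(cv_results):
    # One row per scorer, one column per evaluated parameter set,
    # taken from the mean_test_<scorer-name> entries of cv_results_.
    means = np.array(
        [v for k, v in cv_results.items() if k.startswith("mean_test_")]
    )
    # Average across scorers, then pick the winning parameter set's index.
    return int(means.mean(axis=0).argmax())
```

Passing `refit=refit_best_mean` to GridSearchCV makes the search refit the final model on the parameter set with the highest mean score across all scorers.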
QUESTION
I tried to construct a pipeline that has some optional steps. However, I would like to optimize hyperparameters for those steps as I want to get the best option between not using them and using them with different configurations (in my case SelectFromModel - sfm).
...ANSWER
Answered 2022-Jan-26 at 16:03
Referring to this example, you could just make a list of dictionaries: one containing sfm and its related parameters, and the other one replacing the step with "passthrough".
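Such a list of dictionaries might look like the following (a sketch with toy data and arbitrary estimators; the first dict tunes sfm, the second disables the step entirely):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

pipe = Pipeline([
    ("sfm", SelectFromModel(RandomForestClassifier(n_estimators=25, random_state=0))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# One dict tunes the optional step, the other replaces it with "passthrough",
# so the search compares "with selection" against "without selection".
param_grid = [
    {"sfm__threshold": ["mean", "median"]},
    {"sfm": ["passthrough"]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The grid evaluates three candidates in total: two thresholds for sfm plus the passthrough variant.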
QUESTION
I'm writing TeX files on overleaf, and suddenly I got an error:
...ANSWER
Answered 2022-Jan-18 at 10:10
The problem is the {\iffalse}\fi in the abc bib entry. This syntax makes no sense; just remove it.
QUESTION
We're taking a fresh look at how to review possible outliers in large data sets. We've sorted out some code for IQR and fences, MAD (Median Absolute Deviation), and Double MAD. Those three sound reasonably good at coping with series that include a lot of variabilities, but they're sensitive to asymmetry in the series. Our values are commonly skewed.
Double MAD proves less susceptible, as it splits the distribution in two and performs the MAD scoring on each half. So, points on either side of the overall median do not distort results on the other side of the median. As I understand it, what I know comes from here:
https://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/
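The double-MAD idea described above can be sketched in a few lines (Python here only to illustrate the arithmetic; the cutoff of 3 is a conventional choice, not taken from the post):

```python
import numpy as np

def double_mad_scores(x):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # A separate MAD for each side of the median copes with skewed data:
    # points on one side never influence the scale used on the other.
    left_mad = np.median(np.abs(x[x <= med] - med))
    right_mad = np.median(np.abs(x[x >= med] - med))
    mad = np.where(x <= med, left_mad, right_mad)
    scores = np.zeros_like(x)
    nonzero = mad != 0
    scores[nonzero] = np.abs(x[nonzero] - med) / mad[nonzero]
    return scores

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
scores = double_mad_scores(data)
outliers = [v for v, s in zip(data, scores) if s > 3]  # conventional cutoff
print(outliers)  # -> [100]
```

The same arithmetic (medians, absolute deviations, a CASE on which side of the median a point falls) translates fairly directly into SQL aggregates.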
All of these estimators depend on quantiles, and it sounds like the Harrell-Davis quantile estimator improves the quality of these other methods:
https://aakinshin.net/posts/harrell-davis-double-mad-outlier-detector/
MAD, Double MAD, and Harrell-Davis seem to be widely used in the sciences, academia, and stats generally. You can get everything in R, but we're hoping to do some outlier checking directly in Postgres. (RDS deployment, no R.)
Does this ring a bell? Has anyone seen code like this for Postgres or any other SQL idiom?
And, not to give a misimpression, I'm not a stats person and have zero ability to translate greek formulas into SQL code. But, I can do okay translating between SQL idioms and following basic concepts.
...ANSWER
Answered 2022-Jan-15 at 05:14
Now I know why people do this sort of work in R: because R is fantastic for this kind of work. If anyone comes across this in the future, go get R. It's a compact, easy-to-use, easy-to-learn language with a great IDE.
If you've got a Postgres server where you can install PL/R, so much the better. PL/R is written to use the DBI and RPostgreSQL R packages to connect with Postgres. Meaning, you should be able to develop your code in RStudio, and then add the bits of wrapping required to make it run in PL/R within your Postgres server.
For outliers, I'm happy with univOutl (Univariate Outliers) so far, which provides 10 common, and less common, methods.
QUESTION
I want to build a quantile regressor based on XGBRegressor, the scikit-learn wrapper class for XGBoost. I have the following two versions: the second version is simply trimmed from the first one, but it no longer works.
I am wondering why I need to put every parameter of XGBRegressor in its child class's initialization. What if I just want to take all the default parameter values except for max_depth?
(My XGBoost is version 1.4.2.)
No.1 the full version that works as expected:
...ANSWER
Answered 2021-Dec-26 at 11:58
I am not an expert with scikit-learn, but it seems that one of the requirements of various objects used by this framework is that they can be cloned by calling the sklearn.base.clone method. This appears to be something that the existing XGBRegressor class does, so it is something your subclass of XGBRegressor must also do.
What may help is to pass any other unexpected keyword arguments as a **kwargs parameter. In your constructor, kwargs will contain a dict of all the other keyword parameters that weren't assigned to other constructor parameters. You can pass this dict of parameters on to the call to the superclass constructor by referring to them as **kwargs again; this will cause Python to expand them out:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install estimators
You can use estimators like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.