automl | Automatic machine learning toolkit, including hyper-parameter tuning | Machine Learning library

by Angel-ML | Scala | Version: 0.1.0 | License: Apache-2.0

kandi X-RAY | automl Summary

automl is a Scala library typically used in Artificial Intelligence, Machine Learning, and Spark applications. automl has no bugs and no vulnerabilities, carries a permissive license, and has low support. You can download it from GitHub.

Angel's automatic machine learning toolkit. Angel-AutoML provides automatic hyper-parameter tuning and feature engineering operators. It is developed in Scala. As a stand-alone library, Angel-AutoML can be easily integrated into Java and Scala projects. We welcome everyone interested in machine learning to contribute code, create issues, or open pull requests. Please refer to the Angel Contribution Guide for more details.

Support

automl has a low-activity ecosystem.
It has 51 star(s) with 21 fork(s). There are 4 watchers for this library.
It had no major release in the last 12 months.
There is 1 open issue and 0 have been closed. There is 1 open pull request and 0 closed requests.
It has a neutral sentiment in the developer community.
The latest version of automl is 0.1.0.

Quality

              automl has 0 bugs and 0 code smells.

Security

              automl has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              automl code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

              automl is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              automl releases are available to install and integrate.
              Installation instructions are not available. Examples and code snippets are available.
              It has 4493 lines of code, 354 functions and 92 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.


            automl Key Features

            No Key Features are available at this moment for automl.

            automl Examples and Code Snippets

Usage
Scala | Lines of Code: 8 | License: Permissive (Apache-2.0)
// Discrete parameter spaces: an explicit value set, or a {start:end:step} range.
val param1 = ParamSpace.fromConfigString("param1", "{1.0,2.0,3.0,4.0,5.0}")
val param2 = ParamSpace.fromConfigString("param2", "{1:10:1}")

// Continuous parameter spaces: a [lower,upper] interval.
val param1 = ParamSpace.fromConfigString("param1", "[1,10]")
val param2 = ParamSpace.fromConfigString("param2", "[1,10]")

            Community Discussions

            QUESTION

            difference between stack, ensemble & stack ensemble steps in MLJAR
            Asked 2022-Apr-01 at 10:44

            While experimenting with MLJar, I figured out in 'Compete' mode it uses the below 3 steps towards the end of the training:

            ...

            ANSWER

            Answered 2022-Apr-01 at 10:44

The description of each ensemble type in the MLJAR AutoML package is in the Algorithms section of the docs.

The docs for Ensemble are here. It is a simple average of previous models; models are selected as long as they improve the ensemble's performance.

The docs for the Stacked Algorithm are here. It is a model trained on the original data plus the stacked predictions of the previous models.

The Stacked Ensemble is an Ensemble built from models trained on the original data and models trained on the stacked data (original + stacked predictions).
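
To make the distinction concrete, here is a minimal numpy sketch (not MLJAR code; the arrays below are synthetic stand-ins for base-model predictions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # original features
preds = rng.uniform(size=(100, 3))   # predictions of 3 base models

# Ensemble: a simple average of the selected base models' predictions.
ensemble_pred = preds.mean(axis=1)

# Stacked: a new model is trained on the original data plus the base-model
# predictions; fit any metalearner on (X_stacked, y).
X_stacked = np.hstack([X, preds])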

            @mehul-gupta please let me know if it is clear now.

            Source https://stackoverflow.com/questions/71676068

            QUESTION

            Why is my GCP Vertex pipeline api_endpoint not right?
            Asked 2022-Mar-20 at 07:08

            My API_ENDPOINT is set to europe-west1-aiplatform.googleapis.com.

            I define a pipeline:

            ...

            ANSWER

            Answered 2022-Jan-04 at 14:06

            Set location = API_ENDPOINT in google.cloud.aiplatform.init.
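
A minimal sketch of that call; note that location takes a region such as "europe-west1", while the full hostname "europe-west1-aiplatform.googleapis.com" is the REST endpoint. The project and bucket names below are placeholders:

from google.cloud import aiplatform

# Placeholders: substitute your own project and staging bucket.
aiplatform.init(
    project="my-project",
    location="europe-west1",
    staging_bucket="gs://my-bucket",
)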

            Source https://stackoverflow.com/questions/70577610

            QUESTION

            Vertex AI Pipeline Failed Precondition
            Asked 2022-Mar-09 at 12:14

            I have been following this video: https://www.youtube.com/watch?v=1ykDWsnL2LE&t=310s

Code located at: https://codelabs.developers.google.com/vertex-pipelines-intro#5 (I have done the last two steps as per the video, which isn't an issue for google_cloud_pipeline_components version 0.1.1)

I have created a pipeline in Vertex AI, which ran, and I used the following code to create it (from the video, not the code extract in the link above):

            ...

            ANSWER

            Answered 2022-Mar-04 at 09:45

As @scottlucas confirmed, this question was solved by upgrading to the latest version of google-cloud-aiplatform, which can be done with pip install --upgrade google-cloud-aiplatform.

Upgrading to the latest library ensures that the official documentation you use as a reference is aligned with the actual product.

            Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

            Feel free to edit this answer for additional information.

            Source https://stackoverflow.com/questions/71245000

            QUESTION

Send PDF instead of TextSnippet in Google AutoML entity extraction
            Asked 2022-Feb-14 at 07:06

I have created a custom processor using the Google AutoML entity extractor and trained it on a few PDFs. The PDFs actually contain photo identity cards. I was able to test it in their UI, and it extracted the entities properly. Now I'm using their Java client library to do it in code, as given below. Here is the sample:

            https://github.com/googleapis/java-automl/blob/b4c760c01efbd2174d93af85c5fbab3c09eee9f2/samples/snippets/src/main/java/com/example/automl/LanguageEntityExtractionPredict.java

Here I see that they pass text content into the library, but I want to send the PDF content instead. I don't want to use a Google Cloud Storage bucket; I want to load the file locally and send it to the entity extractor. I tried using the Document class as below:

Document.parseDelimitedFrom(FileInputStream("test.pdf")), but it gives me an error.

Any help is highly appreciated.

            ...

            ANSWER

            Answered 2022-Feb-14 at 07:06

Document.parseDelimitedFrom(FileInputStream("test.pdf")) throws an error because the parseDelimitedFrom() method expects a protobuf message for parsing, not the InputStream of a local PDF file. That being said, there is currently no provision to send local files for prediction, as seen in this REST API documentation. The DocumentInputConfig parameter supports only a GCS source.
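
For reference, a minimal sketch of the GCS-based flow, shown with the Python client rather than the question's Java client; the project, location, model ID, and bucket path are placeholders:

from google.cloud import automl_v1

client = automl_v1.PredictionServiceClient()
name = client.model_path("my-project", "us-central1", "my-model-id")

# The PDF must live in Cloud Storage: DocumentInputConfig accepts only a GCS source.
document = automl_v1.Document(
    input_config=automl_v1.DocumentInputConfig(
        gcs_source=automl_v1.GcsSource(input_uris=["gs://my-bucket/test.pdf"])
    )
)
response = client.predict(
    name=name,
    payload=automl_v1.ExamplePayload(document=document),
)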


            Feature Request

I have raised this requirement as a feature request in Google's Issue Tracker. The issue can be found here: Issue #218865096. You can STAR the issue to receive automatic updates and give it traction by referring to this link. Also, please be reminded that there is no timeline or implementation guarantee for feature requests. All communication regarding this feature request will be done on the Issue Tracker.

            Source https://stackoverflow.com/questions/70940619

            QUESTION

            Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: getaddrinfo() thread failed to start
            Asked 2022-Jan-27 at 19:14

I am experiencing a persistent error while trying to use H2O's h2o.automl function. I am trying to run this model repeatedly. It seems to fail completely after 5 or 10 runs.

            ...

            ANSWER

            Answered 2022-Jan-27 at 19:14

            I think I also experienced this issue, although on macOS 12.1. I tried to debug it and found out that sometimes I also get another error:

            Source https://stackoverflow.com/questions/69485936

            QUESTION

            ML.NET AutoML saved model schema issue
            Asked 2022-Jan-18 at 16:18

I am able to use ML.NET to manually train a model, save it, load it to create a PredictionEngine, and make predictions. But when I try to use the AutoML feature, I run into problems loading the model due to the schema binding issue below.

Does anyone know what the issue is? I can get my model to load fine and make predictions if I train and save it without using AutoML, so this really puzzles me.

            ...

            ANSWER

            Answered 2022-Jan-16 at 21:23

I think you need to specify that your CategoricalFeature1 column is both a string and categorical, using an EstimatorChain. This is because, under the hood, all ML models use only vectors of floats.

Try adding an IEstimator like this:

            Source https://stackoverflow.com/questions/70724199

            QUESTION

            H2O Stacked Ensemble Prediction ArrayIndexOutOfBoundsException
            Asked 2022-Jan-06 at 14:54

Using the h2o package for R, I created a set of base models using AutoML with StackedEnsembles disabled. Thus, the set of models only contains the base models that AutoML generates by default (GLM, GBM, XGBoost, DeepLearning, and DRF). Using these base models, I was able to successfully train a default stacked ensemble manually using the h2o.stackedEnsemble function (i.e., a GLM with default params). I exported the model as a MOJO, shut down the H2O cluster, restarted R, initialized a new H2O cluster, imported the stacked ensemble MOJO, and successfully generated predictions on a new validation set.

            So far so good.

Next, I did the exact same thing following the exact same process, but this time I made one change: I trained the stacked ensemble with all pairwise interactions between the base models. The interactions were created automatically by feeding a list of the base-model IDs to the interactions metalearner parameter. The model appeared to train without issue, and (as I described above) I was able to export it as a MOJO, restart the H2O cluster, restart R, and import the MOJO. However, when I attempt to generate predictions on the same validation set I used above, I get the following error:

            ...

            ANSWER

            Answered 2022-Jan-06 at 14:54

Unfortunately, H2O-3 doesn't currently support exporting a GLM with interactions as a MOJO. There's a bug that allows the GLM to be exported with interactions, but the MOJO doesn't work correctly: the interactions are replaced by missing values. This should be fixed in the next release (3.36.0.2), which will not allow exporting that MOJO in the first place.

There's not much you can do other than building the stacked ensemble yourself in R: preprocess the base-model predictions (e.g., create the interactions) and then feed them to h2o.glm. There is an unmaintained package, h2oEnsemble, that might be helpful for that. You can also use another metalearner model that is more flexible, e.g., GBM.
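
A minimal sketch of that last suggestion, using H2O's Python API (the R call is analogous); base_models, x, y, and train are placeholders from your own AutoML session:

from h2o.estimators import H2OStackedEnsembleEstimator

# Assumes a running H2O cluster and already-trained, cross-validated base models.
ensemble = H2OStackedEnsembleEstimator(
    base_models=base_models,
    metalearner_algorithm="gbm",  # more flexible than GLM, and sidesteps the MOJO bug
)
ensemble.train(x=x, y=y, training_frame=train)
ensemble.download_mojo(path=".")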

            Source https://stackoverflow.com/questions/70597370

            QUESTION

            How to adjust feature importance in Azure AutoML
            Asked 2022-Jan-03 at 11:55

I am hoping to have a low-code model using Azure AutoML, which is really just going to the AutoML tab, running a classification experiment with my dataset, and, after it's done, deploying the best selected model.

The model kind of works (meaning, I publish the endpoint and then do some manual validation; it seems accurate). However, I am not confident enough, because when I look at the explanation, I can see something like this:

The top 4 features are not really the ones I consider important. The most "important" one is really not the one I would prefer it to use. I am hoping it will use the Title feature more.

Is there a way to adjust the importance of individual features, like ranking all features before the experiment starts?

I would love to do more reading, but I have only found this:

Increase feature importance

The only answer seems to be about how to measure whether a feature is important.

Hence, does it mean that if I want to customize the experiment, such as selecting which features to "focus" on, I should learn how to use the "designer" part of Azure ML? Or is that something I can't do even with the designer? I guess my confusion is that, with ML being such a big topic, I am looking for a direction of learning for my current situation, so I can improve my model.

            ...

            ANSWER

            Answered 2022-Jan-03 at 11:55

Here is a link to the document for feature customization.

Using the SDK, you can specify "featurization": 'auto' / 'off' / 'FeaturizationConfig' in your AutoMLConfig object. Learn more about enabling featurization.
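
A minimal sketch of the FeaturizationConfig route with the Python SDK; the column name, dataset, and label below are placeholders, and add_column_purpose only hints at how a column should be treated rather than weighting its importance:

from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose("Title", "Text")  # treat Title as free text

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_data,        # placeholder dataset
    label_column_name="label",       # placeholder label column
    featurization=featurization_config,
)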

Automated ML tries out different ML models that have different settings to control overfitting. Automated ML picks the best overfitting parameter configuration based on the score (e.g., accuracy) it gets on hold-out data. The kinds of overfitting settings these models have include:

            • Explicitly penalizing overly-complex models in the loss function that the ML model is optimizing
            • Limiting model complexity before training, for example by limiting the size of trees in an ensemble tree learning model (e.g. gradient boosting trees or random forest)

            https://docs.microsoft.com/en-us/azure/machine-learning/concept-manage-ml-pitfalls

            Source https://stackoverflow.com/questions/70268372

            QUESTION

            Vertex AI model batch prediction failed with internal error
            Asked 2021-Nov-18 at 11:44

I have trained an AutoML classification model on Vertex AI. Unfortunately, the model does not work with batch predictions: whenever I try to score the training dataset (the same one that was used for the successful model training) with batch predictions on Vertex AI, I get the following error:

"Due to one or more errors, this training job was canceled on Nov 11, 2021 at 09:42AM".

There is an option to get details from this error, and they say the following:

"Batch prediction job customer_value_label_cv_automl_gui encountered the following errors: INTERNAL"

Does anyone know what might be the reason for this kind of error? I am very surprised that the model cannot score the dataset it was trained on. My dataset consists of 570 columns and about 300k records.

            ...

            ANSWER

            Answered 2021-Nov-18 at 11:44

We were finally able to figure this out. As we were using the model.batch_predict method described in the official documentation, we unnecessarily set the machine_type parameter. We eventually determined that this was causing the issue; the machine was probably too weak. Once we removed this declaration, the method started to use automatic resources, and that solved the case. I wish Vertex AI errors were a little more informative, because it took us a lot of trial and error to figure this out.
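
A minimal sketch of the working call; the project, model ID, and bucket paths are placeholders, and machine_type is deliberately left unset so the job falls back to automatic resources:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")
model = aiplatform.Model(model_name="1234567890123456789")

job = model.batch_predict(
    job_display_name="customer_value_batch",
    gcs_source="gs://my-bucket/input.csv",
    gcs_destination_prefix="gs://my-bucket/output/",
    instances_format="csv",
    # no machine_type: let Vertex AI pick automatic resources
)
job.wait()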

            Source https://stackoverflow.com/questions/69925931

            QUESTION

            get metrics out of AutoMLRun based on test_data
            Asked 2021-Nov-03 at 18:28

            I’m using the following script to execute an AutoML run, also passing the test dataset

            ...

            ANSWER

            Answered 2021-Nov-03 at 15:42

It looks like you also need to specify the test_size parameter, according to the AutoMLConfig docs for test_data:

            If this parameter or the test_size parameter are not specified then no test run will be executed automatically after model training is completed. Test data should contain both features and label column. If test_data is specified then the label_column_name parameter must be specified.

As for how to extract those metrics and predictions, I imagine they'll be associated with the AutoMLRun itself (as opposed to one of the child runs).
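
A minimal sketch of an AutoMLConfig with the test run enabled; the datasets and label column are placeholders:

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=train_data,    # placeholder TabularDataset
    test_data=test_data,         # triggers the automatic test run after training
    label_column_name="label",   # required whenever test_data is set
)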

            Source https://stackoverflow.com/questions/69827748

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install automl

            You can download it from GitHub.

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.

CLONE
• HTTPS: https://github.com/Angel-ML/automl.git
• GitHub CLI: gh repo clone Angel-ML/automl
• SSH: git@github.com:Angel-ML/automl.git
