automl | Automatic machine learning toolkit, including hyper-parameter tuning | Machine Learning library

by Angel-ML | Scala | Version: 0.1.0 | License: Apache-2.0

kandi X-RAY | automl Summary

automl is a Scala library typically used in Artificial Intelligence, Machine Learning, and Spark applications. automl has no bugs and no vulnerabilities, carries a permissive license, and has low support. You can download it from GitHub.

Angel's automatic machine learning toolkit. Angel-AutoML provides automatic hyper-parameter tuning and feature engineering operators. It is developed in Scala. As a stand-alone library, Angel-AutoML can be easily integrated into Java and Scala projects. We welcome everyone interested in machine learning to contribute code, create issues, or open pull requests. Please refer to the Angel Contribution Guide for more details.

Support

automl has a low-activity ecosystem.
It has 51 star(s) with 21 fork(s). There are 4 watchers for this library.
It had no major release in the last 12 months.
There is 1 open issue and 0 have been closed. There is 1 open pull request and 0 closed requests.
It has a neutral sentiment in the developer community.
The latest version of automl is 0.1.0.

Quality

              automl has 0 bugs and 0 code smells.

Security

              automl has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              automl code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

              automl is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              automl releases are available to install and integrate.
              Installation instructions are not available. Examples and code snippets are available.
              It has 4493 lines of code, 354 functions and 92 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.


            automl Key Features

            No Key Features are available at this moment for automl.

            automl Examples and Code Snippets

Usage
Scala | Lines of Code: 8 | License: Permissive (Apache-2.0)
// Discrete parameter spaces: an explicit value set, or a {start:end:step} range.
val param1 = ParamSpace.fromConfigString("param1", "{1.0,2.0,3.0,4.0,5.0}")
val param2 = ParamSpace.fromConfigString("param2", "{1:10:1}")

// Continuous parameter spaces: a [lower,upper] interval.
val param1 = ParamSpace.fromConfigString("param1", "[1,10]")
val param2 = ParamSpace.fromConfigString("param2", "[1,10]")

            Community Discussions

            QUESTION

            difference between stack, ensemble & stack ensemble steps in MLJAR
            Asked 2022-Apr-01 at 10:44

            While experimenting with MLJar, I figured out in 'Compete' mode it uses the below 3 steps towards the end of the training:

            ...

            ANSWER

            Answered 2022-Apr-01 at 10:44

The description of each ensemble type in the MLJAR AutoML package is in the Algorithms section of the docs.

The docs for Ensemble are here. It is a simple average of previous models; models are selected as long as they improve the ensemble's performance.

The docs for the Stacked Algorithm are here. It is a model trained on the original data plus the stacked predictions of the previous models.

The Stacked Ensemble is an Ensemble built from models trained on the original data and models trained on the stacked data (original + stacked predictions).
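
To make the distinction concrete, here is a minimal numpy sketch (not MLJAR code; the arrays below are synthetic stand-ins for base-model predictions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # original features
preds = rng.uniform(size=(100, 3))   # predictions of 3 base models

# Ensemble: a simple average of the selected base models' predictions.
ensemble_pred = preds.mean(axis=1)

# Stacked: a new model is trained on the original data plus the base-model
# predictions; fit any metalearner on (X_stacked, y).
X_stacked = np.hstack([X, preds])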

            @mehul-gupta please let me know if it is clear now.

            Source https://stackoverflow.com/questions/71676068

            QUESTION

            Why is my GCP Vertex pipeline api_endpoint not right?
            Asked 2022-Mar-20 at 07:08

            My API_ENDPOINT is set to europe-west1-aiplatform.googleapis.com.

            I define a pipeline:

            ...

            ANSWER

            Answered 2022-Jan-04 at 14:06

            Set location = API_ENDPOINT in google.cloud.aiplatform.init.
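
A minimal sketch of that call; note that location takes a region such as "europe-west1", while the full hostname "europe-west1-aiplatform.googleapis.com" is the REST endpoint. The project and bucket names below are placeholders:

from google.cloud import aiplatform

# Placeholders: substitute your own project and staging bucket.
aiplatform.init(
    project="my-project",
    location="europe-west1",
    staging_bucket="gs://my-bucket",
)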

            Source https://stackoverflow.com/questions/70577610

            QUESTION

            Vertex AI Pipeline Failed Precondition
            Asked 2022-Mar-09 at 12:14

            I have been following this video: https://www.youtube.com/watch?v=1ykDWsnL2LE&t=310s

Code located at: https://codelabs.developers.google.com/vertex-pipelines-intro#5 (I have done the last two steps as per the video, which isn't an issue for google_cloud_pipeline_components version 0.1.1)

I have created a pipeline in Vertex AI, which ran, and I used the following code to create it (from the video, not the code extract in the link above):

            ...

            ANSWER

            Answered 2022-Mar-04 at 09:45

As @scottlucas confirmed, this question was solved by upgrading to the latest version of google-cloud-aiplatform, which can be done with pip install --upgrade google-cloud-aiplatform.

Upgrading to the latest library ensures that the official documentation you use as a reference is aligned with the actual product.

            Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

            Feel free to edit this answer for additional information.

            Source https://stackoverflow.com/questions/71245000

            QUESTION

Send PDF instead of TextSnippet in Google AutoML entity extraction
            Asked 2022-Feb-14 at 07:06

I have created a custom processor using the Google AutoML entity extractor and trained it on a few PDFs. The PDFs actually contain photo identity cards. I was able to test it in their UI, and it extracted the entities properly. Now I'm using their Java client library to do it in code, as given below. Here is the sample:

            https://github.com/googleapis/java-automl/blob/b4c760c01efbd2174d93af85c5fbab3c09eee9f2/samples/snippets/src/main/java/com/example/automl/LanguageEntityExtractionPredict.java

Here I see that they pass text content into the library, but I want to send the PDF content instead. I don't want to use a Google Cloud Storage bucket; I want to load the file locally and send it to the entity extractor. I tried using the Document class as below:

Document.parseDelimitedFrom(FileInputStream("test.pdf")), but it gives me an error.

Any help is highly appreciated.

            ...

            ANSWER

            Answered 2022-Feb-14 at 07:06

Document.parseDelimitedFrom(FileInputStream("test.pdf")) throws an error because the parseDelimitedFrom() method expects a protobuf message for parsing, not the InputStream of a local PDF file. That being said, there is currently no provision to send local files for prediction, as seen in this REST API documentation. The DocumentInputConfig parameter supports only a GCS source.
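
For reference, a minimal sketch of the GCS-based flow, shown with the Python client rather than the question's Java client; the project, location, model ID, and bucket path are placeholders:

from google.cloud import automl_v1

client = automl_v1.PredictionServiceClient()
name = client.model_path("my-project", "us-central1", "my-model-id")

# The PDF must live in Cloud Storage: DocumentInputConfig accepts only a GCS source.
document = automl_v1.Document(
    input_config=automl_v1.DocumentInputConfig(
        gcs_source=automl_v1.GcsSource(input_uris=["gs://my-bucket/test.pdf"])
    )
)
response = client.predict(
    name=name,
    payload=automl_v1.ExamplePayload(document=document),
)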


            Feature Request

I have raised this requirement as a feature request in Google's Issue Tracker. The issue can be found here: Issue #218865096. You can STAR the issue to receive automatic updates and give it traction by referring to this link. Also, please be reminded that there is no timeline or implementation guarantee for feature requests. All communication regarding this feature request will be done on the Issue Tracker.

            Source https://stackoverflow.com/questions/70940619

            QUESTION

            Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: getaddrinfo() thread failed to start
            Asked 2022-Jan-27 at 19:14

I am experiencing a persistent error while trying to use H2O's h2o.automl function. I am trying to run this model repeatedly. It seems to fail completely after 5 or 10 runs.

            ...

            ANSWER

            Answered 2022-Jan-27 at 19:14

            I think I also experienced this issue, although on macOS 12.1. I tried to debug it and found out that sometimes I also get another error:

            Source https://stackoverflow.com/questions/69485936

            QUESTION

            ML.NET AutoML saved model schema issue
            Asked 2022-Jan-18 at 16:18

I am able to use ML.NET to manually train a model, save it, load it to create a PredictionEngine, and make predictions. But when I try to use the AutoML feature, I run into problems loading the model due to the schema binding issue below.

Does anyone know what the issue is? I can get my model to load fine and make predictions if I train and save it without using AutoML, so this really puzzles me.

            ...

            ANSWER

            Answered 2022-Jan-16 at 21:23

I think you need to specify that your CategoricalFeature1 column is both a string and categorical, using an EstimatorChain. This is because, under the hood, all ML models use only vectors of floats.

Try adding an IEstimator like this:

            Source https://stackoverflow.com/questions/70724199

            QUESTION

            H2O Stacked Ensemble Prediction ArrayIndexOutOfBoundsException
            Asked 2022-Jan-06 at 14:54

Using the h2o package for R, I created a set of base models using AutoML with StackedEnsembles disabled. Thus, the set of models only contains the base models that AutoML generates by default (GLM, GBM, XGBoost, DeepLearning, and DRF). Using these base models, I was able to successfully train a default stacked ensemble manually using the h2o.stackedEnsemble function (i.e., a GLM with default params). I exported the model as a MOJO, shut down the H2O cluster, restarted R, initialized a new H2O cluster, imported the stacked ensemble MOJO, and successfully generated predictions on a new validation set.

            So far so good.

Next, I did the exact same thing following the exact same process, but this time I made one change: I trained the stacked ensemble with all pairwise interactions between the base models. The interactions were created automatically by feeding a list of the base-model IDs to the interactions metalearner parameter. The model appeared to train without issue, and (as I described above) I was able to export it as a MOJO, restart the H2O cluster, restart R, and import the MOJO. However, when I attempt to generate predictions on the same validation set I used above, I get the following error:

            ...

            ANSWER

            Answered 2022-Jan-06 at 14:54

Unfortunately, H2O-3 doesn't currently support exporting a GLM with interactions as a MOJO. There's a bug that allows the GLM to be exported with interactions, but the MOJO doesn't work correctly: the interactions are replaced by missing values. This should be fixed in the next release (3.36.0.2), which will not allow exporting that MOJO in the first place.

There's not much you can do other than building the stacked ensemble yourself in R: preprocess the base-model predictions (e.g., create the interactions) and then feed them to h2o.glm. There is an unmaintained package, h2oEnsemble, that might be helpful for that. You can also use another metalearner model that is more flexible, e.g., GBM.
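
A minimal sketch of that last suggestion, using H2O's Python API (the R call is analogous); base_models, x, y, and train are placeholders from your own AutoML session:

from h2o.estimators import H2OStackedEnsembleEstimator

# Assumes a running H2O cluster and already-trained, cross-validated base models.
ensemble = H2OStackedEnsembleEstimator(
    base_models=base_models,
    metalearner_algorithm="gbm",  # more flexible than GLM, and sidesteps the MOJO bug
)
ensemble.train(x=x, y=y, training_frame=train)
ensemble.download_mojo(path=".")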

            Source https://stackoverflow.com/questions/70597370

            QUESTION

            How to adjust feature importance in Azure AutoML
            Asked 2022-Jan-03 at 11:55

I am hoping to have a low-code model using Azure AutoML, which is really just going to the AutoML tab, running a classification experiment with my dataset, and, after it's done, deploying the best selected model.

The model kind of works (meaning, I publish the endpoint and then do some manual validation; it seems accurate). However, I am not confident enough, because when I look at the explanation, I can see something like this:

The top 4 features are not really the ones I consider important. The most "important" one is really not the one I would prefer it to use. I am hoping it will use the Title feature more.

Is there a way to adjust the importance of individual features, like ranking all features before the experiment starts?

I would love to do more reading, but I have only found this:

Increase feature importance

The only answer seems to be about how to measure whether a feature is important.

Hence, does it mean that if I want to customize the experiment, such as selecting which features to "focus" on, I should learn how to use the "designer" part of Azure ML? Or is that something I can't do even with the designer? I guess my confusion is that, with ML being such a big topic, I am looking for a direction of learning for my current situation, so I can improve my model.

            ...

            ANSWER

            Answered 2022-Jan-03 at 11:55

Here is a link to the document for feature customization.

Using the SDK, you can specify "featurization": 'auto' / 'off' / 'FeaturizationConfig' in your AutoMLConfig object. Learn more about enabling featurization.
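
A minimal sketch of the FeaturizationConfig route with the Python SDK; the column name, dataset, and label below are placeholders, and add_column_purpose only hints at how a column should be treated rather than weighting its importance:

from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose("Title", "Text")  # treat Title as free text

automl_config = AutoMLConfig(
    task="classification",
    training_data=train_data,        # placeholder dataset
    label_column_name="label",       # placeholder label column
    featurization=featurization_config,
)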

Automated ML tries out different ML models that have different settings to control overfitting. Automated ML picks the best overfitting parameter configuration based on the score (e.g., accuracy) it gets on hold-out data. The kinds of overfitting settings these models have include:

            • Explicitly penalizing overly-complex models in the loss function that the ML model is optimizing
            • Limiting model complexity before training, for example by limiting the size of trees in an ensemble tree learning model (e.g. gradient boosting trees or random forest)

            https://docs.microsoft.com/en-us/azure/machine-learning/concept-manage-ml-pitfalls

            Source https://stackoverflow.com/questions/70268372

            QUESTION

            Vertex AI model batch prediction failed with internal error
            Asked 2021-Nov-18 at 11:44

I have trained an AutoML classification model on Vertex AI. Unfortunately, the model does not work with batch predictions: whenever I try to score the training dataset (the same one that was used for the successful model training) with batch predictions on Vertex AI, I get the following error:

"Due to one or more errors, this training job was canceled on Nov 11, 2021 at 09:42AM".

There is an option to get details from this error, and they say the following:

"Batch prediction job customer_value_label_cv_automl_gui encountered the following errors: INTERNAL"

Does anyone know what might be the reason for this kind of error? I am very surprised that the model cannot score the dataset it was trained on. My dataset consists of 570 columns and about 300k records.

            ...

            ANSWER

            Answered 2021-Nov-18 at 11:44

We were finally able to figure this out. As we were using the model.batch_predict method described in the official documentation, we unnecessarily set the machine_type parameter. We eventually determined that this was causing the issue; the machine was probably too weak. Once we removed this declaration, the method started to use automatic resources, and that solved the case. I wish Vertex AI errors were a little more informative, because it took us a lot of trial and error to figure this out.
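
A minimal sketch of the working call; the project, model ID, and bucket paths are placeholders, and machine_type is deliberately left unset so the job falls back to automatic resources:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")
model = aiplatform.Model(model_name="1234567890123456789")

job = model.batch_predict(
    job_display_name="customer_value_batch",
    gcs_source="gs://my-bucket/input.csv",
    gcs_destination_prefix="gs://my-bucket/output/",
    instances_format="csv",
    # no machine_type: let Vertex AI pick automatic resources
)
job.wait()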

            Source https://stackoverflow.com/questions/69925931

            QUESTION

            get metrics out of AutoMLRun based on test_data
            Asked 2021-Nov-03 at 18:28

            I’m using the following script to execute an AutoML run, also passing the test dataset

            ...

            ANSWER

            Answered 2021-Nov-03 at 15:42

It looks like you also need to specify the test_size parameter, according to the AutoMLConfig docs for test_data:

            If this parameter or the test_size parameter are not specified then no test run will be executed automatically after model training is completed. Test data should contain both features and label column. If test_data is specified then the label_column_name parameter must be specified.

As for how to extract those metrics and predictions, I imagine they'll be associated with the AutoMLRun itself (as opposed to one of the child runs).
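
A minimal sketch of an AutoMLConfig with the test run enabled; the datasets and label column are placeholders:

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=train_data,    # placeholder TabularDataset
    test_data=test_data,         # triggers the automatic test run after training
    label_column_name="label",   # required whenever test_data is set
)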

            Source https://stackoverflow.com/questions/69827748

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install automl

            You can download it from GitHub.

            Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.

CLONE
• HTTPS: https://github.com/Angel-ML/automl.git
• GitHub CLI: gh repo clone Angel-ML/automl
• SSH: git@github.com:Angel-ML/automl.git
