tune | An abstraction layer for parameter tuning | Machine Learning library

by fugue-project Python Version: 0.1.5 License: Apache-2.0

X-Ray Key Features Code Snippets Community Discussions(10)Vulnerabilities Install Support

kandi X-RAY | tune Summary

tune is a Python library typically used in Artificial Intelligence, Machine Learning, Deep Learning, Tensorflow, Spark applications. tune has no bugs, it has no vulnerabilities, it has build file available, it has a Permissive License and it has low support. You can install using 'pip install tune' or download it from GitHub, PyPI.

Tune is an abstraction layer for general parameter tuning. It is built on Fugue so it can seamlessly run on any backend supported by Fugue, such as Spark, Dask and local.

Support

Quality

Security

License

Reuse

Support

tune has a low active ecosystem.

It has 9 star(s) with 1 fork(s). There are 2 watchers for this library.

It had no major release in the last 12 months.

There are 6 open issues and 15 have been closed. On average issues are closed in 6 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of tune is 0.1.5

Quality

tune has no bugs reported.

Security

tune has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

tune is licensed under the Apache-2.0 License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

tune releases are available to install and integrate.

Deployable package is available in PyPI.

Build file is available. You can build the component from source.

Installation instructions, examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed tune and discovered the below as its top functions. This is intended to give you an instant insight into tune implemented functionality, and help decide if they suit your requirements.

Run a function using the given function
Logs a test report
Log the metrics
Log the metadata
Run a function on the given trial
Convert a trial to a dictionary
Convert a uniform value to a discrete value
Convert a uniform value to a continuous value
Run the trial
Runs a single Rung
Generate a sampling of the parameters
Returns a Trial decision
Run a function on the experiment
Returns a decision decision
Runs the RemoteTrial test suite
Return a new trial s sorting metric
Fill the data in a dictionary
Run single Rung
Fill a dictionary of parameters
Invoked when a trial is finished
Return the value of the tuning expression
Get the version number
Expression LoggerLogger
Parses an optimizer object
Get budget budget for rung
Returns budget budget budget for rung

Get all kandi verified functions for this library.

tune Key Features

No Key Features are available at this moment for tune.

tune Examples and Code Snippets

No Code Snippets are available at this moment for tune.

Community Discussions

Trending Discussions on tune

Deeplabv3 re-train result is skewed for non-square images

Tidymodels / XGBoost error in last_fit with rsplit value

unable to mmap 1024 bytes - Cannot allocate memory - even though there is more than enough ram

How to calculate the f1-score?

Some problems with numpy array

differences in bitmap or rasterized font bitmaps and text display on 3.5" TFT LCD

What values could go into parameter l2_regularization for HistGradientBoostingRegressor in sklearn

How to create a list with the y-axis labels of a TreeExplainer shap chart?

Multiple RestTemplate calls for each Id in a List

How to improve spark performace in EMR cluster

QUESTION

Deeplabv3 re-train result is skewed for non-square images

Asked 2021-Jun-15 at 09:13

I have issues fine-tuning the pretrained model deeplabv3_mnv2_pascal_train_aug in Google Colab.

When I do the visualization with vis.py, the results appear to be displaced to the left/upper side of the image if it has a bigger height/width, namely, the image is not square.

The dataset used for the fine-tune is Look Into Person. The steps done to do so are:

Create dataset in deeplab/datasets/data_generator.py

...

ANSWER

Answered 2021-Jun-15 at 09:13

After some time, I did find a solution for this problem. An important thing to know is that, by default, train_crop_size and vis_crop_size are 513x513.

The issue was due to vis_crop_size being smaller than the input images, so vis_crop_size is needed to be greater than the max dimension of the biggest image.

In case you want to use export_model.py, you must use the same logic than vis.py, so your masks are not cropped to 513 by default.

Source https://stackoverflow.com/questions/67887078

QUESTION

Tidymodels / XGBoost error in last_fit with rsplit value

Asked 2021-Jun-15 at 04:08

I am trying to follow this tutorial here - https://juliasilge.com/blog/xgboost-tune-volleyball/

I am using it on the most recent Tidy Tuesday dataset about great lakes fishing - trying to predict agency based on many other values.

ALL of the code below works except the final row where I get the following error:

...

ANSWER

Answered 2021-Jun-15 at 04:08

If we look at the documentation of last_fit() We see that split must be

An rsplit object created from `rsample::initial_split().

You accidentally passed the cross-validation folds object stock_folds into split but you should have passed rsplit object stock_split instead

Source https://stackoverflow.com/questions/67978723

QUESTION

unable to mmap 1024 bytes - Cannot allocate memory - even though there is more than enough ram

Asked 2021-Jun-14 at 11:16

I'm currently working on a seminar paper on nlp, summarization of sourcecode function documentation. I've therefore created my own dataset with ca. 64000 samples (37453 is the size of the training dataset) and I want to fine tune the BART model. I use for this the package simpletransformers which is based on the huggingface package. My dataset is a pandas dataframe. An example of my dataset:

My code:

...

ANSWER

Answered 2021-Jun-08 at 08:27

While I do not know how to deal with this problem directly, I had a somewhat similar issue(and solved). The difference is:

I use fairseq
I can run my code on google colab with 1 GPU
Got RuntimeError: unable to mmap 280 bytes from file : Cannot allocate memory (12) immediately when I tried to run it on multiple GPUs.

From the other people's code, I found that he uses python -m torch.distributed.launch -- ... to run fairseq-train, and I added it to my bash script and the RuntimeError is gone and training is going.

So I guess if you can run with 21000 samples, you may use torch.distributed to make whole data into small batches and distribute them to several workers.

Source https://stackoverflow.com/questions/67876741

QUESTION

How to calculate the f1-score?

Asked 2021-Jun-14 at 07:07

I have a pyTorch-code to train a model that should be able to detect placeholder-images among product-images. I didn't write the code by myself as I am very unexperienced with CNNs and Machine Learning.

My boss told me to calculate the f1-score for that model and i found out that the formula for that is ((precision * recall)/(precision + recall)) but I don't know how I get precision and recall. Is someone able to tell me how I can get those two parameters from that following code? (Sorry for the long piece of code, but I didn't really know what is necessary and what isn't)

...

ANSWER

Answered 2021-Jun-13 at 15:17

You can use sklearn to calculate f1_score

Source https://stackoverflow.com/questions/67959327

QUESTION

Some problems with numpy array

Asked 2021-Jun-12 at 17:57

li = np.array(list("123"))
li[0] = "fff"
print(li)

...

ANSWER

Answered 2021-Jun-12 at 17:57

Why so?

Some debugging will help here. Take the first line for example:

Source https://stackoverflow.com/questions/67951421

QUESTION

differences in bitmap or rasterized font bitmaps and text display on 3.5" TFT LCD

Asked 2021-Jun-12 at 16:19

I am using a 3.5: TFT LCD display with an Arduino Uno and the library from the manufacturer, the KeDei TFT library. The library came with a bitmap font table that is huge for the small amount of memory of an Arduino Uno so I've been looking for alternatives.

What I am running into is that there doesn't seem to be a standard representation and some of the bitmap font tables I've found work fine and others display as strange doodles and marks or they display upside down or they display with letters flipped. After writing a simple application to display some of the characters, I finally realized that different bitmaps use different character orientations.

My question

What are the rules or standards or expected representations for the bit data for bitmap fonts? Why do there seem to be several different text character orientations used with bitmap fonts?

Thoughts about the question

Are these due to different target devices such as a Windows display driver or a Linux display driver versus a bare metal Arduino TFT LCD display driver?

What is the criteria used to determine a particular bitmap font representation as a series of unsigned char values? Are different types of raster devices such as a TFT LCD display and its controller have a different sequence of bits when drawing on the display surface by setting pixel colors?

What other possible bitmap font representations requiring a transformation which my version of the library currently doesn't offer, are there?

Is there some method other than the approach I'm using to determine what transformation is needed? I currently plug the bitmap font table into a test program and print out a set of characters to see how it looks and then fine tune the transformation by testing with the Arduino and the TFT LCD screen.

My experience thus far

The KeDei TFT library came with an a bitmap font table that was defined as

...

ANSWER

Answered 2021-Jun-12 at 16:19

Raster or bitmap fonts are represented in a number of different ways and there are bitmap font file standards that have been developed for both Linux and Windows. However raw data representation of bitmap fonts in programming language source code seems to vary depending on:

the memory architecture of the target computer,
the architecture and communication pathways to the display controller,
character glyph height and width in pixels and
the amount of memory for bitmap storage and what measures are taken to make that as small as possible.

A brief overview of bitmap fonts

A generic bitmap is a block of data in which individual bits are used to indicate a state of either on or off. One use of a bitmap is to store image data. Character glyphs can be created and stored as a collection of images, one for each character in the character set, so using a bitmap to encode and store each character image is a natural fit.

Bitmap fonts are bitmaps used to indicate how to display or print characters by turning on or off pixels or printing or not printing dots on a page. See Wikipedia Bitmap fonts

A bitmap font is one that stores each glyph as an array of pixels (that is, a bitmap). It is less commonly known as a raster font or a pixel font. Bitmap fonts are simply collections of raster images of glyphs. For each variant of the font, there is a complete set of glyph images, with each set containing an image for each character. For example, if a font has three sizes, and any combination of bold and italic, then there must be 12 complete sets of images.

A brief history of using bitmap fonts

The earliest user interface terminals such as teletype terminals used dot matrix printer mechanisms to print on rolls of paper. With the development of Cathode Ray Tube terminals bitmap fonts were readily transferable to that technology as dots of luminescence turned on and off by a scanning electron gun.

Earliest bitmap fonts were of a fixed height and width with the bitmap acting as a kind of stamp or pattern to print characters on the output medium, paper or display tube, with a fixed line height and a fixed line width such as the 80 columns and 24 lines of the DEC VT-100 terminal.

With increasing processing power, a more sophisticated typographical approach became available with vector fonts used to improve displayed text quality and provide improved scaling while also reducing memory required to describe the character glyphs.

In addition, while a matrix of dots or pixels worked fairly well for languages such as English, written languages with complex glyph forms were poorly served by bitmap fonts.

Representation of bitmap fonts in source code

There are a number of bitmap font file formats which provide a way to represent a bitmap font in a device independent description. For an example see Wikipedia topic - Glyph Bitmap Distribution Format

The Glyph Bitmap Distribution Format (BDF) by Adobe is a file format for storing bitmap fonts. The content takes the form of a text file intended to be human- and computer-readable. BDF is typically used in Unix X Window environments. It has largely been replaced by the PCF font format which is somewhat more efficient, and by scalable fonts such as OpenType and TrueType fonts.

Other bitmap standards such as XBM, Wikipedia topic - X BitMap, or XPM, Wikipedia topic - X PixMap, are source code components that describe bitmaps however many of these are not meant for bitmap fonts specifically but rather other graphical images such as icons, cursors, etc.

As bitmap fonts are an older format many times bitmap fonts are wrapped within another font standard such as TrueType in order to be compatible with the standard font subsystems of modern operating systems such as Linux and Windows.

However embedded systems that are running on the bare metal or using an RTOS will normally need the raw bitmap character image data in the form similar to the XBM format. See Encyclopedia of Graphics File Formats which has this example:

Following is an example of a 16x16 bitmap stored using both its X10 and X11 variations. Note that each array contains exactly the same data, but is stored using different data word types:

Source https://stackoverflow.com/questions/67465098

QUESTION

What values could go into parameter l2_regularization for HistGradientBoostingRegressor in sklearn

Asked 2021-Jun-12 at 09:55

I am trying to tune hyperparameters for HistGradientBoostingRegressor in sklearn and would like to know what possible values could be for l2_regularization, the rest of the parameter grid that works for me looks like this -

...

ANSWER

Answered 2021-Jun-12 at 09:55

Indeed, Regularizations are constraints that are added to the loss function. The model when minimizing the loss function will have to also minimize the regularization term. Hence, This will reduce the model variance as it cannot overfit.

Acceptable parameters for l2_regularization are often on a logarithmic scale between 0 and 0.1, such as 0.1, 0.001, 0.0001.

Source https://stackoverflow.com/questions/67946302

QUESTION

How to create a list with the y-axis labels of a TreeExplainer shap chart?

Asked 2021-Jun-10 at 17:29

How to create a list with the y-axis labels of a TreeExplainer shap chart?

Hello,

I was able to generate a chart that sorts my variables by order of importance on the y-axis. It is an impotant solution to visualize in graph form, but now I need to extract the list of ordered variables as they are on the y-axis of the graph. Does anyone know how to do this? I put here an example picture.

Obs.: Sorry, I was not able to add a minimal reproducible example. I don't know how to paste the Jupyter Notebook cells here, so I've pasted below the link to the code shared via Github.

In this example, the list would be "vB0 , mB1 , vB1, mB2, mB0, vB2".

minimal reproducible example

...

ANSWER

Answered 2021-Jun-09 at 16:36

TL;DR

Source https://stackoverflow.com/questions/67855111

QUESTION

Multiple RestTemplate calls for each Id in a List

Asked 2021-Jun-09 at 11:07

I need to make multiple RestTemplate calls for each of the Id in a List. What is the best way to perform this?

I have used parallelStream(). Below code snippet is a similar scenario.

...

ANSWER

Answered 2021-Jun-09 at 11:07

.parallelStream() does not guarantee multi threading since it will create only threads equals to number of cores you have. To really enforce multiple threads doing this simultaneously, you need to use .parallelStream() with ForkJoinPool.

High Performance using ForkJoinPool

Source https://stackoverflow.com/questions/67821785

QUESTION

How to improve spark performace in EMR cluster

Asked 2021-Jun-08 at 07:23

I was looking at consolidate location where I can look what all parameters at a high level that needs to be tuned in Spark job to get better performance out from the cluster assuming you have allocated sufficient nodes. I did go through the link but it's too much to process in one go https://spark.apache.org/docs/2.4.5/configuration.html#available-properties

I have listed my findings below that will help people to look at first before deep diving into the above link with what is use case

...

ANSWER

Answered 2021-Mar-22 at 08:57

Below is a list of parameters which I found helpful in tuning of the job, I will keep appending this with whenever I found out use case for a parameter

Parameter What to look for spark.scheduler.mode FAIR or FIFO, This decides how you want to allocate executors to jobs executor-memory Check OOM in executors if you find they are going OOM probably this is the reason or check for executor-cores values, wheather they are too small causing the load on executors
https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html driver-memory If you are doing a collect kind of operation (i.e. any operation that sends data back to Driver) then look for tuning this value executor-cores Its value really depends on what kind of processing you are looking for is it a multi-threaded approach/ light process. The below doc can help you to understand it better
https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html spark.default.parallelism This helped us a quite bit in reducing job execution time, initially run the job without this value & observe what value is default set by the cluster (it does base on cluster size). If you see the value too high then try to reduce it, We generally reduced it to the below logic

number of Max core nodes * number of threads per machine + number of Max on-demand nodes * number of threads per machine + number of Max spot nodes * number of threads per machine spark.sql.shuffle.partitions This value is used when your job is doing quite shuffling of data e.g. DF with cross joins or inner join when it's not repartitioned on joins clause dynamic executor allocation This helped us quite a bit from the pain of allocating the exact number of the executors to the job. Try to tune below spark.dynamicAllocation.minExecutors To start your application these numbers of executors are needed else it will not start. This is quite helpful when you don't want to make your job crawl on 1 or 2 available executors spark.dynamicAllocation.maxExecutors Max amount of executors can be used to ensure the job does not end up consuming all cluster resources in case its multi-job cluster running parallel jobs spark.dynamicAllocation.initialExecutors This is quite helpful when the driver is doing some initial job before spawning the jobs to executors e.g. listing the files in a folder so it will delete only those files at end of the job. This ensures you can provide min executors but can get a head start with fact know that driver is going to take some time to start spark.dynamicAllocation.executorIdleTimeout This is also helpful in the above-mentioned case where the driver is doing some work & has nothing to assign to the executors & you don't want them to time out causing reallocation of executors which will take some time
https://spark.apache.org/docs/2.4.5/configuration.html#dynamic-allocation Trying to reduce the number of files created while writing the partitions As our data is read by different executors while writing each executor will write its own file. This will end up in creating a large number of small files & in intern the query on those will be heavy. There are 2 ways to do it
Coalesce: This will try to do minimum shuffle across the executors & will create an un-even file size
repartition: This will do a shuffle of data & creates files with ~ equal size
https://stackoverflow.com/questions/31610971/spark-repartition-vs-

coalescemaxRecordsPerFile: This parameter is helpful in informing spark, how many records per file you are looking for When you are joining small DF with large DF Check if you can use broadcasting of the small DF by default Spark use the sort-merge join, but if your table is quite low in size see if you can broadcast those variables
https://towardsdatascience.com/the-art-of-joining-in-spark-dcbd33d693c
How one can hint spark to use broadcasting: https://stackoverflow.com/a/37487575/814074
Below parameters you need to look for doing broadcast joins are spark.sql.autoBroadcastJoinThreshold This helps spark to understand for a given size of DF whether to used broadcast join or not spark.driver.maxResultSize Max result will be returned to the driver so it can broadcast them driver-memory As the driver is doing broadcasting of result this needs to be bigger spark.network.timeout spark.executor.heartbeatInterval This helps in the case where you see an abrupt termination of executors from drivers, there could be multiple reasons but if nothing is specifically found you can check on these parameters
https://spark.apache.org/docs/2.4.5/configuration.html#execution-behavior Data is Skewed across customers Try to find out a way where you can trigger the jobs for descending order of volume per customer. This ensures you that cluster will be well occupied during the initial run & long-running customer gets some time while small customers are completing their job. Also, you can drop the customer if no data is present for a given customer to reduce the load on the cluster

Source https://stackoverflow.com/questions/66742937

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install tune

It's recommended to also install Scikit-Learn (for all compatible models tuning) and Hyperopt (to enable Bayesian Optimization).
To quickly start, please go through these tutorials on Kaggle:.
Search Space
Non-iterative Problems, such as Scikit-Learn model tuning
Iterative Problems, such as Keras model tuning

Support

For any new features, suggestions and bugs create an issue on GitHub. If you have any questions check and ask questions on community page Stack Overflow .

Find more information at: