quantization | deep dive into Apple's coremltools quantization | Machine Learning library
kandi X-RAY | quantization Summary
A deep dive into Apple's coremltools quantization: reduce the size of a Core ML model without losing (too much) accuracy and performance.
Last year Apple gave us Core ML, an easy-to-use framework for running trained models on our devices. However, the technology was not without its challenges: integration with third-party frameworks was limited, training was still a non-trivial process (which we covered last year in how to train your own Core ML model), and model sizes could run into hundreds of MBs. This year Apple introduced an array of solutions to address these challenges, among them support for more third-party ML frameworks, the ability to define custom models and layers, the introduction of Create ML for easy training, and quantization for size reduction. In this post we are going to dig a little deeper into one of these new features: quantization.
Model size is one of the most common reasons for skipping a local model and opting for an online cloud solution. Fully trained models can run into hundreds of MBs and can easily deter potential users from downloading our app. However, if you followed WWDC's What's New in Core ML session, you got a taste of quantization: an approach that can cut the size of a fully trained model by two-thirds without losing much in accuracy or performance. So let's test it out together. We're going to take a previously trained model for food classification and see what kind of size/accuracy trade-off we can get through quantization.
But first, let's go over quantization and what it really means to quantize a model. The simplest way to explain the idea is perhaps to phrase it as "reducing the resolution of a model". Each trained Core ML model comes with a finite number of weights that are set when the model is trained. Imagine each of these weights represents 1 cm^2 on an image. If you have a high-resolution image, you can fit a lot of pixels in that space and get a crisp, clear picture of a pizza. However, if the purpose of your image is for the person looking at it to figure out they're looking at pizza, then you don't need a lot of pixels in that 1 cm^2. You can opt for fewer pixels in that space and still get something that resembles a pizza. You can in fact do this by quite a bit and still see pizza. It's at the lower end where things get more complicated and the image starts to look like something that could be a plate of pasta or lasagna. We will see similar behavior later on.
Depending on the model, you could be dealing with tens of millions of weights, which by default are stored as Float32 (since iOS 11.2, weights are stored as half-precision Float16). A Float32 is a 32-bit single-precision floating-point number that takes 4 bytes. When we use a Float32 we have billions (2^31 − 1) of possible values that our weight can take. It turns out we can reduce the possibilities to a much smaller subset and retain most of our accuracy. When we quantize a model, we iterate through its weights and use a number format with lower precision. These Float32 weights can be reduced to half precision (16 bits), 8 bits, or lower. The distribution used for quantization can be linear, a linear lookup table, a k-means-generated lookup table, or a custom lookup-table function. So there are multiple options available to us: we have to pick a bit size we want to quantize down to and a function we want to use for the quantization distribution.
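To make the "resolution" idea more concrete, here is a small standalone illustration (a NumPy sketch of generic linear quantization, not what coremltools or Core ML does internally) of quantizing an array of Float32 weights down to 8 bits and dequantizing them again:

import numpy as np

# Pretend these are a layer's Float32 weights.
weights = np.random.randn(1000).astype(np.float32)

# Linear 8-bit quantization: map the weight range onto 256 levels.
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0

quantized = np.round((weights - w_min) / scale).astype(np.uint8)   # 1 byte per weight
dequantized = quantized.astype(np.float32) * scale + w_min         # what inference would see

print("worst-case error:", np.abs(weights - dequantized).max())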
It's important not to forget that reducing precision doesn't come for free; it will affect how the model performs. However, we can reduce precision by quite a bit before we notice a major reduction in accuracy. So if there is a sweet spot between accuracy and quantization, where is it? How can we find it? The bad news is there is no simple formula; a lot of this will depend on your model and how it's used. The good news is that quantizing a model and testing it can be done fairly quickly. So let's Goldilocks it: we will quantize a model into all its possible bit levels and functions, then run a test against each model and compare its accuracy against the full-precision model. We then use the data collected to find the Goldilocks model: the one that is the smallest for the least loss in accuracy.
For this example I will be using a SqueezeNet model that I've trained to detect 101 different dishes. I have already converted the model into Core ML and I'm ready to quantize it. Before we can quantize a model we need to get the latest version of coremltools. At the time of writing, we are on 2.0b1, which is in beta. To get this version we need to run pip install coremltools==2.0b1. The method we are interested in is quantize_weights; let's look at its documentation. For quantize_weights there are four different modes available. However, at the time of this writing the modes mentioned in the documentation are different from what is actually available in coremltools. The modes in the documentation are linear, linear_lut, kmeans_lut and custom_lut. The modes that are actually available are linear, kmeans, linear_lut, custom_lut and dequantization. We will omit custom_lut and dequantization since they are beyond the scope of this article and focus on linear, linear_lut and kmeans.
Once coremltools version 2.0b1 is installed, we can run the following Python script (a sketch is shown below). Ensure that the script is located in the same folder as our original model. This script will create all the possible permutations of bits and functions that quantize a model. First we set model_name to the name of the model; this should be the same as the name of the file without its .mlmodel extension. Then we run python run.py to create all the permutations. In less than ten minutes, we're the proud owners of 27 new models, all in different sizes.
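Here is a minimal sketch of what such a permutation script (the run.py mentioned above) could look like. The model name is a placeholder, and the exact return type of quantize_weights can differ between platforms, so treat this as a starting point rather than the article's exact script:

import coremltools
from coremltools.models.neural_network.quantization_utils import quantize_weights

model_name = "Food101"  # placeholder: your .mlmodel file name without the extension
model = coremltools.models.MLModel(model_name + ".mlmodel")

bit_sizes = [16, 8, 7, 6, 5, 4, 3, 2, 1]
modes = ["linear", "kmeans", "linear_lut"]

for nbits in bit_sizes:
    for mode in modes:
        print("Quantizing to {} bits using {}".format(nbits, mode))
        quantized = quantize_weights(model, nbits, mode)
        # On macOS quantize_weights returns an MLModel; on other platforms it may return a spec.
        quantized.save("{}_{}_{}.mlmodel".format(model_name, nbits, mode))

Nine bit sizes times three modes gives the 27 quantized models mentioned above.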
We can see that quantization can result in a substantial reduction in size: all quantized models are substantially smaller than the full-precision model. Just by looking at the data, it seems like reducing precision by half to 16 bits reduced the models by 40%. This reveals just how much of a model is actually composed of weights. Of these 27 models, one holds the most reduction in size for the least reduction in accuracy. The question is, which one?
There are a few options available. The first is a method provided by coremltools called compare_models. Through this method we can pass the original full-precision model, the quantized model and a folder of sample images, and see how well the two models match. The problem with this method is that there isn't much we can do with it beyond observing what it prints to the console; nothing else is returned. If you want more data and a more comprehensive comparison between multiple models, there is another powerful tool at your disposal: Xcode Playgrounds. One of the many great things about Xcode Playgrounds is that you can perform inference on a fully trained Core ML model directly from the playground. There is no need to create a full-fledged iOS or macOS app.
So with that in mind, we are going to start a new Playground. We will iterate through the models, test their accuracy against our data, and save the information we've collected from the tests into a CSV file. I have posted one way this can be done below. Although it may seem like a lot of code, it actually doesn't do anything beyond what I just described. If you're interested in playing around with it (no pun intended), here is a link to the repo with the Playground file, models and the test data. Our test data is spread among seven categories: French fries, hamburger, hot dog, pizza, ramen, steak and sushi. There are 100 images for each category, none of which were used during training. When we test our original model we end up with the following accuracy result. This is fairly in line with what I got when I was training the model on DIGITS. Now that we have a baseline accuracy, we can compare how our quantized models fare against our test data. Let's look at all the 16-bit, half-precision quantized models and see how they compare against the original.
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries.
quantization Key Features
quantization Examples and Code Snippets
def _validate_full_integer_quantization_bias_type(self):
    """Validates bias type for full integer quantization."""
    bias_type = self._full_integer_quantization_bias_type
    if not bias_type:
        return
    if self.activations_type() == _dt

def is_post_training_dynamic_range_quantization(self):
    # Post-training dynamic range quantization is only enabled if post-training
    # int8 quantization and training time quantization was not done.
    return (self.is_any_optimization_enabled()

def is_integer_quantization(self):
    return (self.is_post_training_integer_quantization() or
            self.is_quantization_aware_training() or
            self.is_low_bit_quantize_aware_training())
Community Discussions
Trending Discussions on quantization
QUESTION
When I run detection with my tflite file, this problem happens.
The command I wrote:
...ANSWER
Answered 2021-Jun-10 at 12:41 The problem is that you are passing tuples with floats into the function's parameters as the points. Here is the error reproduced:
QUESTION
I am testing the BERT base and distilled BERT models in Huggingface with 4 speed scenarios, batch_size = 1:
...ANSWER
Answered 2021-May-26 at 20:38 No, you can speed it up.
First, why are you testing it with batch size 1?
Both tokenizer and model accept batched inputs. Basically, you can pass a 2D array/list that contains a single sample at each row. See the documentation for the tokenizer: https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer.__call__ The same applies for the models.
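For illustration, a sketch of batching with a distilled BERT checkpoint (the model name and texts below are placeholders, not taken from the question):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.eval()

texts = ["first sample", "second sample", "third sample"]  # a whole batch at once

# One tokenizer call and one forward pass for the entire batch.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

print(logits.shape)  # (batch_size, num_labels)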
Also, your for loop is sequential even if you use a batch size larger than 1. You can create a test dataset and then use the Trainer class with trainer.predict().
Also see this discussion of mine at the HF forums: https://discuss.huggingface.co/t/urgent-trainer-predict-and-model-generate-creates-totally-different-predictions/3426
QUESTION
I am having problems converting an SSD object detection model into a uint8 TFLite model for the EdgeTPU.
I have been searching in different forums, Stack Overflow threads and GitHub issues, and as far as I know I am following the right steps, but something must be wrong in my Jupyter notebook since I can't achieve my goal.
I am sharing my steps, explained in a Jupyter Notebook; I think it will be clearer.
...ANSWER
Answered 2021-May-04 at 08:17 The process, as @JaesungChung answered, is well done.
My problem was in the application which was running the .tflite model. I quantized my model output to uint8, so I had to rescale the values I obtained to get the right results.
For example, I had 10 objects because I was requesting all the detected objects with a score above 0.5. My results were not scaled, so a detected object's score could well be 104. I had to rescale that number by dividing by 255.
The same happened when graphing my results, so I had to divide that number and multiply by the height and width.
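A sketch of the rescaling described above (array values and image size are made up for illustration):

import numpy as np

image_width, image_height = 640, 480                     # hypothetical input image size

raw_scores = np.array([104, 230, 57], dtype=np.uint8)     # uint8 model outputs
scores = raw_scores.astype(np.float32) / 255.0            # back into the 0..1 range

raw_boxes = np.array([[26, 51, 128, 204]], dtype=np.uint8)   # [ymin, xmin, ymax, xmax]
boxes = raw_boxes.astype(np.float32) / 255.0
boxes[:, [0, 2]] *= image_height                          # scale normalized coords to pixels
boxes[:, [1, 3]] *= image_width

print(scores, boxes)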
QUESTION
Following this example of K means clustering I want to recreate the same - only I'm very keen for the final image to contain just the quantized colours (+ white background). As it is, the colour bars get smooshed together to create a pixel line of blended colours.
Whilst they look very similar, the top half is the image I've got from CV2, and it contains 38 colours in total. The lower image only has 10 colours and is what I'm after.
Let's look at a bit of that with 6 times magnification:
I've tried :
...ANSWER
Answered 2021-May-18 at 16:27 I recommend showing the image using cv2.imshow, instead of using matplotlib.
cv2.imshow shows the image "pixel to pixel" by default, while matplotlib.pyplot matches the image dimensions to the size of the axes.
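For example (the file name is a placeholder):

import cv2

img = cv2.imread("quantized.png")        # placeholder file name
cv2.imshow("quantized colours", img)     # shown 1:1, pixel to pixel
cv2.waitKey(0)
cv2.destroyAllWindows()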
QUESTION
I am implementing a logarithmic quantizer and what I would like to do is to optimize the code as much as possible. The precise point where I would like to make a change is the last else statement, where the equation to be implemented is:
q(u) = u_i    if u_i/(1+step) < u <= u_i/(1-step)
u_i = p^(1-i) * u_o    for i = 1, 2, ...
The parameters p, step, u_o are some constants to be chosen.
More information regarding the quantizer can be found at this paper: Adaptive Backstepping Control of Uncertain Nonlinear Systems with Input Quantization.
In order to code a function to implement it in MATLAB, I wrote the following piece of code:
...ANSWER
Answered 2021-Apr-28 at 17:39 Assuming u_min > 0 and 0 < step < 1, you can simplify (u > u_i/(1+step)) && (u <= u_i/(1-step)) to:
QUESTION
Currently, I am applying h.264 compression to a single image.
As you can see in the picture below, I want to compress that image with h.264 at a constant quantization of 40.
However, I can't find the relevant functionality anywhere in Python-based libraries (opencv, ffmpeg).
Also, there is no GitHub project with a well-structured h.264 implementation for applying it to a single image.
So, is there any GitHub implementation or library to do it?
Thanks in advance.
...ANSWER
Answered 2021-Apr-28 at 15:18
QUESTION
I was trying to convert a .pb model of ALBERT to tflite.
I made the .pb model using https://github.com/google-research/albert in TF 1.15.
And I used
tconverter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir) # path to the SavedModel directory
to make the tflite file (in TF 2.4.1),
but
...ANSWER
Answered 2021-Apr-25 at 10:05 Please consider using the Select TF option in order to fall back to the TF ops when TFLite builtin op coverage does not fit your case.
For the conversion procedure, you can enable the Select TF option as follows:
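The answer's own snippet is not reproduced on this page; a minimal sketch of enabling that option (the SavedModel path is a placeholder) could look like this:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,   # use TFLite builtin ops where possible
    tf.lite.OpsSet.SELECT_TF_OPS,     # fall back to TensorFlow ops otherwise
]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)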
QUESTION
Is it possible to convert tflite model to pb model?
I have seen many articles about converting "pb->tflite", but had no luck finding "tflite->pb".
If it is not possible, is there any way I can do the quantization with only the tflite file?
(So far, I have noticed that tf.lite.TFLiteConverter.from_saved_model() only accepts a pb file, and that is why I am trying to convert tflite to pb.)
Any hint or suggestions will be great!
Thanks
...ANSWER
Answered 2021-Apr-23 at 23:07 First of all, there is no official TensorFlow API to support the conversion from tflite to graphdef (pb) file, as jdduke@ described in the above section.
Actually, there are two TensorFlow graph serialization formats that use the "pb" extension:
(1) Saved Model (recommended) - Exporting the given TF graph to the saved model is possible in both TF version one and two. The saved model format is not simple and is usually represented as a directory. The saved model directory contains the following files, including the "pb" file:
- saved_model.pb (or sometimes saved_model.pbtxt)
- variables/variables.index
- variables/variables.data-00000-of-00001
You can provide the directory name of the above file location to tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) to convert the saved model format to the corresponding TFLite model file.
(2) Graph def serialized file (deprecated) - The graph def serialized file is a TF v1 format and is deprecated. The graph def file is stored with the "pb" extension most of the time. In that case, you can use tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(...) to convert.
The meaning of the "pb" keyword is "protobuf", which is a binary serialization format used in the TensorFlow product. So the "pb" files in TensorFlow can carry different things depending on the context.
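For the deprecated GraphDef path mentioned in (2), a conversion could look roughly like this (the file name and tensor names are placeholders for your own graph):

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_graph.pb",   # placeholder path to the frozen GraphDef
    input_arrays=["input"],             # names of the graph's input tensors
    output_arrays=["output"],           # names of the graph's output tensors
)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)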
QUESTION
I have to try quantization on my model (tflite).
I want to change float32 to float16 through dynamic range quantization.
This is my code:
...ANSWER
Answered 2021-Apr-23 at 09:52 The tf.lite.TFLiteConverter.from_saved_model function takes a TensorFlow (.pb) model as a parameter. On the other hand, you give it a TensorFlow Lite (.tflite) model, which necessarily leads to an error. If you want to convert your model to float16, the only way I know of is to take the original model in ".pb" format and convert it as you want.
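A sketch of that float16 conversion starting from the original SavedModel (the path is a placeholder):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]   # store weights as float16
tflite_fp16_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16_model)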
QUESTION
I am new to TF and Keras. I have a model trained and saved using the following code
...ANSWER
Answered 2021-Apr-13 at 11:19 Instead of removing batch size in the graph, you can expand the dimension by using expand_dims:
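The answer's own snippet is not shown here; a generic example of expand_dims (shapes chosen only for illustration) is:

import tensorflow as tf

image = tf.random.uniform([224, 224, 3])    # a single, unbatched input
batched = tf.expand_dims(image, axis=0)     # shape becomes (1, 224, 224, 3)
print(batched.shape)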
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported