d-bert | Distilling BERT using natural language generation | Natural Language Processing library

by castorini | Python | Version: Current | License: MIT

kandi X-RAY | d-bert Summary

d-bert is a Python library typically used in Manufacturing, Utilities, Energy, Artificial Intelligence, Natural Language Processing, TensorFlow, and BERT applications. d-bert has no bugs and no vulnerabilities, it has a build file available, it has a Permissive License, and it has low support. You can download it from GitHub.

Distilling BERT using natural language generation.

            kandi-Support Support

              d-bert has a low active ecosystem.
              It has 23 star(s) with 11 fork(s). There are 4 watchers for this library.
              It had no major release in the last 6 months.
              d-bert has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of d-bert is current.

            kandi-Quality Quality

              d-bert has 0 bugs and 0 code smells.

            kandi-Security Security

              d-bert has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              d-bert code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              d-bert is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              d-bert releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.

            Top functions reviewed by kandi - BETA

            kandi has reviewed d-bert and discovered the following top functions. This is intended to give you an instant insight into the functionality d-bert implements, and to help you decide if it suits your requirements.
            • Convert the given examples into a list of features
            • Truncate a sequence pair
            • Evaluate the model
            • Ingest the model
            • Perform batch prediction
            • Predict given texts
            • Augment texts using BERT encoding
            • Compute the BERT query
            • Run a single step
            • Generate a sequence of tokens
            • Compute the embedding
            • Fetch embedding
            • Sample from the corpus
            • Loads checkpoint
            • Balance multiple buffers
            • Creates a function that returns a function that increments checkpoint loss
            • Extract tensors from text
            • Ingest the matrix
            • Forward embedding
            • Bert - encoder
            • Convert a sequence of tokens into tokens
            • Evaluate a given model
            • Generate prompt
            • Forward the embedding
            • Sample a query
            • Encodes the given list of queries
            • Generate a sentence
            Get all kandi verified functions for this library.

            d-bert Key Features

            No Key Features are available at this moment for d-bert.

            d-bert Examples and Code Snippets

            No Code Snippets are available at this moment for d-bert.

            Community Discussions

            QUESTION

            GCP Vertex AI Training: Auto-packaged Custom Training Job Yields Huge Docker Image
            Asked 2022-Mar-01 at 08:34

            I am trying to run a Custom Training Job in Google Cloud Platform's Vertex AI Training service.

            The job is based on a tutorial from Google that fine-tunes a pre-trained BERT model (from HuggingFace).

            When I use the gcloud CLI tool to auto-package my training code into a Docker image and deploy it to the Vertex AI Training service like so:

            ...

            ANSWER

            Answered 2022-Mar-01 at 08:34

            The image size shown in the UI is the virtual size of the image. It is the compressed total image size that will be downloaded over the network. Once the image is pulled, it will be extracted and the resulting size will be bigger. In this case, the PyTorch image's virtual size is 6.8 GB while the actual size is 17.9 GB.

            Also, when a docker push command is executed, the progress bars show the uncompressed size. The actual amount of data that’s pushed will be compressed before sending, so the uploaded size will not be reflected by the progress bar.

            To cut down the size of the Docker image, custom containers can be used. There, only the necessary components are configured, which results in a smaller Docker image. More information on custom containers here.
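As a rough, dependency-free sketch of why the two numbers differ, compressing a repetitive blob with Python's gzip module shows the gap between the extracted size on disk and the compressed size that travels over the network (the byte string is an arbitrary stand-in for a layer, not real image data):

```python
import gzip

# Hedged illustration (not Docker itself): registry UIs report the compressed
# ("virtual") size that is downloaded over the network, while the extracted
# image on disk is larger. Compressing a repetitive blob shows how wide the
# gap between the two figures can be.
extracted = b"layer-content " * 1_000_000      # ~14 MB stand-in for an extracted layer
wire_size = len(gzip.compress(extracted))      # roughly what a registry would report

print(f"extracted: {len(extracted)} bytes, over the wire: {wire_size} bytes")
```

For real images, comparing the size reported by the registry UI against the output of `docker images` after a pull shows the same effect.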

            Source https://stackoverflow.com/questions/71284125

            QUESTION

            Type errors with BERT example
            Asked 2021-Aug-21 at 07:45

            I'm new to the BERT QA model and was trying to follow the example found in this article. The problem is that when I run the code attached to the example, it produces a type error: TypeError: argmax(): argument 'input' (position 1) must be Tensor, not str.

            Here is the code that I've tried running :

            ...

            ANSWER

            Answered 2021-Aug-21 at 07:45

            So after referring to the BERT documentation, we identified that the model output object contains multiple properties, not only the start and end scores. Thus, we applied the following changes to the code.
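The fix has the following shape. This is a minimal, dependency-free sketch: SimpleNamespace and plain lists stand in for the real transformers output object (e.g. QuestionAnsweringModelOutput) and torch tensors, and the small argmax helper stands in for torch.argmax, so it runs without torch or transformers installed.

```python
from types import SimpleNamespace

# Newer transformers versions return an output object instead of a bare
# (start, end) tuple, so the scores must be read from its .start_logits /
# .end_logits attributes before calling argmax; passing the object itself
# to torch.argmax triggers the "must be Tensor, not str" error.
outputs = SimpleNamespace(
    start_logits=[0.1, 2.5, 0.3],   # per-token scores for the answer start
    end_logits=[0.2, 0.4, 3.1],     # per-token scores for the answer end
)

def argmax(scores):
    """Index of the highest score (stand-in for torch.argmax)."""
    return max(range(len(scores)), key=scores.__getitem__)

answer_start = argmax(outputs.start_logits)
answer_end = argmax(outputs.end_logits)
print(answer_start, answer_end)  # 1 2
```

With the real library, the equivalent lines are `torch.argmax(outputs.start_logits)` and `torch.argmax(outputs.end_logits)`.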

            Source https://stackoverflow.com/questions/68870383

            QUESTION

            BERT - Is that needed to add new tokens to be trained in a domain specific environment?
            Asked 2021-Apr-17 at 14:01

            My question here is not how to add new tokens, nor how to train using a domain-specific corpus; I'm already doing that.

            The thing is: am I supposed to add the domain-specific tokens before the MLM training, or should I just let BERT figure out the context? If I choose not to include the tokens, am I going to get a poor task-specific model, for example for NER?

            To give you more background on my situation: I'm training a BERT model on medical text in Portuguese, so disease names, drug names, and similar terms are present in my corpus, but I'm not sure whether I have to add those tokens before the training.

            I saw this one: Using Pretrained BERT model to add additional words that are not recognized by the model

            But the doubts remain, as other sources say otherwise.

            Thanks in advance.

            ...

            ANSWER

            Answered 2021-Apr-17 at 14:01

            Yes, you have to add them to the model's vocabulary.
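The mechanics can be sketched without transformers installed. In the real HuggingFace API the same two steps are `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; here a dict and a list of rows stand in for the tokenizer vocabulary and the embedding matrix, and the token strings are hypothetical domain terms.

```python
# Hedged, dependency-free sketch: adding domain tokens extends the vocabulary,
# and the embedding matrix must grow by one row per new token so those rows
# can be trained during MLM fine-tuning.
vocab = {"[UNK]": 0, "dor": 1, "cabeca": 2}          # toy starting vocabulary
embeddings = [[0.0] * 4 for _ in vocab]               # one 4-dim row per token

def add_tokens(new_tokens):
    """Add unseen tokens and grow the embedding matrix to match."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            embeddings.append([0.0] * 4)              # fresh trainable row
            added += 1
    return added

added = add_tokens(["dipirona", "paracetamol"])       # hypothetical drug names
print(added, len(vocab), len(embeddings))             # 2 5 5
```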

            Source https://stackoverflow.com/questions/67058709

            QUESTION

            How to use fine-tuned BERT model for sentence encoding?
            Asked 2021-Mar-19 at 12:53

            I fine-tuned the BERT base model on my own dataset following the script here:

            https://github.com/cedrickchee/pytorch-pretrained-BERT/tree/master/examples/lm_finetuning

            I saved the model as a .pt file and I want to use it now for a sentence similarity task. Unfortunately, it is not clear to me how to load the fine-tuned model. I tried the following:

            ...

            ANSWER

            Answered 2021-Mar-19 at 12:53

            To load a model with BertModel.from_pretrained() you need to have saved it using save_pretrained() (link).

            Any other storage method would require the corresponding load. I am not familiar with S3, but I assume you can use get_object (link) to retrieve the model, and then save it using the huggingface api. From then on, you should be able to use from_pretrained() normally.
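The round trip looks roughly like this. This is a hedged stand-in: a dict and json replace the real torch weights and BertConfig, but the directory-based save/load pattern mirrors how `save_pretrained()` and `from_pretrained()` pair up in the transformers API.

```python
import json
import os
import tempfile

# save_pretrained() writes both weights and config into a directory, and
# from_pretrained() reads that directory back; a model saved any other way
# (e.g. a bare .pt state dict) needs the matching load path instead.
def save_pretrained(model, directory):
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "model.json"), "w") as f:
        json.dump(model, f)

def from_pretrained(directory):
    with open(os.path.join(directory, "model.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    save_pretrained({"hidden_size": 768}, d)   # toy stand-in for model weights
    restored = from_pretrained(d)

print(restored)  # {'hidden_size': 768}
```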

            Source https://stackoverflow.com/questions/66707770

            QUESTION

            pip getting killed in Docker
            Asked 2021-Feb-22 at 06:09

            I am building a Docker container based on python:3.7-slim-stretch (same problem also happens on python:3.7-slim-stretch), and it is getting Killed on

            ...

            ANSWER

            Answered 2021-Feb-22 at 06:09

            I experience something similar on Windows when my docker containers run out of memory in WSL. I think the settings are different for Mac, but it looks like there is info here on setting the VM RAM/disk size/swap file settings for Docker for Desktop on Mac:

            https://docs.docker.com/docker-for-mac

            Source https://stackoverflow.com/questions/66258967

            QUESTION

            How to use example scripts from git repo when installing with pip
            Asked 2020-Nov-26 at 14:03

            This might be a very strange question for most of you, but in this case I am grateful for an easy explanation. What confuses me is the following. Let's say I have a git repository such as the following:

            https://github.com/cedrickchee/pytorch-pretrained-BERT

            In the README they say I can install the repository either with pip or from source. Within the repo, there are certain .py scripts that I want to use. What confuses me is: how can I access those scripts when installing the repository with pip? I am talking about scripts like these:

            ...

            ANSWER

            Answered 2020-Nov-26 at 13:19

            Those scripts are examples. They are not installed even if you install from the git repository. To access them you need to clone the repository and copy the scripts out of the examples directory.

            Source https://stackoverflow.com/questions/65022650

            QUESTION

            RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 3
            Asked 2020-Aug-27 at 06:07

            I am doing the following operation,

            ...

            ANSWER

            Answered 2020-Aug-27 at 06:07

            I took a look at your code (which, by the way, didn't run with seq_len = 10) and the problem is that you hard-coded batch_size to be equal to 1 (line 143 of your code).

            It looks like the example you are trying to run the model on has batch_size = 2.

            Just uncomment the previous line where you wrote batch_size = query.shape[0] and everything runs fine.
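As a minimal sketch of why this fixes the shape mismatch (plain tuples stand in for tensor shapes, and the function name here is illustrative, not from the questioner's code):

```python
# Deriving batch_size from the input's first dimension instead of hard-coding
# it keeps downstream multi-head reshapes valid for any batch. With torch,
# the corresponding line is batch_size = query.shape[0].
def attention_shapes(query_shape, num_heads=8):
    batch_size, seq_len, dim = query_shape      # not batch_size = 1
    assert dim % num_heads == 0, "dim must divide evenly across heads"
    return (batch_size, num_heads, seq_len, dim // num_heads)

print(attention_shapes((2, 10, 64)))  # (2, 8, 10, 8)
```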

            Source https://stackoverflow.com/questions/63566232

            QUESTION

            Undefined symbol when importing tf-sentencepiece
            Asked 2020-Jan-14 at 08:53

            On my MacBook (version 10.14.6) I am successfully running a Django application including TensorFlow and tf-sentencepiece (in particular to use the universal sentence encoder model). When I perform a pipenv lock -r > requirements.txt I get the following required packages:

            ...

            ANSWER

            Answered 2020-Jan-09 at 09:54

            I have no skills in Django, but it seems that tensorflow is trying to find a package (with a strange name) and failing.

            I'd first suggest trying to fix your docker container setup, and checking that pipenv lock -r yields the same result inside and outside your container.

            1) As you said in the comments, on the host PC

            Source https://stackoverflow.com/questions/59613957

            Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install d-bert

            You can download it from GitHub.
            You can use d-bert like any standard Python library. You will need a development environment consisting of a Python distribution with header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/castorini/d-bert.git

          • CLI

            gh repo clone castorini/d-bert

          • sshUrl

            git@github.com:castorini/d-bert.git


            Consider Popular Natural Language Processing Libraries

            transformers

            by huggingface

            funNLP

            by fighting41love

            bert

            by google-research

            jieba

            by fxsjy

            Python

            by geekcomputers

            Try Top Libraries by castorini

            pyserini

            by castorini | Python

            anserini

            by castorini | Java

            hedwig

            by castorini | Python

            honk

            by castorini | Python

            daam

            by castorini | Python