longformer | Longformer: The Long-Document Transformer | Natural Language Processing library
kandi X-RAY | longformer Summary
Longformer and LongformerEncoderDecoder (LED) are pretrained transformer models for long documents. A LongformerEncoderDecoder (LED) model is now available. It supports seq2seq tasks with long input. With gradient checkpointing, fp16, and a 48GB GPU, the input length can be up to 16K tokens. Check the updated paper for the model details and evaluation. A significant speed degradation in huggingface/transformers was recently discovered and fixed (check this PR for details). To avoid this problem, either use the old release v2.11.0, which does not support gradient checkpointing, or use the master branch. This problem should be fixed in the next huggingface/transformers release.
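As an illustration of the seq2seq usage described above, here is a minimal sketch that loads the Hugging Face port of LED; the allenai/led-base-16384 checkpoint and the summarization-style generate call are assumptions for the example, not this repository's own API.

# Sketch: long-input seq2seq with the Hugging Face LED port (an assumption,
# not this repo's own classes). Gradient checkpointing reduces memory so
# very long inputs can fit on a large GPU.
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")
model.gradient_checkpointing_enable()

long_document = "..."  # a long input document
inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")

# LED expects global attention on at least the first token
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))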
Top functions reviewed by kandi - BETA
- Convert a single example.
- Find library path.
- Compile GPU.
- Create a LongformerEncoder.
- Convert arguments to TVM arguments.
- Add command line arguments to the given parser.
- Export the module to a file.
- Register an extension.
- Find and return the path to include.
- Register a global function.
longformer Key Features
longformer Examples and Code Snippets
# Function to download a file from Google Drive from the terminal
function gdrive_download () {
CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*co
@misc{sekuli2020longformer,
  title={Longformer for MS MARCO Document Re-ranking Task},
  author={Ivan Sekulić and Amir Soleimani and Mohammad Aliannejadi and Fabio Crestani},
  year={2020},
  eprint={2009.09392},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
python main.py experiment=joint use_wandb=True
python main.py experiment=litbank trainer.label_smoothing_wt=0.0
python main.py experiment=ontonotes_pseudo model/doc_encoder/transformer=longformer_base
python main.py experiment=litbank model/memory
Community Discussions
Trending Discussions on longformer
QUESTION
I am training huggingface longformer for a classification problem and got the output below.
I am confused about Total optimization steps. As I have 7000 training data points, 5 epochs, and Total train batch size (w. parallel, distributed & accumulation) = 64, shouldn't I get 7000*5/64 steps? That comes to 546.875, so why is it showing Total optimization steps = 545?
Why, in the output below, are there 16 steps of
Input ids are automatically padded from 1500 to 1536 to be a multiple of config.attention_window: 512
followed by [ 23/545 14:24 < 5:58:16, 0.02 it/s, Epoch 0.20/5]? What are these steps?
ANSWER
Answered 2022-Mar-29 at 00:30
Looking at the implementation of the transformers package, we see that the Trainer uses a variable called max_steps when printing the Total optimization steps message in the train method.
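The arithmetic can be reproduced with a minimal sketch. The per-device batch size of 8 and gradient accumulation of 8 (64 effective) are assumptions, not values from the original post, but they show why the count is floored to 545 rather than the fractional 546.875.

# Sketch of how Total optimization steps can come out to 545 instead of 546.875.
# Assumed values (not from the original post): per-device batch size 8,
# gradient accumulation 8, i.e. an effective batch size of 64.
import math

num_examples = 7000
per_device_batch_size = 8
gradient_accumulation_steps = 8
num_epochs = 5

# Batches the dataloader yields per epoch
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)   # 875

# One optimizer update per gradient_accumulation_steps batches;
# the leftover batches at the end of an epoch are dropped (floor division)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps  # 109

total_optimization_steps = updates_per_epoch * num_epochs             # 545
print(total_optimization_steps)

The fractional 546.875 comes from dividing 7000*5 by 64 directly; in this sketch the step count is instead rounded down within each epoch before being multiplied by the number of epochs.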
QUESTION
I am using Longformer for sequence classification (a binary problem).
I have downloaded the required files.
ANSWER
Answered 2022-Mar-22 at 23:48
requires_grad == True means that the gradient of that tensor will be computed, so the default setting is to train/finetune all layers.
- You can train only the output layer by freezing the encoder, as in the sketch below.
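A minimal sketch of freezing the encoder, assuming the Hugging Face LongformerForSequenceClassification class; model.longformer and model.classifier are the submodule names that class exposes.

# Sketch: freeze the Longformer encoder and train only the classification head.
from transformers import LongformerForSequenceClassification

model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2
)

# Freeze every parameter of the encoder (model.longformer);
# the classification head (model.classifier) keeps requires_grad == True.
for param in model.longformer.parameters():
    param.requires_grad = False

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the classifier parameters remain trainable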
QUESTION
I am replicating code from this page and I am getting F1, precision, and recall of 0. I got the same accuracy as the author. What could be the reason?
I looked into the compute_metrics function and it seems to be correct. I tried some toy data, as below, and precision_recall_fscore_support seems to give the correct answer.
ANSWER
Answered 2022-Feb-15 at 20:46
My guess is that the transformation of your dependent variable was somehow messed up. I think this because all of your metrics that depend on TP (True Positives) are 0.
Both Precision and Sensitivity (Recall) have TP as their numerator.
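In standard notation, these are:

Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)

If the model never predicts the positive class, TP is 0 and both metrics collapse to 0, while accuracy can still look reasonable on an imbalanced label distribution.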
QUESTION
Goal: Amend this Notebook to work with Albert and Distilbert models
Kernel: conda_pytorch_p36. I did Restart & Run All and refreshed the file view in the working directory.
Error occurs in Section 1.2, only for these 2 new models.
For filenames etc., I've created a variable used everywhere:
ANSWER
Answered 2022-Jan-13 at 14:10
When instantiating AutoModel, you must specify a model_type parameter in the ./MRPC/config.json file (downloaded during Notebook runtime).
A list of model_types can be found here.
Code that appends model_type to config.json, in the same format, can be sketched as follows.
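This is an illustrative sketch: the ./MRPC/config.json path comes from the question, while the "distilbert" value stands in for whichever model_type is needed.

# Sketch: add a "model_type" key to an existing config.json so AutoModel
# can resolve the architecture. The model_type value is an assumption.
import json

config_path = "./MRPC/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

config["model_type"] = "distilbert"  # use "albert" for the Albert model

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)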
QUESTION
I'm working on a competition on Kaggle. First, I trained a Longformer base with the competition dataset and achieved a quite good result on the leaderboard. Due to the CUDA memory limit and time limit, I could only train 2 epochs with a batch size of 1. The loss started at about 2.5 and gradually decreased to 0.6 at the end of my training.
I then continued training for 2 more epochs using those saved weights. This time I used a slightly larger learning rate (the one from the Longformer paper) and added the validation data to the training data (meaning I no longer split the dataset 90/10). I did this to try to achieve a better result.
However, this time the loss started at about 0.4 and constantly increased to 1.6 at about half of the first epoch. I stopped because I didn't want to waste computational resources.
Should I have waited more? Could it eventually lead to a better test result? I think the model could have been slightly overfitting at first.
ANSWER
Answered 2022-Jan-11 at 08:50Your model got fitted to the original training data the first time you trained it. When you added the validation data to the training set the second time around, the distribution of your training data must have changed significantly. Thus, the loss increased in your second training session since your model was unfamiliar with this new distribution.
Should you have waited more? Yes, the loss would eventually have decreased (although not necessarily to a value lower than the original training loss).
Could it have led to a better test result? Probably. It depends on whether your validation data contains patterns that are:
- Not present in your training data already
- Similar to those that your model will encounter in deployment
QUESTION
From the Transformers library I use LongformerModel, LongformerTokenizerFast, and LongformerConfig (all of them loaded with from_pretrained("allenai/longformer-base-4096")).
When I do
ANSWER
Answered 2021-Aug-27 at 18:33
I managed to fix this by reindexing my position_ids. When PyTorch created that tensor, for some reason some values in position_ids were bigger than 4098. I used an explicit reindexing along the lines of the sketch below.
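This sketch builds position_ids by hand so every index stays within max_position_embeddings (4098 for allenai/longformer-base-4096); the variable names and the example document are illustrative assumptions, since the original code was not shown.

# Sketch: supply explicit position_ids so no index exceeds
# max_position_embeddings (4098 for allenai/longformer-base-4096).
import torch
from transformers import LongformerModel, LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

encoded = tokenizer("some long document ...", return_tensors="pt")
seq_len = encoded["input_ids"].shape[1]

# Simple 0..seq_len-1 positions, one row shared across the batch
position_ids = torch.arange(seq_len, dtype=torch.long).unsqueeze(0)

outputs = model(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
    position_ids=position_ids,
)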
QUESTION
I am trying to follow this example in the huggingface documentation here https://huggingface.co/transformers/model_doc/longformer.html:
ANSWER
Answered 2021-Apr-02 at 13:20
Do not select via index:
QUESTION
I am trying to save the tokenizer in huggingface so that I can load it later from a container where I don't need access to the internet.
ANSWER
Answered 2020-Oct-28 at 09:27
save_vocabulary() saves only the vocabulary file of the tokenizer (the list of BPE tokens). To save the entire tokenizer, you should use save_pretrained(), as in the sketch below.
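A minimal sketch of the round trip; the local directory name ./local_longformer_tokenizer is an illustrative assumption.

# Sketch: save the full tokenizer locally, then reload it without internet access.
from transformers import LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")

# Writes the vocab, merges, tokenizer config, and special-tokens map
tokenizer.save_pretrained("./local_longformer_tokenizer")

# Later, e.g. inside the offline container:
offline_tokenizer = LongformerTokenizerFast.from_pretrained("./local_longformer_tokenizer")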
Vulnerabilities
No vulnerabilities reported
Install longformer
You can use longformer like any standard Python library. You will need a development environment with a Python distribution (including header files), a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system.