allennlp | An open-source NLP research library, built on PyTorch | Natural Language Processing library
kandi X-RAY | allennlp Summary
- Construct a TrainModel from partial objects
- Construct a new instance of this class
- Returns the constructor of the wrapped function
- Instantiate a class from a dictionary
- Compute the matching between two sentences
- Compute the multi-perspective matching between two vectors
- Compute cosine similarity between two vectors
- Return a tiny value for a given dtype
- Construct a vocabulary from pretrained data
- Performs an ELMo forward transformation
- Return a decorator to register a key
- Compute token embedding
- Load weights from a given file
- Select a batch of spans matching the given spans
- Perform a forward projection
- Compute the embedding
- Compute the token embedding
- Convert a tag sequence into a list of spans
- Creates embeddings for the given tokens
- Permute the top k
- Create a model from pretrained module
- Returns a T5StackOutput object
- Forward computation
- Performs the forward computation
- Get a pre-trained model
- Evaluate a model
allennlp Key Features
allennlp Examples and Code Snippets
OntoNotes-5.0-NER/
├── conll-formatted-ontonotes-5.0/
├── collect_conll.py
├── README.md
├── ..
└── ontonotes-release-5.0/
$ conda create --name py27 python=2.7
$ source activate py27
./conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D ./ontonotes-release-5.0/data/files/data ./conll-formatted-ontonotes-5.0/v4/
./conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D ./ontonotes-release-5.0/data/files/data ./conll-formatted-ontonotes-5.0/v12/
python collect_conll.py

usage: collect_conll.py [-h] [-v VERSION] [-l LANGUAGE] [-d [DOMAIN [DOMAIN ...]]]

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        Which version of split, v4 or v12.
  -l LANGUAGE, --language LANGUAGE
                        Which language to collect.
  -d [DOMAIN [DOMAIN ...]], --domain [DOMAIN [DOMAIN ...]]
                        What domains to use. If not specified, all will be used.
                        You can choose from bc bn mz nw tc wb.
python collect_conll.py -v v4
For file:v4/english/train.txt, there are 59924 sentences, 1088503 tokens.
For file:v4/english/dev.txt, there are 8528 sentences, 147724 tokens.
For file:v4/english/test.txt, there are 8262 sentences, 152728 tokens.
OntoNotes-5.0-NER/
├── ..
└── v4/
    └── english/
        ├── train.txt
        ├── dev.txt
        └── test.txt
python collect_conll.py -v v12
For file:v12/english/train.txt, there are 94292 sentences, 1903816 tokens.
For file:v12/english/dev.txt, there are 13900 sentences, 279495 tokens.
For file:v12/english/test.txt, there are 10348 sentences, 204235 tokens.
python collect_conll.py -v v4 -l chinese
python collect_conll.py -v v4 -d bc bn mz nw
1. Document ID (``str``): a variation on the document filename.
2. Part number (``int``): some files are divided into multiple parts numbered as 000, 001, 002, etc.
3. Word number (``int``): the word index of the word in that sentence.
4. Word (``str``): the token as segmented/tokenized in the Treebank. Initially the ``*_skel`` files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank that is part of the OntoNotes release.
5. POS Tag (``str``): the Penn Treebank style part of speech. When parse information is missing, all parts of speech except the one for which there is some sense or proposition annotation are marked with an XX tag. The verb is marked with just a VERB tag.
6. Parse bit (``str``): the bracketed structure broken before the first open parenthesis in the parse, with the word/part-of-speech leaf replaced with a ``*``. When the parse information is missing, the first word of a sentence is tagged as ``(TOP*``, the last word is tagged as ``*)``, and all intermediate words are tagged with a ``*``.
7. Predicate lemma (``str``): the predicate lemma is given for the rows that have semantic role information or word sense information. All other rows are marked with a "-".
8. Predicate Frameset ID (``int``): the PropBank frameset ID of the predicate in Column 7.
9. Word sense (``float``): the word sense of the word in Column 3.
10. Speaker/Author (``str``): the speaker or author name where available, mostly in Broadcast Conversation and Web Log data. When not available the rows are marked with a "-".
11. Named Entities (``str``): these columns identify the spans representing various named entities. For documents which do not have named entity annotation, each line is represented with an ``*``.
12+. Predicate Arguments (``str``): one column of predicate argument structure information for each predicate mentioned in Column 7. If there are no predicates tagged in a sentence this is a single column with all rows marked with an ``*``.
-1. Co-reference (``str``): co-reference chain information encoded in a parenthesis structure. For documents that do not have co-reference annotations, each line is represented with a "-".
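To make the column layout concrete, here is a minimal parsing sketch. The file path and the plain whitespace splitting are assumptions for illustration only; this is not part of collect_conll.py.

# Minimal sketch: split one CoNLL-formatted line into the columns described above.
def read_conll_rows(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and document boundary comments
            cols = line.split()
            yield {
                "document_id": cols[0],
                "part_number": int(cols[1]),
                "word_number": int(cols[2]),
                "word": cols[3],
                "pos_tag": cols[4],
                "parse_bit": cols[5],
                "predicate_lemma": cols[6],
                "predicate_frameset_id": cols[7],
                "word_sense": cols[8],
                "speaker": cols[9],
                "named_entities": cols[10],
                "rest": cols[11:],  # predicate arguments ..., coreference in the last column
            }

for row in read_conll_rows("v4/english/train.txt"):
    print(row["word"], row["pos_tag"], row["named_entities"])
    break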
├── ner
│   ├── JNLPBA
│   ├── NCBI-disease
│   ├── bc5cdr
│   └── sciie
├── parsing
│   └── genia
├── pico
│   └── ebmnlp
└── text_classification
    ├── chemprot
    ├── citation_intent
    ├── mag
    ├── rct-20k
    ├── sci-cite
    └── sciie-relation-extraction
DATASET='bc5cdr'
TASK='ner'
...
export BERT_VOCAB=path-to/scibert_scivocab_uncased.vocab
export BERT_WEIGHTS=path-to/scibert_scivocab_uncased.tar.gz
./scibert/scripts/train_allennlp_local.sh [serialization-directory]
conda create -n allennlp_spacy
source activate allennlp_spacy
pip install http://download.pytorch.org/whl/torch-0.2.0.post3-cp36-cp36m-macosx_10_7_x86_64.whl
python -m spacy download es
pip install -r requirements.txt
python setup.py develop
pip install tensorboard
download_prepare_fasttext.sh
import torch as th
from torch.autograd import Function


def batch2tensor(batch_adj, batch_feat, node_per_pool_graph):
    """
    transform a batched graph to batched adjacency tensor and node feature tensor
    """
    batch_size = int(batch_adj.size()[0] / node_per_pool_graph)
    adj_list = []
    feat_list = []
    for i in range(batch_size):
        start = i * node_per_pool_graph
        end = (i + 1) * node_per_pool_graph
        adj_list.append(batch_adj[start:end, start:end])
        feat_list.append(batch_feat[start:end, :])
    adj_list = list(map(lambda x: th.unsqueeze(x, 0), adj_list))
    feat_list = list(map(lambda x: th.unsqueeze(x, 0), feat_list))
    adj = th.cat(adj_list, dim=0)
    feat = th.cat(feat_list, dim=0)
    return feat, adj


def masked_softmax(matrix, mask, dim=-1, memory_efficient=True, mask_fill_value=-1e32):
    """
    masked_softmax for dgl batch graph
    code snippet contributed by AllenNLP (https://github.com/allenai/allennlp)
    """
    if mask is None:
        result = th.nn.functional.softmax(matrix, dim=dim)
    else:
        mask = mask.float()
        while mask.dim() < matrix.dim():
            mask = mask.unsqueeze(1)
        if not memory_efficient:
            result = th.nn.functional.softmax(matrix * mask, dim=dim)
            result = result * mask
            result = result / (result.sum(dim=dim, keepdim=True) + 1e-13)
        else:
            masked_matrix = matrix.masked_fill((1 - mask).byte(), mask_fill_value)
            result = th.nn.functional.softmax(masked_matrix, dim=dim)
    return result
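A quick usage sketch for the masked_softmax helper above; the score and mask tensors are made up for illustration:

# Scores for 2 graphs x 4 nodes each; zeros in the mask mark padded nodes.
scores = th.randn(2, 4)
mask = th.tensor([[1, 1, 1, 0],
                  [1, 1, 0, 0]])
probs = masked_softmax(scores, mask, dim=-1)
print(probs)              # masked positions receive (near-)zero probability
print(probs.sum(dim=-1))  # each row still sums to 1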
def forward(self, input_ids, attention_mask=None, token_type_ids=None,
position_ids=None, head_mask=None, labels=None):
Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**loss**: (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Classification (or regression if config.num_labels==1) loss.
**logits**: ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
Classification (or regression if config.num_labels==1) scores (before SoftMax).
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
of shape ``(batch_size, sequence_length, hidden_size)``:
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
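A short, hedged example of consuming such outputs, assuming a reasonably recent transformers release (newer versions return a model-output object, but it can still be indexed like the tuple described above):

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("AllenNLP is built on PyTorch.", return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**inputs, labels=labels)
loss, logits = outputs[:2]        # (loss, logits, ...) when labels are provided
print(loss.item(), logits.shape)  # scalar loss, (batch_size, num_labels)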
import ast
df["OIE Triples"] = df["OIE output"].apply(ast.literal_eval)
df["OIE Triples"] = df["OIE Triples"].apply(lambda val: [a_dict["description"]
for a_dict in val["verbs"]])
df = df.explode("OIE Triples").drop(columns="OIE output")
   sentence                             ID      OIE Triples
0  'The girl went to the cinema'        'abcd'  [ARG0: The girl] [V: went] [ARG1:to the cinema]
1  'He is right and he is an engineer'  'efgh'  [ARG0: He] [V: is] [ARG1:right]
1  'He is right and he is an engineer'  'efgh'  [ARG0: He] [V: is] [ARG1:an engineer]
elmo = ElmoEmbedder(
options_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json',
weight_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5'
)
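A small follow-up sketch of how this embedder is typically used; it assumes the ElmoEmbedder above is allennlp.commands.elmo.ElmoEmbedder from the pre-1.0 allennlp releases:

from allennlp.commands.elmo import ElmoEmbedder  # where the ElmoEmbedder above comes from

# embed_sentence takes a pre-tokenized sentence and returns a numpy array of
# shape (3, num_tokens, 1024): one 1024-d vector per token for each ELMo layer.
tokens = ["The", "quick", "brown", "fox"]
vectors = elmo.embed_sentence(tokens)
print(vectors.shape)  # (3, 4, 1024)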
>>> from nlg.utils import load_spacy_model
>>> nlp = load_spacy_model()
>>> text = nlp("The virginica species has the least average sepal_width.")
def get_entity_attributes(obj, key, value):
"""Recursively fetch values from nested JSON."""
arr = []
def extract(obj, arr, key):
"""Recursively search for values of key in JSON tree."""
if isinstance(obj, dict):
for k, v in obj.items():
if isinstance(v, (dict, list)):
extract(v, arr, key)
elif isinstance(obj, list):
for item in obj:
if(isinstance(item,dict)):
ky,vl = key, value
if ky in item and vl == item[ky]:
# print(type(item), item)
arr.append(item)
extract(item, arr, key)
return arr
values = extract(obj, arr, key)
return values
def parse_attributes(obj, key):
"""Recursively fetch values from nested JSON."""
arr = []
def extract(obj, arr, key):
"""Recursively search for values of key in JSON tree."""
if isinstance(obj, dict):
for k, v in obj.items():
if isinstance(v, (dict, list)):
extract(v, arr, key)
elif k == key:
arr.append(v)
elif isinstance(obj, list):
for item in obj:
extract(item, arr, key)
return arr
values = extract(obj, arr, key)
return values
# Create list of word tokens after removing stopwords
def get_clean_list(entities):
filtered_sentence = []
for word in entities:
lexeme = nlp.vocab[word]
if not lexeme.is_stop and not lexeme.is_punct:
filtered_sentence.append(word)
return filtered_sentence
text = "When I was walking to the park yesterday, I saw a man wearing a blue shirt."
tree = predictor.predict(sentence=text)
key = "word"
entity = "man"
entities = get_entity_attributes(tree, key, entity)
for ent in entities:
if ent['nodeType'] == 'dep':
attributes = parse_attributes(ent, key)
clean_attributes = get_clean_list(attributes)
clean_attributes.remove(entity)
print(f'entity: {entity} Attributes: {clean_attributes}')
else:
attributes = parse_attributes(ent, key)
clean_attributes = get_clean_list(attributes)
clean_attributes.remove(entity)
print(f'entity: {entity} Action Attributes: {clean_attributes}')
entity: man Attributes: ['wearing', 'shirt', 'blue']
def _read(self, file_path):
with open(cached_path(file_path), "r") as data_file:
data = json.load(data_file)
for item in data:
text = item["text"]
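The _read snippet above stops short of producing instances. A hypothetical completion (the "label" field name and the text_to_instance signature are assumptions, not part of the original snippet) might look like this:

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        label = item.get("label")  # assumed field name
        # Hand off to the reader's own text_to_instance to build an Instance.
        yield self.text_to_instance(text, label)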
Trending Discussions on allennlp
QUESTION
I am trying to train my own custom ELMo model on AllenNLP.
The following error arises when training the model: RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1
There are instances where the size of tensor a is reported as other values (e.g. 5300). When I tested on a small subset of files, I was able to train the model successfully.
Based on my intuition, this has something to do with the number of tokens in my model, more specifically files that have more than 5000 tokens. However, there is no parameter within the AllenNLP package that allows me to tweak this to bypass the error.
Any advice on how I can overcome this issue? Would tweaking the PyTorch code to cap the size at 5000 work (if yes, how can I do that)? Any insights will be deeply appreciated.
FYI, I am currently using a customised DatasetReader for tokenisation purposes. I've generated my own vocab list before training the model (to save some time) which is used to train the ELMo model via AllenNLP.
Update: I found out that there is a variable in AllenNLP, max_len=5000, which is why the error is showing. See code here. I've tweaked the parameter to larger values and ended up with a CUDA Out of Memory error on many occasions instead, making me believe this should not be touched.
Environment: Python 3.6.9, Linux Ubuntu, allennlp=2.9.1, allennlp-models=2.9.0
Traceback:
Traceback (most recent call last):
File "/home/jiayi/.local/bin/allennlp", line 8, in
sys.exit(run())
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/__main__.py", line 34, in run
main(prog="allennlp")
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 121, in main
args.func(args)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 120, in train_model_from_args
file_friendly_logging=args.file_friendly_logging,
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 179, in train_model_from_file
file_friendly_logging=file_friendly_logging,
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 246, in train_model
file_friendly_logging=file_friendly_logging,
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 470, in _train_worker
metrics = train_loop.run()
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 543, in run
return self.trainer.train()
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 720, in train
metrics, epoch = self._try_train()
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 741, in _try_train
train_metrics = self._train_epoch(epoch)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 459, in _train_epoch
batch_outputs = self.batch_outputs(batch, for_training=True)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 352, in batch_outputs
output_dict = self._pytorch_model(**batch)
File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/models/language_model.py", line 257, in forward
embeddings, mask
File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 282, in forward
token_embeddings = self._position(token_embeddings)
File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 68, in forward
return x + self.positional_encoding[:, : x.size(1)]
RuntimeError: The size of tensor a (5385) must match the size of tensor b (5000) at non-singleton dimension 1
AllenNLP training config file:
// For more info on config files generally, see https://guide.allennlp.org/using-config-files
local NUM_GRAD_ACC = 4;
local BATCH_SIZE = 1;
local BASE_LOADER = {
"max_instances_in_memory": 8,
"batch_sampler": {
"type": "bucket",
"batch_size": BATCH_SIZE,
"sorting_keys": ["source"]
}
};
{
"dataset_reader" : {
"type": "mimic_reader",
"token_indexers": {
"tokens": {
"type": "single_id"
},
"token_characters": {
"type": "elmo_characters"
}
},
"start_tokens": [""],
"end_tokens": [""],
},
"train_data_path": std.extVar("MIMIC3_NOTEEVENTS_DISCHARGE_PATH"),
// Note: We don't set a validation_data_path because the softmax is only
// sampled during training. Not sampling on GPUs results in a certain OOM
// given our large vocabulary. We'll need to evaluate against the test set
// (when we'll want a full softmax) with the CPU.
"vocabulary": {
// Use a prespecified vocabulary for efficiency.
"type": "from_files",
"directory": std.extVar("ELMO_VOCAB_PATH"),
// Plausible config for generating the vocabulary.
// "tokens_to_add": {
// "tokens": ["", ""],
// "token_characters": ["<>/S"]
// },
// "min_count": {"tokens": 3}
},
"model": {
"type": "language_model",
"bidirectional": true,
"num_samples": 8192,
# Sparse embeddings don't work with DistributedDataParallel.
"sparse_embeddings": false,
"text_field_embedder": {
"token_embedders": {
"tokens": {
"type": "empty"
},
"token_characters": {
"type": "character_encoding",
"embedding": {
"num_embeddings": 262,
// Same as the Transformer ELMo in Calypso. Matt reports that
// this matches the original LSTM ELMo as well.
"embedding_dim": 16
},
"encoder": {
"type": "cnn-highway",
"activation": "relu",
"embedding_dim": 16,
"filters": [
[1, 32],
[2, 32],
[3, 64],
[4, 128],
[5, 256],
[6, 512],
[7, 1024]],
"num_highway": 2,
"projection_dim": 512,
"projection_location": "after_highway",
"do_layer_norm": true
}
}
}
},
// Consider the following.
// remove_bos_eos: true,
// Applies to the contextualized embeddings.
"dropout": 0.1,
"contextualizer": {
"type": "bidirectional_language_model_transformer",
"input_dim": 512,
"hidden_dim": 4096,
"num_layers": 2,
"dropout": 0.1,
"input_dropout": 0.1
}
},
"data_loader": BASE_LOADER,
// "distributed": {
// "cuda_devices": [0, 1],
// },
"trainer": {
"num_epochs": 10,
"cuda_devices": [0, 1, 2, 3],
"optimizer": {
// The gradient accumulators in Adam for the running stdev and mean for
// words not used in the sampled softmax would be decayed to zero with the
// standard "adam" optimizer.
"type": "dense_sparse_adam"
},
// "grad_norm": 10.0,
"learning_rate_scheduler": {
"type": "noam",
// See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
"model_size": 512,
// See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
// Adjusted based on our sample size relative to Calypso's.
"warmup_steps": 6000
},
"num_gradient_accumulation_steps": NUM_GRAD_ACC,
"use_amp": true
}
}
ANSWER
Answered 2022-Mar-24 at 17:17
By setting the max_tokens variable of the custom DatasetReader to a value below 5000, this error no longer occurs. This was also suggested by one of AllenNLP's contributors: make sure the tokenizer truncates the input to 5000 tokens.
The same question was posted on the AllenNLP discussion board: https://github.com/allenai/allennlp/discussions/5601
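For reference, a hedged sketch of what that truncation can look like in a custom DatasetReader; the class, parameter, and field names here are illustrative, not the asker's actual code:

from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField


class TruncatingElmoReader(DatasetReader):  # hypothetical reader
    def __init__(self, tokenizer, token_indexers, max_tokens: int = 4096, **kwargs):
        super().__init__(**kwargs)
        self.tokenizer = tokenizer
        self.token_indexers = token_indexers
        self.max_tokens = max_tokens  # keep below the transformer's max_len of 5000

    def text_to_instance(self, text: str) -> Instance:
        tokens = self.tokenizer.tokenize(text)
        tokens = tokens[: self.max_tokens]  # truncate overly long documents
        return Instance({"source": TextField(tokens, self.token_indexers)})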
QUESTION
I have a custom classification model trained using transformers
library based on a BERT model. The model classifies text into 7 different categories. It is persisted in a directory using:
trainer.save_model(model_name)
tokenizer.save_pretrained(model_name)
I'm trying to load the persisted model using the allennlp library for further analysis. I managed to do so after a lot of work. However, when running the model inside the allennlp framework, its predictions differ substantially from the ones I get when I run it using transformers, which leads me to think some part of the loading was not done correctly. There are no errors during inference; it is just that the predictions don't match.
There is little documentation about how to load an existing model, so I'm wondering if someone has faced the same situation before. There is just one example of how to do QA classification with RoBERTa, but I couldn't extrapolate to what I'm looking for. Does anyone have an idea if the steps I'm following are correct?
This is how I'm loading the trained model:
transformer_vocab = Vocabulary.from_pretrained_transformer(model_name)
transformer_tokenizer = PretrainedTransformerTokenizer(model_name)
transformer_encoder = BertPooler(model_name)
params = Params(
{
"token_embedders": {
"tokens": {
"type": "pretrained_transformer",
"model_name": model_name,
}
}
}
)
token_embedder = BasicTextFieldEmbedder.from_params(vocab=vocab, params=params)
token_indexer = PretrainedTransformerIndexer(model_name)
transformer_model = BasicClassifier(vocab=transformer_vocab,
text_field_embedder=token_embedder,
seq2vec_encoder=transformer_encoder,
dropout=0.1,
num_labels=7)
I also had to implement my own DatasetReader
as follows:
class ClassificationTransformerReader(DatasetReader):
def __init__(
self,
tokenizer: Tokenizer,
token_indexer: TokenIndexer,
max_tokens: int,
**kwargs
):
super().__init__(**kwargs)
self.tokenizer = tokenizer
self.token_indexers: Dict[str, TokenIndexer] = { "tokens": token_indexer }
self.max_tokens = max_tokens
self.vocab = vocab
def text_to_instance(self, text: str, label: str = None) -> Instance:
tokens = self.tokenizer.tokenize(text)
if self.max_tokens:
tokens = tokens[: self.max_tokens]
inputs = TextField(tokens, self.token_indexers)
fields: Dict[str, Field] = { "tokens": inputs }
if label:
fields["label"] = LabelField(label)
return Instance(fields)
It is instantiated as follows:
dataset_reader = ClassificationTransformerReader(tokenizer=transformer_tokenizer,
token_indexer=token_indexer,
max_tokens=400)
To run the model and test out if it works I'm doing the following:
instance = dataset_reader.text_to_instance("some sample text here")
dataset = Batch([instance])
dataset.index_instances(transformer_vocab)
model_input = util.move_to_device(dataset.as_tensor_dict(),
transformer_model._get_prediction_device())
outputs = transformer_model.make_output_human_readable(transformer_model(**model_input))
This works and returns probabilities, but they don't match what I would get running the model using transformers directly. Any idea what's going on?
ANSWER
Answered 2022-Mar-11 at 19:55
As discussed on GitHub: the problem is that you are constructing a 7-way classifier on top of BERT. Even though the BERT model will be identical, the 7-way classifier on top of it is randomly initialized every time.
BERT itself does not come with a classifier. That has to be fine-tuned for your data.
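As a point of comparison, the fine-tuned classification head can be loaded together with BERT through transformers itself; a hedged sketch (the directory path is a placeholder for the model_name used with trainer.save_model above):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "path/to/saved-model"  # placeholder: the directory written by trainer.save_model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=7)
model.eval()

inputs = tokenizer("some sample text here", return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # the probabilities the AllenNLP predictions should be compared against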
QUESTION
I am just a beginner in NLP and was trying to learn the semantic role labeling concept through implementation. I was trying to load the bert-base-srl model from the public storage of allennlp, but was facing the following error:
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11672/96061884.py in
1 from allennlp.predictors.predictor import Predictor
----> 2 predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
~\anaconda3\lib\site-packages\allennlp\predictors\predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
364 plugins.import_plugins()
365 return Predictor.from_archive(
--> 366 load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
367 predictor_name,
368 dataset_reader_to_load=dataset_reader_to_load,
~\anaconda3\lib\site-packages\allennlp\models\archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
233 config.duplicate(), serialization_dir
234 )
--> 235 model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
236
237 # Load meta.
~\anaconda3\lib\site-packages\allennlp\models\archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
277
278 def _load_model(config, weights_path, serialization_dir, cuda_device):
--> 279 return Model.load(
280 config,
281 weights_file=weights_path,
~\anaconda3\lib\site-packages\allennlp\models\model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
436 # get_model_class method, that recurses whenever it finds a from_archive model type.
437 model_class = Model
--> 438 return model_class._load(config, serialization_dir, weights_file, cuda_device)
439
440 def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:
~\anaconda3\lib\site-packages\allennlp\models\model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
378
379 if unexpected_keys or missing_keys:
--> 380 raise RuntimeError(
381 f"Error loading state dict for {model.__class__.__name__}\n\t"
382 f"Missing keys: {missing_keys}\n\t"
RuntimeError: Error loading state dict for SrlBert
Missing keys: ['bert_model.embeddings.position_ids']
Unexpected keys: []
Does someone know a fix for this?
ANSWER
Answered 2022-Mar-11 at 04:52
If you are on the later versions of allennlp-models, you can use this archive_file instead: https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz
The latest versions of the model archive files can be found on the demo page in the Model Card tab: https://demo.allennlp.org/semantic-role-labeling
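Loading the newer archive uses the same API as in the question; only the URL changes:

from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
)
print(predictor.predict(sentence="The keys were left in the car."))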
QUESTION
I am new to allennlp. I trained an ELMo model to use as the embedding for other allennlp models but failed. It seems that my model is not compatible with the interface the config expects. What can I do?
My elmo is trained by allennlp with the command:
allennlp train config/elmo.jsonnet --serialization-dir /xxx
The elmo.jsonnet is almost the same to https://github.com/allenai/allennlp-models/blob/main/training_config/lm/bidirectional_language_model.jsonnet except the dataset and vocabulary.
After that, I got an elmo model with:
config.json
weights.th
vocabulary/
vocabulary/.lock
vocabulary/non_padded_namespaces.txt
vocabulary/tokens.txt
meta.json
When I try to load the model into other models like bidaf-elmo in https://github.com/allenai/allennlp-models/blob/main/training_config/rc/bidaf_elmo.jsonnet, I found it requires the options and weights:
"elmo": {
"type": "elmo_token_embedder",
"do_layer_norm": false,
"dropout": 0,
"options_file": "xxx/options.json",
"weight_file": "xxx/weights.hdf5"
}
These are not included in my model. I tried to convert model.state_dict() to weights.hdf5, but I received an error:
KeyError: "Unable to open object (object 'char_embed' doesn't exist)"
Which is required in
File "/home/xxx/anaconda3/envs/thesis_torch1.8/lib/python3.8/site-packages/allennlp/modules/elmo.py", line 393, in _load_char_embedding
char_embed_weights = fin["char_embed"][...]
It seems that the model I trained by allennlp is not compatible with the interface. How can I apply my elmo as the embedding of other models?
ANSWER
Answered 2022-Feb-24 at 19:15
You are right, those two formats don't align.
I'm afraid there is no easy way out. I think you'll have to write a TokenEmbedder
that can read and apply the output from bidirectional_language_model.jsonnet.
If you do, we'd love to have it as a contribution to AllenNLP!
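A rough, hypothetical skeleton of such a TokenEmbedder; the registered name is made up, and the actual loading of config.json / weights.th is left as the real work:

import torch
from allennlp.modules.token_embedders import TokenEmbedder


@TokenEmbedder.register("trained_bidirectional_lm")  # hypothetical name
class TrainedBidirectionalLmEmbedder(TokenEmbedder):
    def __init__(self, serialization_dir: str, output_dim: int = 1024) -> None:
        super().__init__()
        self._output_dim = output_dim
        # TODO: read config.json and weights.th from serialization_dir and rebuild the
        # character encoder + contextualizer trained by bidirectional_language_model.jsonnet.

    def get_output_dim(self) -> int:
        return self._output_dim

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # TODO: run the loaded language model and return contextual embeddings of
        # shape (batch_size, num_tokens, output_dim).
        raise NotImplementedError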
QUESTION
Currently I'm able to train a Semantic Role Labeling model using the config file below. This config file is based on the one provided by AllenNLP and works for the default bert-base-uncased model and also GroNLP/bert-base-dutch-cased.
{
"dataset_reader": {
"type": "srl_custom",
"bert_model_name": "GroNLP/bert-base-dutch-cased"
},
"data_loader": {
"batch_sampler": {
"type": "bucket",
"batch_size": 32
}
},
"train_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
"validation_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
"model": {
"type": "srl_bert",
"embedding_dropout": 0.1,
"bert_model": "GroNLP/bert-base-dutch-cased"
},
"trainer": {
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"correct_bias": false,
"weight_decay": 0.01,
"parameter_groups": [
[
[
"bias",
"LayerNorm.bias",
"LayerNorm.weight",
"layer_norm.weight"
],
{
"weight_decay": 0.0
}
]
]
},
"learning_rate_scheduler": {
"type": "slanted_triangular"
},
"checkpointer": {
"keep_most_recent_by_count": 2
},
"grad_norm": 1.0,
"num_epochs": 3,
"validation_metric": "+f1-measure-overall"
}
}
Swapping the values of bert_model_name
and bert_model
parameters from GroNLP/bert-base-dutch-cased
to roberta-base
won't work out of the box since the SRL datareader only supports the BertTokenizer and not the RobertaTokenizer. So I changed the config file to the following:
{
"dataset_reader": {
"type": "srl_custom",
"token_indexers": {
"tokens": {
"type": "pretrained_transformer",
"model_name": "roberta-base"
}
}
},
"data_loader": {
"batch_sampler": {
"type": "bucket",
"batch_size": 32
}
},
"train_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
"validation_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
"model": {
"type": "srl_bert",
"embedding_dropout": 0.1,
"bert_model": "roberta-base"
},
"trainer": {
"optimizer": {
"type": "huggingface_adamw",
"lr": 5e-5,
"correct_bias": false,
"weight_decay": 0.01,
"parameter_groups": [
[
[
"bias",
"LayerNorm.bias",
"LayerNorm.weight",
"layer_norm.weight"
],
{
"weight_decay": 0.0
}
]
]
},
"learning_rate_scheduler": {
"type": "slanted_triangular"
},
"checkpointer": {
"keep_most_recent_by_count": 2
},
"grad_norm": 1.0,
"num_epochs": 15,
"validation_metric": "+f1-measure-overall"
}
}
However, this is still not working. I'm receiving the following error:
2022-02-22 16:19:34,122 - INFO - allennlp.training.gradient_descent_trainer - Training
0%| | 0/1546 [00:00
sys.exit(run())
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\__main__.py", line 39, in run
main(prog="allennlp")
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\__init__.py", line 119, in main
args.func(args)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 111, in train_model_from_args
train_model_from_file(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 177, in train_model_from_file
return train_model(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 258, in train_model
model = _train_worker(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 508, in _train_worker
metrics = train_loop.run()
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 581, in run
return self.trainer.train()
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 771, in train
metrics, epoch = self._try_train()
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 793, in _try_train
train_metrics = self._train_epoch(epoch)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 510, in _train_epoch
batch_outputs = self.batch_outputs(batch, for_training=True)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 403, in batch_outputs
output_dict = self._pytorch_model(**batch)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\models\srl_bert.py", line 141, in forward
bert_embeddings, _ = self.bert_model(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\transformers\models\bert\modeling_bert.py", line 989, in forward
embedding_output = self.embeddings(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\transformers\models\bert\modeling_bert.py", line 215, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\sparse.py", line 156, in forward
return F.embedding(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\functional.py", line 1916, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
I don't fully understand what's going wrong and couldn't find any documentation on how to change the config file to load in a 'custom' BERT/RoBERTa model (one that's not mentioned here). I'm running the default allennlp train config.jsonnet command to start training; allennlp train config.jsonnet --dry-run produces no errors, however.
Thanks in advance! Thijs
EDIT: I've now swapped out "srl_bert" for a custom "srl_roberta" class, inheriting from it to make use of the RobertaModel. This, however, still produces the same error.
EDIT2: I'm now using the AutoTokenizer as suggested by Dirk Groeneveld. It looks like changing the SrlReader class to support RoBERTa-based models involves way more changes, like swapping BERT's wordpiece tokenizer for RoBERTa's BPE tokenizer. Is there an easy way to adapt the SrlReader class, or is it better to write a new RobertaSrlReader from scratch?
I've inherited the SrlReader class and changed this line to the following:
self.bert_tokenizer = AutoTokenizer.from_pretrained(bert_model_name)
It produces the following error since RoBERTa tokenization differs from BERT:
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\dataset_readers\srl.py", line 255, in text_to_instance
wordpieces, offsets, start_offsets = self._wordpiece_tokenize_input(
File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\dataset_readers\srl.py", line 196, in _wordpiece_tokenize_input
word_pieces = self.bert_tokenizer.wordpiece_tokenizer.tokenize(token)
AttributeError: 'RobertaTokenizerFast' object has no attribute 'wordpiece_tokenizer'
ANSWER
Answered 2022-Feb-24 at 02:14
The easiest way to resolve this is to patch SrlReader so that it uses PretrainedTransformerTokenizer (from AllenNLP) or AutoTokenizer (from Huggingface) instead of BertTokenizer. SrlReader is an old class, and was written against an old version of the Huggingface tokenizer API, so it's not so easy to upgrade.
If you want to submit a pull request in the AllenNLP project, I'd be happy to help you get it merged into AllenNLP!
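To illustrate the idea (this is only a sketch, not a drop-in replacement for SrlReader._wordpiece_tokenize_input): with AutoTokenizer, sub-word pieces can be produced per word without touching wordpiece_tokenizer, which works for both WordPiece and BPE vocabularies.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

words = ["The", "judge", "scheduled", "a", "hearing"]
word_pieces, end_offsets = [], []
for word in words:
    pieces = tokenizer.tokenize(word)  # generic API; no .wordpiece_tokenizer needed
    end_offsets.append(len(word_pieces) + len(pieces) - 1)
    word_pieces.extend(pieces)

# Note: RoBERTa's BPE is whitespace-sensitive, so tokenizing word by word can
# differ from tokenizing the whole sentence; a real reader has to handle that.
print(word_pieces)
print(end_offsets)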
QUESTION
I am working on extracting people and tasks from texts (multiple sentences) and need a way to resolve coreferences. I found this model, and it seems very promising, but once I installed the required libraries allennlp and allennlp_models and tested the model out for myself, I got:
Script:
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz")
prediction = predictor.predict(
document="Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen. Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.")
print(prediction)
Output:
{'top_spans': [[0, 1], [3, 3], [5, 8], [5, 14], [8, 8], [11, 13], [11, 14], [13, 13], [16, 18], [16, 22], [20, 22], [24, 24], [26, 52], [33, 33], [36, 36], [37, 37], [38, 52], [41, 42], [47, 47], [48, 48], [49, 52]],
'antecedent_indices': [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]],
'predicted_antecedents': [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, -1, 5, 11, -1, -1, -1, 11, -1, -1],
'document': ['Paul', 'Allen', 'was', 'born', 'on', 'January', '21', ',', '1953', ',', 'in', 'Seattle', ',', 'Washington', ',', 'to', 'Kenneth', 'Sam', 'Allen', 'and', 'Edna', 'Faye', 'Allen', '.', 'Allen', 'attended', 'Lakeside', 'School', ',', 'a', 'private', 'school', 'in', 'Seattle', ',', 'where', 'he', 'befriended', 'Bill', 'Gates', ',', 'two', 'years', 'younger', ',', 'with', 'whom', 'he', 'shared', 'an', 'enthusiasm', 'for', 'computers', '.'],
'clusters': [[[0, 1], [24, 24], [36, 36], [47, 47]], [[11, 13], [33, 33]]]}
I'm having trouble interpreting the format of this output. I was expecting something like
{entity_0_spans: [LIST_OF_INDEX_TUPLES], # Paul Allen in this example
entity_1_spans: [LIST_OF_INDEX_TUPLES], # Seattle in this example
...}
or something that more closely resembles the visualisation available on the demo page:
I've looked through https://demo.allennlp.org/coreference-resolution but couldn't find a breakdown of how to use the model output yet - can anyone suggest some resources that will help me? Any pointers are much appreciated!
ANSWER
Answered 2022-Feb-10 at 16:15
The information you are looking for is in 'clusters', where each list corresponds to an entity. Within each entity list, you will find the mentions referring to the same entity. The numbers are indices that mark the beginning and end of each coreferential mention, e.g. Paul Allen [0, 1] and Allen [24, 24].
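A small sketch that turns those indices back into text, reusing the prediction dictionary from the question (span boundaries are inclusive token indices into 'document'):

document = prediction["document"]
for cluster in prediction["clusters"]:
    mentions = [" ".join(document[start:end + 1]) for start, end in cluster]
    print(mentions)
# Expected along the lines of:
# ['Paul Allen', 'Allen', 'he', 'he']
# ['Seattle , Washington', 'Seattle']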
QUESTION
The configuration file for the HuggingFace google/mt5-small Model (https://huggingface.co/google/mt5-small)
defines
{
...
"d_model": 512,
...
"num_heads": 6,
...
}
Link to the config file: https://huggingface.co/google/mt5-small/resolve/main/config.json
Question:
As far as I understood, the number of attention heads should be a divisor of the model dimension. This is clearly not the case in this config file.
Do I misunderstand how self-attention is applied in mT5?
When I use the AllenNLP model (https://github.com/allenai/allennlp-models/blob/main/allennlp_models/generation/models/t5.py) as sequence-to-sequence model, I receive an error message
Summary:
allennlp.common.checks.ConfigurationError: The hidden size (512) is not a multiple of the number of attention heads (6)
Full
Traceback (most recent call last):
File "/snap/pycharm-professional/269/plugins/python/helpers/pydev/pydevd.py", line 1500, in _exec
runpy._run_module_as_main(module_name, alter_argv=False)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/__main__.py", line 50, in
run()
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/__main__.py", line 46, in run
main(prog="allennlp")
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/__init__.py", line 123, in main
args.func(args)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 112, in train_model_from_args
train_model_from_file(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 178, in train_model_from_file
return train_model(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 254, in train_model
model = _train_worker(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 490, in _train_worker
train_loop = TrainModel.from_params(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 652, in from_params
return retyped_subclass.from_params(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 766, in from_partial_objects
model_ = model.construct(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 652, in from_params
return retyped_subclass.from_params(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp_models/generation/models/t5.py", line 32, in __init__
self.t5 = T5Module.from_pretrained_module(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/transformer_module.py", line 251, in from_pretrained_module
model = cls._from_config(config, **kwargs)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 852, in _from_config
return cls(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 783, in __init__
self.encoder: T5EncoderStack = encoder.construct(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 600, in basic_encoder
self_attention=block_self_attention.construct(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
return constructor_to_call(**kwargs) # type: ignore
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/attention_module.py", line 471, in __init__
super().__init__(
File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/attention_module.py", line 91, in __init__
raise ConfigurationError(
allennlp.common.checks.ConfigurationError: The hidden size (512) is not a multiple of the number of attention heads (6)
ANSWER
Answered 2022-Jan-20 at 09:48
This is a very good question, and shows a common misconception about Transformers, stemming from an (unfortunate) formulation in the original Transformers paper. In particular, the authors write the following in Section 3.2.2:
In this work, we employ h = 8 parallel attention layers, or heads. For each of these we use d_k = d_v = d_model / h = 64. [...]
Note that the equality d_k = d_v = d_model / h is not strictly necessary; it is only important that you match the final hidden representation (d_model) after the feed-forward portion of each layer. Specifically for mt5-small, the authors actually use an internal attention dimension of 384, which is simply the product of the parameters d_kv * num_heads = 64 * 6.
Now, the problem is that many libraries make a similar assumption about the enforced relation between d_kv and d_model, because it saves some implementation effort that most people won't use anyway. I suspect (I'm not super familiar with AllenNLP) that they have made similar assumptions here, which is why you cannot load the model.
Also, to clarify this, here is a peek at the modules of a loaded mt5-small:
T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=384, bias=False)
(k): Linear(in_features=512, out_features=384, bias=False)
(v): Linear(in_features=512, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedGeluDense(
(wi_0): Linear(in_features=512, out_features=1024, bias=False)
(wi_1): Linear(in_features=512, out_features=1024, bias=False)
(wo): Linear(in_features=1024, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
You can get the full model layout by simply calling list(model.modules())
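To see the numbers behind this, the configuration can be inspected directly (a quick check, assuming the transformers library is installed):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/mt5-small")
print(config.d_model, config.num_heads, config.d_kv)  # 512 6 64
print(config.num_heads * config.d_kv)                 # 384: the internal attention width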
QUESTION
I'm writing my first allennlp project to detect specific spans in newspaper articles. I was able to have it train on my dataset. The loss, computed with cross entropy, seems to decrease correctly, but I'm having some issues with my metric. I wrote a custom metric which is supposed to give an estimate of how accurately my model predicts spans with respect to some ground-truth spans. The problem is that right now, our metric doesn't seem to update correctly even though the loss is decreasing.
I'm not sure how to tackle the problem and guess my questions are the following:
- What is the exact use of the reset() function in the Metric class?
- Apart from writing the __call__(), get_metric() and reset() functions, are there other things to watch out for?
Below is a snapshot of my custom Metric
class in case you need it.
class SpanIdenficationMetric(Metric):
def __init__(self) -> None:
self._s_cardinality = 0 # S: model predicted spans
self._t_cardinality = 0 # T: article gold spans
self._s_sum = 0
self._t_sum = 0
def reset(self) -> None:
self._s_cardinality = 0
self._t_cardinality = 0
self._s_sum = 0
self._t_sum = 0
def __call__(self, prop_spans: torch.Tensor, gold_spans: torch.Tensor, mask: Optional[torch.BoolTensor] = None):
for i, article_spans in enumerate(prop_spans):
if article_spans.numel() == 0:
continue
article_gold_spans = gold_spans[i]
merged_prop_spans = self._merge_intervals(article_spans)
self._s_cardinality += merged_prop_spans.size(dim=0)
self._t_cardinality += article_gold_spans.size(dim=0)
for combination in itertools.product(merged_prop_spans, article_gold_spans):
sspan = combination[0]
tspan = combination[1]
self._s_sum += self._c_function(sspan, tspan, sspan[1].item() - sspan[0].item() + 1)
self._t_sum += self._c_function(sspan, tspan, tspan[1].item() - tspan[0].item() + 1)
def get_metric(self, reset: bool = False):
precision = 0
recall = 0
if self._s_cardinality != 0:
precision = self._s_sum / self._s_cardinality
if self._t_cardinality != 0:
recall = self._t_sum / self._t_cardinality
if reset:
self.reset()
return { "si-metric" : (2 * precision * recall) / (precision + recall) if precision + recall > 0 else 0 }
def _c_function(self, s, t, h): {}
def _intersect(self, s, t): {}
def _merge_intervals(self, prop_spans): {}
Thank you in advance. Cheers.
ANSWER
Answered 2021-Dec-10 at 02:12
During training, the trainer will call the metric (using Metric.__call__()) with the results from every batch. The metric is supposed to update its internal state when this happens. The trainer expects to get the current value(s) of the metric when it calls Metric.get_metric(). Metric.reset() has to reset the metric into a state as if it had never been called before. When get_metric() gets called with reset = True, it's expected to reset the metric as well.
From what I can tell, your code does all these things correctly. Your code will not run correctly in a distributed setting, but if you are not training on multiple GPUs, that's not a problem.
What you're doing is similar to the SQuAD metric: https://github.com/allenai/allennlp-models/blob/main/allennlp_models/rc/metrics/squad_em_and_f1.py The SQuAD metric goes out of its way to call the original SQuAD evaluation code, so it's a little more complicated than what you would want, but maybe you can adapt it? The main difference would be that you are calculating F scores across the whole dataset, while SQuAD calculates them per-document, and then averages across documents.
Finally, you can write a simple test for your metric, similar to the SQuAD test: https://github.com/allenai/allennlp-models/blob/main/tests/rc/metrics/squad_em_and_f1_test.py That might help narrow down where the problem is.
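A hedged sketch of such a test, checking only that reset() really clears the state (it assumes your full SpanIdenficationMetric, including the elided helper methods, is importable):

import torch

def test_reset_gives_repeatable_results():
    metric = SpanIdenficationMetric()
    predicted = torch.tensor([[[0, 2], [5, 6]]])  # one article, two predicted spans
    gold = torch.tensor([[[0, 2]]])               # one gold span

    metric(predicted, gold)
    first = metric.get_metric(reset=True)["si-metric"]

    # After reset, feeding the same batch again must give the same value.
    metric(predicted, gold)
    second = metric.get_metric(reset=True)["si-metric"]
    assert first == second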
QUESTION
I'm using the AllenNLP (version 2.6) semantic role labeling model to process a large pile of sentences. My Python version is 3.7.9. I'm on MacOS 11.6.1. My goal is to use multiprocessing.Pool
to parallelize the work, but the calls via the pool are taking longer than they do in the parent process, sometimes substantially so.
In the parent process, I have explicitly placed the model in shared memory as follows:
from allennlp.predictors import Predictor
from allennlp.models.archival import load_archive
import allennlp_models.structured_prediction.predictors.srl
PREDICTOR_PATH = "......"
archive = load_archive(PREDICTOR_PATH)
archive.model.share_memory()
PREDICTOR = Predictor.from_archive(archive)
I know the model is only being loaded once, in the parent process. And I place the model in shared memory whether or not I'm going to make use of the pool. I'm using torch.multiprocessing
, as many recommend, and I'm using the spawn
start method.
I'm calling the predictor in the pool using Pool.apply_async
, and I'm timing the calls within the child processes. I know that the pool is using the available CPUs (I have six cores), and I'm nowhere near running out of physical memory, so there's no reason for the child processes to be swapped to disk.
Here's what happens, for a batch of 395 sentences:
- Without multiprocessing: 638 total processing seconds (and elapsed time).
- With a 4-process pool: 293 seconds elapsed time, 915 total processing seconds.
- With a 12-process pool: 263 seconds elapsed time, 2024 total processing seconds.
The more processes, the worse the total AllenNLP processing time - even though the model is explicitly in shared memory, and the only thing that crosses the process boundary during the invocation is the input text and the output JSON.
I've done some profiling, and the first thing that leaps out at me is that the function torch._C._nn.linear
is taking significantly longer in the multiprocessing cases. This function takes two tensors as arguments - but there are no tensors being passed across the process boundary, and I'm decoding, not training, so the model should be entirely read-only. It seems like it has to be a problem with locking or competition for the shared model resource, but I don't understand at all why that would be the case. And I'm not a torch
programmer, so my understanding of what's happening is limited.
Any pointers or suggestions would be appreciated.
ANSWER
Answered 2021-Dec-01 at 15:18
Turns out that I wasn't comparing exactly the right things. This thread goes into all the detail: https://github.com/allenai/allennlp/discussions/5471. Briefly, because pytorch can use additional resources under the hood, my baseline test without multiprocessing wasn't taxing my computer enough when running two instances in parallel; I had to run 4 instances to see the penalty, and in that case, the total processing time was essentially the same for 4 parallel non-multiprocessing invocations as for one multiprocessing run with 4 subprocesses.
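One practical takeaway when benchmarking this kind of setup is to pin PyTorch's intra-op thread count in both the baseline and each worker, so that every configuration competes for the same CPU resources; a minimal sketch:

import torch

# Call this at the start of the baseline script and of every pool worker so the
# comparison between single-process and multiprocessing runs is apples-to-apples.
torch.set_num_threads(1)
print(torch.get_num_threads())  # 1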
QUESTION
I am a beginner and not a native English speaker, so I may ask poor questions. Sorry!
I recently finished the official AllenNLP tutorial(https://guide.allennlp.org/training-and-prediction) and want to change the simple classifier's word embedding to ELMo.
Also, I want to make the architecture of the simple classifier more complex to increase its accuracy. I think I'm done with the implementation of the model.
simple_classifier.py
@Model.register("simple_classifier")
class SimpleClassifier(Model):
def __init__(
self, vocab: Vocabulary, embedder: TextFieldEmbedder, encoder: Seq2VecEncoder
):
super().__init__(vocab)
self.embedder = embedder
self.encoder = encoder
num_labels = vocab.get_vocab_size("labels")
self.dropout = torch.nn.Dropout(p=0.2)
self.relu = torch.nn.ReLU()
self.layer1=torch.nn.Linear(encoder.get_output_dim(),512)
self.layer2 = torch.nn.Linear(512, 128)
self.layer3 = torch.nn.Linear(128, 50)
self.layer4 = torch.nn.Linear(50, 10)
self.classifier = torch.nn.Linear(10, num_labels)
self.accuracy = CategoricalAccuracy()
def forward(
self, text: TextFieldTensors, label: torch.Tensor = None
) -> Dict[str, torch.Tensor]:
# Shape: (batch_size, num_tokens, embedding_dim)
embedded_text = self.embedder(text)
# Shape: (batch_size, num_tokens)
mask = util.get_text_field_mask(text)
# Shape: (batch_size, encoding_dim)
encoded_text = self.encoder(embedded_text, mask)
x=self.relu(self.layer1(encoded_text))
x=self.relu(self.layer2(x))
x=self.relu(self.layer3(x))
x=self.relu(self.layer4(x))
# Shape: (batch_size, num_labels)
logits = self.classifier(x)
# Shape: (batch_size, num_labels)
probs = torch.nn.functional.softmax(logits)
# Shape: (1,)
output = {"probs": probs}
if label is not None:
self.accuracy(logits, label)
output["loss"] = torch.nn.functional.cross_entropy(logits, label)
return output
def get_metrics(self, reset: bool = False) -> Dict[str, float]:
return {"accuracy": self.accuracy.get_metric(reset)}
But I have no idea how to change the configuration file. How do I change the following configuration file in the official tutorial to use ELMo?
my_text_classifier.jsonnet
{
"dataset_reader" : {
"type": "classification-tsv",
"token_indexers": {
"tokens": {
"type": "single_id"
}
}
},
"train_data_path": "data/movie_review/train.tsv",
"validation_data_path": "data/movie_review/dev.tsv",
"model": {
"type": "simple_classifier",
"embedder": {
"token_embedders": {
"tokens": {
"type": "embedding",
"embedding_dim": 10
}
}
},
"encoder": {
"type": "bag_of_embeddings",
"embedding_dim": 10
}
},
"data_loader": {
"batch_size": 8,
"shuffle": true
},
"trainer": {
"optimizer": "adam",
"num_epochs": 5
}
}
I'm very happy if someone could help me.
ANSWER
Answered 2021-Nov-12 at 23:10
Check out the way the BiDAF model uses ELMo: https://raw.githubusercontent.com/allenai/allennlp-models/main/training_config/rc/bidaf_elmo.jsonnet
You can steal some of the components of that config. You will need the token embedder under the name "elmo", and, I believe, both of the token indexers, under "tokens" and "elmo".
It should work without having to write any code.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install allennlp
If you want to use allennlp train and config files to specify experiments, use this template. We recommend this approach.
If you'd prefer to use python code to configure your experiments and run your training loop, use this template. There are a few things that are currently a little harder in this setup (loading a saved model, and using distributed training), but otherwise it's functionally equivalent to the config files setup.
Hyperparameter optimization for AllenNLP using Optuna
Training with multiple GPUs in AllenNLP
Training on larger batches with less memory in AllenNLP
How to upload transformer weights and tokenizers from AllenNLP to HuggingFace
AllenNLP requires Python 3.6.1 or later and PyTorch. We support AllenNLP on Mac and Linux environments. We presently do not support Windows but are open to contributions.