allennlp | An open-source NLP research library, built on PyTorch | Natural Language Processing library

by allenai | Python Version: 2.10.1 | License: Apache-2.0

kandi X-RAY | allennlp Summary

allennlp is a Python library typically used in Institutions, Learning, Education, Artificial Intelligence, Natural Language Processing, Deep Learning, and PyTorch applications. allennlp has no reported bugs or vulnerabilities, has a build file available, carries a Permissive License, and has high support. You can install it with 'pip install allennlp' or download it from GitHub or PyPI.
An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
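For orientation, a minimal usage sketch (the archive path below is a placeholder, not a real model URL; any archived AllenNLP model works with Predictor.from_path):

# pip install allennlp allennlp-models
from allennlp.predictors.predictor import Predictor

# Placeholder path: substitute a real archived model (*.tar.gz), e.g. from allennlp-models.
predictor = Predictor.from_path("path/to/pretrained-model.tar.gz")
print(predictor.predict_json({"sentence": "AllenNLP is built on PyTorch."}))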

Support

allennlp has a highly active ecosystem.
It has 11427 star(s) with 2245 fork(s). There are 281 watchers for this library.
There was 1 major release in the last 6 months.
There are 80 open issues and 2477 have been closed. On average, issues are closed in 30 days. There are 11 open pull requests and 0 closed requests.
It has a positive sentiment in the developer community.
The latest version of allennlp is 2.10.1.

Quality

allennlp has 0 bugs and 0 code smells.

Security

allennlp has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
allennlp code analysis shows 0 unresolved vulnerabilities.
There are 0 security hotspots that need review.

License

allennlp is licensed under the Apache-2.0 License. This license is Permissive.
Permissive licenses have the fewest restrictions, and you can use them in most projects.

Reuse

allennlp releases are available to install and integrate.
A deployable package is available on PyPI.
A build file is available, so you can build the component from source.
Installation instructions, examples and code snippets are available.
It has 56841 lines of code, 3687 functions and 596 files.
It has medium code complexity. Code complexity directly impacts the maintainability of the code.
                                                                                  Top functions reviewed by kandi - BETA
kandi has reviewed allennlp and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality allennlp implements and to help you decide whether it suits your requirements.
• Construct a TrainModel from partial objects
• Construct a new instance of this class
• Returns the constructor of the wrapped function
• Instantiate a class from a dictionary
• Compute the matching between two sentences
• Compute the multi-perspective matching between two vectors
• Compute cosine similarity between two vectors
• Return a tiny value for a given dtype
• Construct a vocabulary from pretrained data
• Performs an ELMo forward transformation
• Return a decorator to register a key (see the registration sketch after this list)
• Compute token embedding
• Load weights from a given file
• Select a batch of spans matching the given spans
• Perform a forward projection
• Compute the embedding
• Compute the token embedding
• Convert a tag sequence into a list of spans
• Creates embeddings for the given tokens
• Permute the top-k
• Create a model from a pretrained module
• Returns a T5StackOutput object
• Forward computation
• Performs the forward computation
• Get a pre-trained model
• Evaluate a model
Get all kandi verified functions for this library.
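Several of these functions belong to AllenNLP's registration/from-params machinery ("Return a decorator to register a key", "Instantiate a class from a dictionary"). A minimal sketch of that pattern, with a made-up registration name:

from allennlp.data import Vocabulary
from allennlp.models import Model

@Model.register("my_toy_model")  # "my_toy_model" is a hypothetical key, used only for illustration
class MyToyModel(Model):
    def __init__(self, vocab: Vocabulary, hidden_dim: int = 16):
        super().__init__(vocab)
        self.hidden_dim = hidden_dim

    def forward(self):  # real models accept tensors produced by a DatasetReader
        return {}

# A config file can now instantiate the class by name:
# "model": {"type": "my_toy_model", "hidden_dim": 32}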

                                                                                  allennlp Key Features

                                                                                  An open-source NLP research library, built on PyTorch.

                                                                                  allennlp Examples and Code Snippets

OntoNotes-5.0-NER
Python · Lines of Code: 80 · License: No License
                                                                                  
OntoNotes-5.0-NER
  - conll-formatted-ontonotes-5.0/
  - collect_conll.py
  - README.md
  - ..
  - ontonotes-release-5.0/

$ conda create --name py27 python=2.7
$ source activate py27

./conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D ./ontonotes-release-5.0/data/files/data ./conll-formatted-ontonotes-5.0/v4/
./conll-formatted-ontonotes-5.0/scripts/skeleton2conll.sh -D ./ontonotes-release-5.0/data/files/data ./conll-formatted-ontonotes-5.0/v12/

python collect_conll.py

usage: collect_conll.py [-h] [-v VERSION] [-l LANGUAGE] [-d [DOMAIN [DOMAIN ...]]]

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        Which version of split, v4 or v12.
  -l LANGUAGE, --language LANGUAGE
                        Which language to collect.
  -d [DOMAIN [DOMAIN ...]], --domain [DOMAIN [DOMAIN ...]]
                        What domains to use. If not specified, all will be used.
                        You can choose from bc bn mz nw tc wb.

python collect_conll.py -v v4

For file:v4/english/train.txt, there are 59924 sentences, 1088503 tokens.
For file:v4/english/dev.txt, there are 8528 sentences, 147724 tokens.
For file:v4/english/test.txt, there are 8262 sentences, 152728 tokens.

OntoNotes-5.0-NER/
  - ..
  - v4/
    - english/
      - train.txt
      - dev.txt
      - test.txt

python collect_conll.py -v v12

For file:v12/english/train.txt, there are 94292 sentences, 1903816 tokens.
For file:v12/english/dev.txt, there are 13900 sentences, 279495 tokens.
For file:v12/english/test.txt, there are 10348 sentences, 204235 tokens.

python collect_conll.py -v v4 -l chinese
python collect_conll.py -v v4 -d bc bn mz nw

Column format of the generated CoNLL files:
1   Document ID (str): a variation on the document filename.
2   Part number (int): some files are divided into multiple parts, numbered 000, 001, 002, etc.
3   Word number (int): the word index of the word in that sentence.
4   Word (str): the token as segmented/tokenized in the Treebank. Initially the *_skel files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank that is part of the OntoNotes release.
5   POS Tag (str): the Penn Treebank style part of speech. When parse information is missing, all parts of speech except the one for which there is some sense or proposition annotation are marked with an XX tag; the verb is marked with just a VERB tag.
6   Parse bit (str): the bracketed structure broken before the first open parenthesis in the parse, with the word/part-of-speech leaf replaced by a *. When parse information is missing, the first word of a sentence is tagged as (TOP*, the last word is tagged as *), and all intermediate words are tagged with a *.
7   Predicate lemma (str): the predicate lemma, given for rows that have semantic role or word sense information; all other rows are marked with a "-".
8   Predicate Frameset ID (int): the PropBank frameset ID of the predicate in Column 7.
9   Word sense (float): the word sense of the word in Column 3.
10  Speaker/Author (str): the speaker or author name where available, mostly in Broadcast Conversation and Web Log data; marked with a "-" when not available.
11  Named Entities (str): these columns identify the spans representing various named entities. For documents without named entity annotation, each line is marked with a *.
12+ Predicate Arguments (str): one column of predicate argument structure information per predicate mentioned in Column 7. If no predicates are tagged in a sentence, this is a single column with every row marked with a *.
-1  Co-reference (str): co-reference chain information encoded in a parenthesis structure. For documents without co-reference annotations, each line is marked with a "-".
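As a rough illustration of that column layout, a single line can be split into named fields like this (the input line is made up and abbreviated; real files carry one predicate-argument column per tagged predicate):

# Hypothetical, abbreviated CoNLL-2012 style line (whitespace-separated columns).
line = "bc/cctv/00/cctv_0001 0 3 went VBD (VP*) go 01 - Speaker#1 * (V*) -"

cols = line.split()
record = {
    "document_id": cols[0],
    "part_number": int(cols[1]),
    "word_number": int(cols[2]),
    "word": cols[3],
    "pos_tag": cols[4],
    "parse_bit": cols[5],
    "predicate_lemma": cols[6],
    "predicate_frameset_id": cols[7],
    "word_sense": cols[8],
    "speaker_author": cols[9],
    "named_entities": cols[10],
    "predicate_arguments": cols[11:-1],  # one column per tagged predicate
    "coreference": cols[-1],
}
print(record["word"], record["pos_tag"])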
SciBERT, Model training: Training new models using AllenNLP
Python · Lines of Code: 22 · License: Permissive (Apache-2.0)
                                                                                  
├── ner
│   ├── JNLPBA
│   ├── NCBI-disease
│   ├── bc5cdr
│   └── sciie
├── parsing
│   └── genia
├── pico
│   └── ebmnlp
└── text_classification
    ├── chemprot
    ├── citation_intent
    ├── mag
    ├── rct-20k
    ├── sci-cite
    └── sciie-relation-extraction

DATASET='bc5cdr'
TASK='ner'
...

export BERT_VOCAB=path-to/scibert_scivocab_uncased.vocab
export BERT_WEIGHTS=path-to/scibert_scivocab_uncased.tar.gz
                                                                                  ./scibert/scripts/train_allennlp_local.sh [serialization-directory]
Setup
Python · Lines of Code: 8 · License: No License
                                                                                  
                                                                                                                      conda create -n allennlp_spacy
                                                                                  source activate allennlp_spacy
                                                                                  pip install http://download.pytorch.org/whl/torch-0.2.0.post3-cp36-cp36m-macosx_10_7_x86_64.whl
                                                                                  python -m spacy download es
                                                                                  pip install -r requirements.txt
                                                                                  python setup.py develop
                                                                                  pip install tensorboard
                                                                                  download_prepare_fasttext.sh
dgl - model utils
Python · Lines of Code: 35 · License: Non-SPDX (Apache License 2.0)
                                                                                  
import torch as th
from torch.autograd import Function


def batch2tensor(batch_adj, batch_feat, node_per_pool_graph):
    """
    transform a batched graph to batched adjacency tensor and node feature tensor
    """
    batch_size = int(batch_adj.size()[0] / node_per_pool_graph)
    adj_list = []
    feat_list = []
    for i in range(batch_size):
        start = i * node_per_pool_graph
        end = (i + 1) * node_per_pool_graph
        adj_list.append(batch_adj[start:end, start:end])
        feat_list.append(batch_feat[start:end, :])
    adj_list = list(map(lambda x: th.unsqueeze(x, 0), adj_list))
    feat_list = list(map(lambda x: th.unsqueeze(x, 0), feat_list))
    adj = th.cat(adj_list, dim=0)
    feat = th.cat(feat_list, dim=0)
    return feat, adj


def masked_softmax(matrix, mask, dim=-1, memory_efficient=True, mask_fill_value=-1e32):
    """
    masked_softmax for dgl batch graph
    code snippet contributed by AllenNLP (https://github.com/allenai/allennlp)
    """
    if mask is None:
        result = th.nn.functional.softmax(matrix, dim=dim)
    else:
        mask = mask.float()
        while mask.dim() < matrix.dim():
            mask = mask.unsqueeze(1)
        if not memory_efficient:
            result = th.nn.functional.softmax(matrix * mask, dim=dim)
            result = result * mask
            result = result / (result.sum(dim=dim, keepdim=True) + 1e-13)
        else:
            masked_matrix = matrix.masked_fill((1 - mask).byte(), mask_fill_value)
            result = th.nn.functional.softmax(masked_matrix, dim=dim)
    return result
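A quick usage sketch for the masked_softmax above (the values are invented; the non-memory-efficient path is used here because newer PyTorch versions reject the byte mask that masked_fill receives on the default path):

import torch as th

scores = th.randn(2, 4)                      # scores for 2 sequences of length 4
mask = th.tensor([[1, 1, 1, 1],
                  [1, 1, 1, 0]])             # last position of sequence 2 is padding

probs = masked_softmax(scores, mask, memory_efficient=False)
print(probs.sum(dim=-1))                     # each row sums to ~1
print(probs[1, -1])                          # padded position gets zero probability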
PyTorch BERT TypeError: forward() got an unexpected keyword argument 'labels'
Python · Lines of Code: 16 · License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  def forward(self, input_ids, attention_mask=None, token_type_ids=None,
                                                                                              position_ids=None, head_mask=None, labels=None):
                                                                                  
                                                                                  Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
                                                                                          **loss**: (`optional`, returned when ``labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
                                                                                              Classification (or regression if config.num_labels==1) loss.
                                                                                          **logits**: ``torch.FloatTensor`` of shape ``(batch_size, config.num_labels)``
                                                                                              Classification (or regression if config.num_labels==1) scores (before SoftMax).
                                                                                          **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
                                                                                              list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
                                                                                              of shape ``(batch_size, sequence_length, hidden_size)``:
                                                                                              Hidden-states of the model at the output of each layer plus the initial embedding outputs.
                                                                                          **attentions**: (`optional`, returned when ``config.output_attentions=True``)
                                                                                              list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
                                                                                              Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. 
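For reference, a hedged sketch of calling a classification head with labels via the Hugging Face transformers API (the class and checkpoint names are the usual public ones, not taken from the snippet above); the loss only appears in the output because labels were passed:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a positive example", "a negative example"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)    # passing labels triggers the loss computation
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, num_labels)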
                                                                                  
                                                                                  import ast
                                                                                  df["OIE Triples"] = df["OIE output"].apply(ast.literal_eval)
                                                                                  
                                                                                  df["OIE Triples"] = df["OIE Triples"].apply(lambda val: [a_dict["description"]
                                                                                                                                           for a_dict in val["verbs"]])
                                                                                  df = df.explode("OIE Triples").drop(columns="OIE output")
                                                                                  
                                                                                                                sentence      ID                                      OIE Triples
                                                                                  0        'The girl went to the cinema'  'abcd'  [ARG0: The girl] [V: went] [ARG1:to the cinema]
                                                                                  1  'He is right and he is an engineer'  'efgh'                  [ARG0: He] [V: is] [ARG1:right]
                                                                                  1  'He is right and he is an engineer'  'efgh'            [ARG0: He] [V: is] [ARG1:an engineer]
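For context, a minimal input frame that reproduces the transformation above; the 'OIE output' strings are invented stand-ins for the stringified JSON that the AllenNLP Open IE predictor returns:

import pandas as pd

df = pd.DataFrame({
    "sentence": ["'The girl went to the cinema'",
                 "'He is right and he is an engineer'"],
    "ID": ["'abcd'", "'efgh'"],
    "OIE output": [
        "{'verbs': [{'description': '[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]}",
        "{'verbs': [{'description': '[ARG0: He] [V: is] [ARG1:right]'}, "
        "{'description': '[ARG0: He] [V: is] [ARG1:an engineer]'}]}",
    ],
})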
                                                                                  
How to use Elmo word embedding with the original pre-trained model (5.5B) in interactive mode
Python · Lines of Code: 5 · License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  elmo = ElmoEmbedder(
                                                                                      options_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json', 
                                                                                      weight_file='https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway_5.5B/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5'
                                                                                  )
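The embedder can then be queried interactively; a short sketch (the token list is arbitrary):

# embed_sentence returns a numpy array of shape (3, num_tokens, 1024):
# one slice per ELMo layer (character-CNN token layer plus two biLM layers).
vectors = elmo.embed_sentence(["I", "ate", "an", "apple", "."])
print(vectors.shape)  # (3, 5, 1024)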
                                                                                  
TypeError: 'NLP' object is not callable
Python · Lines of Code: 4 · License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  >>> from nlg.utils import load_spacy_model
                                                                                  >>> nlp = load_spacy_model()
                                                                                  >>> text = nlp("The virginica species has the least average sepal_width.")
                                                                                  
Parsing nested dictionary (allen nlp hierplane_tree)
Python · Lines of Code: 74 · License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  def get_entity_attributes(obj, key, value):
                                                                                      """Recursively fetch values from nested JSON."""
                                                                                      arr = []
                                                                                  
                                                                                      def extract(obj, arr, key):
                                                                                          """Recursively search for values of key in JSON tree."""
                                                                                          if isinstance(obj, dict):
                                                                                              for k, v in obj.items():
                                                                                                  if isinstance(v, (dict, list)):
                                                                                                      extract(v, arr, key)
                                                                                          elif isinstance(obj, list):
                                                                                              for item in obj:
                                                                                                  if(isinstance(item,dict)):
                                                                                                      ky,vl = key, value
                                                                                                      if ky in item and vl == item[ky]:
                                                                                  #                         print(type(item), item)
                                                                                                          arr.append(item)
                                                                                                  extract(item, arr, key)
                                                                                          return arr
                                                                                  
                                                                                      values = extract(obj, arr, key)
                                                                                      return values
                                                                                  
                                                                                  def parse_attributes(obj, key):
                                                                                      """Recursively fetch values from nested JSON."""
                                                                                      arr = []
                                                                                  
                                                                                      def extract(obj, arr, key):
                                                                                          """Recursively search for values of key in JSON tree."""
                                                                                          if isinstance(obj, dict):
                                                                                              for k, v in obj.items():
                                                                                                  if isinstance(v, (dict, list)):
                                                                                                      extract(v, arr, key)
                                                                                                  elif k == key:
                                                                                                      arr.append(v)
                                                                                          elif isinstance(obj, list):
                                                                                              for item in obj:
                                                                                                  extract(item, arr, key)
                                                                                          return arr
                                                                                  
                                                                                      values = extract(obj, arr, key)
                                                                                      return values
                                                                                  
                                                                                  # Create list of word tokens after removing stopwords
                                                                                  def get_clean_list(entities):
                                                                                      filtered_sentence = []
                                                                                  
                                                                                      for word in entities:
                                                                                          lexeme = nlp.vocab[word]
                                                                                          if not lexeme.is_stop and not lexeme.is_punct:
                                                                                              filtered_sentence.append(word) 
                                                                                      return filtered_sentence
                                                                                  
                                                                                  text = "When I was walking to the park yesterday, I saw a man wearing a blue shirt."
                                                                                  tree = predictor.predict(sentence=text)
                                                                                  
                                                                                  key = "word"
                                                                                  entity = "man"
                                                                                  entities = get_entity_attributes(tree, key, entity)
                                                                                  
                                                                                  for ent in entities:
                                                                                      if ent['nodeType'] == 'dep':
                                                                                          attributes = parse_attributes(ent, key)
                                                                                          clean_attributes = get_clean_list(attributes)
                                                                                          clean_attributes.remove(entity)
                                                                                          print(f'entity: {entity} Attributes: {clean_attributes}')
                                                                                      else:
                                                                                          attributes = parse_attributes(ent, key)
                                                                                          clean_attributes = get_clean_list(attributes)
                                                                                          clean_attributes.remove(entity)
                                                                                          print(f'entity: {entity} Action Attributes: {clean_attributes}')
                                                                                  
                                                                                  entity: man Attributes: ['wearing', 'shirt', 'blue']
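The snippet assumes predictor and nlp already exist; a hedged setup sketch (the archive path is a placeholder for whichever AllenNLP parser actually produced the hierplane_tree output):

import spacy
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("path/to/dependency-parser-model.tar.gz")  # placeholder archive
nlp = spacy.load("en_core_web_sm")  # only used for the stopword/punctuation checks above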
                                                                                  
Read JSON file correctly
Python · Lines of Code: 6 · License: Strong Copyleft (CC BY-SA 4.0)
                                                                                  def _read(self, file_path):
                                                                                      with open(cached_path(file_path), "r") as data_file:
                                                                                          data = json.load(data_file)
                                                                                      for item in data:
                                                                                          text = item["text"]
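The snippet stops right after extracting the text; a hedged completion of the reader (text_to_instance and the "label" field are assumptions about the rest of the DatasetReader, not part of the original answer):

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        label = item.get("label")                 # assumed field name
        yield self.text_to_instance(text, label)  # delegate to the reader's own helper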
                                                                                  
                                                                                  Community Discussions

                                                                                  Trending Discussions on allennlp

• Error training ELMo - RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1
• Loading a HuggingFace model into AllenNLP gives different predictions
• RuntimeError: Error loading state dict for SrlBert Missing keys: ['bert_model.embeddings.position_ids'] Unexpected keys: []
• Allennlp: How to load a pretrained ELMo as the embedding of allennlp model?
• How to change AllenNLP BERT based Semantic Role Labeling to RoBERTa in AllenNLP
• How to interpret Allen NLP Coreference resolution model output?
• Google mT5-small configuration error because number attention heads is not divider of model dimension
• Writing custom metrics in allennlp
• Using multiprocessing with AllenNLP decoding is sluggish compared to non-multiprocessing case
• How to incorporate ELMo into the simple classification of AllenNLP Guide

                                                                                  QUESTION

                                                                                  Error training ELMo - RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1
                                                                                  Asked 2022-Mar-24 at 17:17

                                                                                  I am trying to train my own custom ELMo model on AllenNLP.

The following error arises when training the model: RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1. In some runs the size of tensor a takes other values (e.g. 5300). When I tested on a small subset of files, I was able to train the model successfully.

My intuition is that this has to do with the number of tokens per document, specifically files that contain more than 5000 tokens. However, there is no parameter within the AllenNLP package that lets me tweak this limit to bypass the error.

Any advice on how I can overcome this issue? Would tweaking the PyTorch code to keep things at a size of 5000 work (if yes, how can I do that)? Any insights will be deeply appreciated.

FYI, I am currently using a customised DatasetReader for tokenisation purposes. I've also generated my own vocab list before training (to save some time), which is used to train the ELMo model via AllenNLP.

Update: I found that AllenNLP's positional encoding has a max_len=5000 parameter, which is why the error is showing. See the code here. I've tweaked the parameter to larger values and ended up with CUDA Out of Memory errors on many occasions instead, which makes me believe it should not be touched.
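One workaround I am considering is to window each document inside my custom DatasetReader so that no instance exceeds the 5000-token limit; a rough sketch of what I mean (simplified and hypothetical, not my actual reader):

MAX_TOKENS = 5000  # matches the transformer's positional-encoding max_len

def chunk_tokens(tokens, size=MAX_TOKENS):
    """Split an over-long token list into windows of at most `size` tokens."""
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]

# Inside DatasetReader._read, roughly:
# for document in documents:
#     tokens = self._tokenizer.tokenize(document)
#     for window in chunk_tokens(tokens):
#         yield self.text_to_instance(window)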

                                                                                  Environment: Python 3.6.9, Linux Ubuntu, allennlp=2.9.1, allennlp-models=2.9.0

                                                                                  Traceback:

                                                                                  Traceback (most recent call last):
                                                                                    File "/home/jiayi/.local/bin/allennlp", line 8, in 
                                                                                      sys.exit(run())
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/__main__.py", line 34, in run
                                                                                      main(prog="allennlp")
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 121, in main
                                                                                      args.func(args)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 120, in train_model_from_args
                                                                                      file_friendly_logging=args.file_friendly_logging,
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 179, in train_model_from_file
                                                                                      file_friendly_logging=file_friendly_logging,
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 246, in train_model
                                                                                      file_friendly_logging=file_friendly_logging,
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 470, in _train_worker
                                                                                      metrics = train_loop.run()
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 543, in run
                                                                                      return self.trainer.train()
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 720, in train
                                                                                      metrics, epoch = self._try_train()
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 741, in _try_train
                                                                                      train_metrics = self._train_epoch(epoch)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 459, in _train_epoch
                                                                                      batch_outputs = self.batch_outputs(batch, for_training=True)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 352, in batch_outputs
                                                                                      output_dict = self._pytorch_model(**batch)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/models/language_model.py", line 257, in forward
                                                                                      embeddings, mask
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 282, in forward
                                                                                      token_embeddings = self._position(token_embeddings)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 68, in forward
                                                                                      return x + self.positional_encoding[:, : x.size(1)]
                                                                                  RuntimeError: The size of tensor a (5385) must match the size of tensor b (5000) at non-singleton dimension 1
                                                                                  

                                                                                  AllenNLP training config file:

                                                                                  // For more info on config files generally, see https://guide.allennlp.org/using-config-files
                                                                                  
                                                                                  local NUM_GRAD_ACC = 4;
                                                                                  local BATCH_SIZE = 1;
                                                                                  
                                                                                  local BASE_LOADER = {
                                                                                    "max_instances_in_memory": 8,
                                                                                    "batch_sampler": {
                                                                                      "type": "bucket",
                                                                                      "batch_size": BATCH_SIZE,
                                                                                      "sorting_keys": ["source"]
                                                                                    }
                                                                                  };
                                                                                  
                                                                                  {
                                                                                      "dataset_reader" : {
                                                                                          "type": "mimic_reader",
                                                                                          "token_indexers": {
                                                                                              "tokens": {
                                                                                                  "type": "single_id"
                                                                                              },
                                                                                              "token_characters": {
                                                                                                  "type": "elmo_characters"
                                                                                              }
                                                                                          },
                                                                                          "start_tokens": [""],
                                                                                          "end_tokens": [""],
                                                                                      },
                                                                                      "train_data_path": std.extVar("MIMIC3_NOTEEVENTS_DISCHARGE_PATH"),
                                                                                      // Note: We don't set a validation_data_path because the softmax is only
                                                                                      // sampled during training. Not sampling on GPUs results in a certain OOM
                                                                                      // given our large vocabulary. We'll need to evaluate against the test set
                                                                                      // (when we'll want a full softmax) with the CPU.
                                                                                      "vocabulary": {
                                                                                          // Use a prespecified vocabulary for efficiency.
                                                                                          "type": "from_files",
                                                                                          "directory": std.extVar("ELMO_VOCAB_PATH"),
                                                                                          // Plausible config for generating the vocabulary.
                                                                                          // "tokens_to_add": {
                                                                                          //     "tokens": ["", ""],
                                                                                          //     "token_characters": ["<>/S"]
                                                                                          // },
                                                                                          // "min_count": {"tokens": 3}
                                                                                      },
                                                                                      "model": {
                                                                                          "type": "language_model",
                                                                                          "bidirectional": true,
                                                                                          "num_samples": 8192,
                                                                                          # Sparse embeddings don't work with DistributedDataParallel.
                                                                                          "sparse_embeddings": false,
                                                                                          "text_field_embedder": {
                                                                                          "token_embedders": {
                                                                                              "tokens": {
                                                                                              "type": "empty"
                                                                                              },
                                                                                              "token_characters": {
                                                                                                  "type": "character_encoding",
                                                                                                  "embedding": {
                                                                                                      "num_embeddings": 262,
                                                                                                      // Same as the Transformer ELMo in Calypso. Matt reports that
                                                                                                      // this matches the original LSTM ELMo as well.
                                                                                                      "embedding_dim": 16
                                                                                                  },
                                                                                                  "encoder": {
                                                                                                      "type": "cnn-highway",
                                                                                                      "activation": "relu",
                                                                                                      "embedding_dim": 16,
                                                                                                      "filters": [
                                                                                                          [1, 32],
                                                                                                          [2, 32],
                                                                                                          [3, 64],
                                                                                                          [4, 128],
                                                                                                          [5, 256],
                                                                                                          [6, 512],
                                                                                                          [7, 1024]],
                                                                                                      "num_highway": 2,
                                                                                                      "projection_dim": 512,
                                                                                                      "projection_location": "after_highway",
                                                                                                      "do_layer_norm": true
                                                                                                  }
                                                                                              }
                                                                                          }
                                                                                          },
                                                                                          // Consider the following.
                                                                                          // remove_bos_eos: true,
                                                                                          // Applies to the contextualized embeddings.
                                                                                          "dropout": 0.1,
                                                                                          "contextualizer": {
                                                                                              "type": "bidirectional_language_model_transformer",
                                                                                              "input_dim": 512,
                                                                                              "hidden_dim": 4096,
                                                                                              "num_layers": 2,
                                                                                              "dropout": 0.1,
                                                                                              "input_dropout": 0.1
                                                                                          }
                                                                                      },
                                                                                      "data_loader": BASE_LOADER,
                                                                                      // "distributed": {
                                                                                      //     "cuda_devices": [0, 1],
                                                                                      // },
                                                                                      "trainer": {
                                                                                          "num_epochs": 10,
                                                                                          "cuda_devices": [0, 1, 2, 3],
                                                                                          "optimizer": {
                                                                                          // The gradient accumulators in Adam for the running stdev and mean for
                                                                                          // words not used in the sampled softmax would be decayed to zero with the
                                                                                          // standard "adam" optimizer.
                                                                                          "type": "dense_sparse_adam"
                                                                                          },
                                                                                          // "grad_norm": 10.0,
                                                                                          "learning_rate_scheduler": {
                                                                                          "type": "noam",
                                                                                          // See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
                                                                                          "model_size": 512,
                                                                                          // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
                                                                                          // Adjusted based on our sample size relative to Calypso's.
                                                                                          "warmup_steps": 6000
                                                                                          },
                                                                                          "num_gradient_accumulation_steps": NUM_GRAD_ACC,
                                                                                          "use_amp": true
                                                                                      }
                                                                                  }
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-24 at 17:17

                                                                                   Setting the max_tokens parameter of the custom DatasetReader to a value below 5000 makes this error go away. One of AllenNLP's contributors also suggested this, to make sure the tokenizer truncates the input to 5000 tokens.
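                                                                                   For reference, a minimal sketch of what such a cap can look like inside a custom reader (the class and its names here are illustrative assumptions, not the asker's actual reader):

                                                                                   from typing import Dict
                                                                                   from allennlp.data import DatasetReader, Instance, TokenIndexer
                                                                                   from allennlp.data.fields import TextField
                                                                                   from allennlp.data.tokenizers import Tokenizer


                                                                                   class TruncatingReader(DatasetReader):
                                                                                       """Illustrative reader that truncates every text to at most max_tokens tokens."""

                                                                                       def __init__(self, tokenizer: Tokenizer, token_indexers: Dict[str, TokenIndexer],
                                                                                                    max_tokens: int = 4096, **kwargs) -> None:
                                                                                           super().__init__(**kwargs)
                                                                                           self.tokenizer = tokenizer
                                                                                           self.token_indexers = token_indexers
                                                                                           self.max_tokens = max_tokens  # keep this below 5000, per the answer above

                                                                                       def text_to_instance(self, text: str) -> Instance:
                                                                                           # Truncate before building the TextField so no instance exceeds the limit.
                                                                                           tokens = self.tokenizer.tokenize(text)[: self.max_tokens]
                                                                                           return Instance({"tokens": TextField(tokens, self.token_indexers)})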

                                                                                  Same question was posted on AllenNLP: https://github.com/allenai/allennlp/discussions/5601

                                                                                  Source https://stackoverflow.com/questions/71514727

                                                                                  QUESTION

                                                                                  Loading a HuggingFace model into AllenNLP gives different predictions
                                                                                  Asked 2022-Mar-13 at 14:56

                                                                                  I have a custom classification model trained using transformers library based on a BERT model. The model classifies text into 7 different categories. It is persisted in a directory using:

                                                                                  trainer.save_model(model_name)
                                                                                  tokenizer.save_pretrained(model_name)
                                                                                  

                                                                                   I'm trying to load this persisted model using the allennlp library for further analysis. I managed to do so after a lot of work. However, when I run the model inside the allennlp framework, its predictions are very different from the ones I get when I run it using transformers directly, which leads me to think that some part of the loading was not done correctly. There are no errors during inference; it is just that the predictions don't match.

                                                                                   There is little documentation about how to load an existing model, so I'm wondering if someone has faced the same situation before. There is just one example of how to do QA classification with RoBERTa, but I couldn't extrapolate it to what I'm looking for. Does anyone have an idea whether the steps I am following are correct?

                                                                                  This is how I'm loading the trained model:

                                                                                  transformer_vocab = Vocabulary.from_pretrained_transformer(model_name)
                                                                                  transformer_tokenizer = PretrainedTransformerTokenizer(model_name)
                                                                                  transformer_encoder = BertPooler(model_name)
                                                                                  
                                                                                  params = Params(
                                                                                      {
                                                                                       "token_embedders": {
                                                                                          "tokens": {
                                                                                            "type": "pretrained_transformer",
                                                                                            "model_name": model_name,
                                                                                          }
                                                                                        }
                                                                                      }
                                                                                  )
                                                                                   token_embedder = BasicTextFieldEmbedder.from_params(vocab=transformer_vocab, params=params)
                                                                                  token_indexer = PretrainedTransformerIndexer(model_name)
                                                                                  
                                                                                  transformer_model = BasicClassifier(vocab=transformer_vocab,
                                                                                                                      text_field_embedder=token_embedder, 
                                                                                                                      seq2vec_encoder=transformer_encoder, 
                                                                                                                      dropout=0.1, 
                                                                                                                      num_labels=7)
                                                                                  

                                                                                  I also had to implement my own DatasetReader as follows:

                                                                                  class ClassificationTransformerReader(DatasetReader):
                                                                                      def __init__(
                                                                                          self,
                                                                                          tokenizer: Tokenizer,
                                                                                          token_indexer: TokenIndexer,
                                                                                          max_tokens: int,
                                                                                          **kwargs
                                                                                      ):
                                                                                          super().__init__(**kwargs)
                                                                                          self.tokenizer = tokenizer
                                                                                          self.token_indexers: Dict[str, TokenIndexer] = { "tokens": token_indexer }
                                                                                          self.max_tokens = max_tokens
                                                                                          self.vocab = vocab
                                                                                  
                                                                                      def text_to_instance(self, text: str, label: str = None) -> Instance:
                                                                                          tokens = self.tokenizer.tokenize(text)
                                                                                          if self.max_tokens:
                                                                                              tokens = tokens[: self.max_tokens]
                                                                                          
                                                                                          inputs = TextField(tokens, self.token_indexers)
                                                                                          fields: Dict[str, Field] = { "tokens": inputs }
                                                                                              
                                                                                          if label:
                                                                                              fields["label"] = LabelField(label)
                                                                                              
                                                                                          return Instance(fields)
                                                                                  

                                                                                  It is instantiated as follows:

                                                                                  dataset_reader = ClassificationTransformerReader(tokenizer=transformer_tokenizer,
                                                                                                                                   token_indexer=token_indexer,
                                                                                                                                   max_tokens=400)
                                                                                  

                                                                                   To run the model and test whether it works, I'm doing the following:

                                                                                  instance = dataset_reader.text_to_instance("some sample text here")
                                                                                  dataset = Batch([instance])
                                                                                  dataset.index_instances(transformer_vocab)
                                                                                  model_input = util.move_to_device(dataset.as_tensor_dict(), 
                                                                                                                    transformer_model._get_prediction_device())
                                                                                  
                                                                                  outputs = transformer_model.make_output_human_readable(transformer_model(**model_input))
                                                                                  

                                                                                   This runs and returns probabilities, but they don't match what I would get running the model using transformers directly. Any idea what's going on?

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-11 at 19:55

                                                                                  As discussed on GitHub: The problem is that you are constructing a 7-way classifier on top of BERT. Even though the BERT model will be identical, the 7-way classifier on top of it is randomly initialized every time.

                                                                                  BERT itself does not come with a classifier. That has to be fine-tuned for your data.
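                                                                                   For illustration only (this is not part of the answer's suggestion; the private attribute name _classification_layer and the assumption that the label order matches between the two models should be verified for your versions), one way to carry the fine-tuned 7-way head over instead of leaving it randomly initialized is to copy its weights from the Huggingface model into the AllenNLP classifier:

                                                                                   import torch
                                                                                   from transformers import AutoModelForSequenceClassification

                                                                                   # Load the fine-tuned Huggingface model from the same directory used in the question.
                                                                                   hf_model = AutoModelForSequenceClassification.from_pretrained(model_name)

                                                                                   with torch.no_grad():
                                                                                       # Copy the trained classification head into BasicClassifier's Linear layer.
                                                                                       transformer_model._classification_layer.weight.copy_(hf_model.classifier.weight)
                                                                                       transformer_model._classification_layer.bias.copy_(hf_model.classifier.bias)
                                                                                   transformer_model.eval()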

                                                                                  Source https://stackoverflow.com/questions/69876688

                                                                                  QUESTION

                                                                                  RuntimeError: Error loading state dict for SrlBert Missing keys: ['bert_model.embeddings.position_ids'] Unexpected keys: []
                                                                                  Asked 2022-Mar-11 at 04:52

                                                                                   I am a beginner in NLP and was trying to learn semantic role labeling by implementing it. I was trying to load the bert-base-srl model from AllenNLP's public storage, but was facing the following error:

                                                                                  from allennlp.predictors.predictor import Predictor
                                                                                  predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
                                                                                  
                                                                                  ---------------------------------------------------------------------------
                                                                                  RuntimeError                              Traceback (most recent call last)
                                                                                   ~\AppData\Local\Temp/ipykernel_11672/96061884.py in <module>
                                                                                        1 from allennlp.predictors.predictor import Predictor
                                                                                  ----> 2 predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
                                                                                  
                                                                                  ~\anaconda3\lib\site-packages\allennlp\predictors\predictor.py in from_path(cls, archive_path, predictor_name, cuda_device, dataset_reader_to_load, frozen, import_plugins, overrides, **kwargs)
                                                                                      364             plugins.import_plugins()
                                                                                      365         return Predictor.from_archive(
                                                                                  --> 366             load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
                                                                                      367             predictor_name,
                                                                                      368             dataset_reader_to_load=dataset_reader_to_load,
                                                                                  
                                                                                  ~\anaconda3\lib\site-packages\allennlp\models\archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
                                                                                      233             config.duplicate(), serialization_dir
                                                                                      234         )
                                                                                  --> 235         model = _load_model(config.duplicate(), weights_path, serialization_dir, cuda_device)
                                                                                      236 
                                                                                      237         # Load meta.
                                                                                  
                                                                                  ~\anaconda3\lib\site-packages\allennlp\models\archival.py in _load_model(config, weights_path, serialization_dir, cuda_device)
                                                                                      277 
                                                                                      278 def _load_model(config, weights_path, serialization_dir, cuda_device):
                                                                                  --> 279     return Model.load(
                                                                                      280         config,
                                                                                      281         weights_file=weights_path,
                                                                                  
                                                                                  ~\anaconda3\lib\site-packages\allennlp\models\model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
                                                                                      436             # get_model_class method, that recurses whenever it finds a from_archive model type.
                                                                                      437             model_class = Model
                                                                                  --> 438         return model_class._load(config, serialization_dir, weights_file, cuda_device)
                                                                                      439 
                                                                                      440     def extend_embedder_vocab(self, embedding_sources_mapping: Dict[str, str] = None) -> None:
                                                                                  
                                                                                  ~\anaconda3\lib\site-packages\allennlp\models\model.py in _load(cls, config, serialization_dir, weights_file, cuda_device)
                                                                                      378 
                                                                                      379         if unexpected_keys or missing_keys:
                                                                                  --> 380             raise RuntimeError(
                                                                                      381                 f"Error loading state dict for {model.__class__.__name__}\n\t"
                                                                                      382                 f"Missing keys: {missing_keys}\n\t"
                                                                                  
                                                                                  RuntimeError: Error loading state dict for SrlBert
                                                                                      Missing keys: ['bert_model.embeddings.position_ids']
                                                                                      Unexpected keys: []
                                                                                  

                                                                                  Does someone know a fix for this?

                                                                                  ANSWER

                                                                                  Answered 2022-Mar-11 at 04:52

                                                                                   If you are on a later version of allennlp-models, you can use this archive_file instead: https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz.

                                                                                  The latest versions of the model archive files can be found on the demo page in the Model Card tab: https://demo.allennlp.org/semantic-role-labeling
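                                                                                   For example, loading the newer archive looks like this (a minimal sketch; the sentence is arbitrary and the output keys should be checked against your allennlp-models version):

                                                                                   from allennlp.predictors.predictor import Predictor

                                                                                   predictor = Predictor.from_path(
                                                                                       "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
                                                                                   )
                                                                                   result = predictor.predict(
                                                                                       sentence="The keys, which were needed to access the building, were locked in the car."
                                                                                   )
                                                                                   # The SRL predictor returns one entry per detected predicate.
                                                                                   for verb in result["verbs"]:
                                                                                       print(verb["description"])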

                                                                                  Source https://stackoverflow.com/questions/71432983

                                                                                  QUESTION

                                                                                  Allennlp: How to load a pretrained ELMo as the embedding of allennlp model?
                                                                                  Asked 2022-Feb-24 at 19:15

                                                                                   I am new to allennlp. I trained an ELMo model in order to use it as the embedding for other allennlp models, but failed. It seems that my model is not compatible with the interface the config expects. What can I do?

                                                                                  My elmo is trained by allennlp with the command:

                                                                                  allennlp train config/elmo.jsonnet --serialization-dir /xxx
                                                                                  

                                                                                   The elmo.jsonnet is almost the same as https://github.com/allenai/allennlp-models/blob/main/training_config/lm/bidirectional_language_model.jsonnet, except for the dataset and vocabulary.

                                                                                  After that, I got an elmo model with:

                                                                                  config.json
                                                                                  weights.th
                                                                                  vocabulary/
                                                                                  vocabulary/.lock
                                                                                  vocabulary/non_padded_namespaces.txt
                                                                                  vocabulary/tokens.txt
                                                                                  meta.json
                                                                                  

                                                                                   When I try to load the model into other models, like bidaf-elmo in https://github.com/allenai/allennlp-models/blob/main/training_config/rc/bidaf_elmo.jsonnet, I find that it requires options and weights files:

                                                                                  "elmo": {
                                                                                      "type": "elmo_token_embedder",
                                                                                      "do_layer_norm": false,
                                                                                      "dropout": 0,
                                                                                      "options_file": "xxx/options.json",
                                                                                      "weight_file": "xxx/weights.hdf5"
                                                                                  }
                                                                                  

                                                                                   These are not included in my output. I tried to convert model.state_dict() into weights.hdf5, but I received an error:

                                                                                  KeyError: "Unable to open object (object 'char_embed' doesn't exist)"
                                                                                  

                                                                                   The 'char_embed' key is required in:

                                                                                  File "/home/xxx/anaconda3/envs/thesis_torch1.8/lib/python3.8/site-packages/allennlp/modules/elmo.py", line 393, in _load_char_embedding
                                                                                      char_embed_weights = fin["char_embed"][...]
                                                                                  

                                                                                   It seems that the model I trained with allennlp is not compatible with this interface. How can I use my ELMo as the embedding for other models?

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-24 at 19:15

                                                                                  You are right, those two formats don't align.

                                                                                  I'm afraid there is no easy way out. I think you'll have to write a TokenEmbedder that can read and apply the output from bidirectional_language_model.jsonnet.

                                                                                  If you do, we'd love to have it as a contribution to AllenNLP!
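                                                                                   To give a sense of the direction, here is a rough, untested skeleton (the registration name is made up, and the forward pass is deliberately left open because it depends on how the archived language model and its token indexers are set up):

                                                                                   import torch
                                                                                   from allennlp.models.archival import load_archive
                                                                                   from allennlp.modules.token_embedders import TokenEmbedder


                                                                                   @TokenEmbedder.register("my_bidirectional_lm_embedder")  # hypothetical name
                                                                                   class MyBidirectionalLmEmbedder(TokenEmbedder):
                                                                                       """Wraps a model trained with bidirectional_language_model.jsonnet as an embedder."""

                                                                                       def __init__(self, archive_file: str, output_dim: int, requires_grad: bool = False) -> None:
                                                                                           super().__init__()
                                                                                           # Load the archived LanguageModel produced by `allennlp train`.
                                                                                           self._lm = load_archive(archive_file).model
                                                                                           self._output_dim = output_dim
                                                                                           for param in self._lm.parameters():
                                                                                               param.requires_grad_(requires_grad)

                                                                                       def get_output_dim(self) -> int:
                                                                                           # For the example config this would be 1024 (two directions of 512 each).
                                                                                           return self._output_dim

                                                                                       def forward(self, tokens: torch.Tensor) -> torch.Tensor:
                                                                                           # The real work goes here: run the archived model's character CNN and
                                                                                           # contextualizer over the inputs and return the contextual embeddings.
                                                                                           # The exact calls depend on your allennlp-models version.
                                                                                           raise NotImplementedError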

                                                                                  Source https://stackoverflow.com/questions/71182588

                                                                                  QUESTION

                                                                                  How to change AllenNLP BERT based Semantic Role Labeling to RoBERTa in AllenNLP
                                                                                  Asked 2022-Feb-24 at 12:34

                                                                                   Currently I'm able to train a Semantic Role Labeling model using the config file below. This config file is based on the one provided by AllenNLP and works for the default bert-base-uncased model and also for GroNLP/bert-base-dutch-cased.

                                                                                  {
                                                                                    "dataset_reader": {
                                                                                      "type": "srl_custom",
                                                                                      "bert_model_name": "GroNLP/bert-base-dutch-cased"
                                                                                    },
                                                                                    "data_loader": {
                                                                                      "batch_sampler": {
                                                                                        "type": "bucket",
                                                                                        "batch_size": 32
                                                                                      }
                                                                                    },
                                                                                    "train_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
                                                                                    "validation_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
                                                                                    "model": {
                                                                                      "type": "srl_bert",
                                                                                      "embedding_dropout": 0.1,
                                                                                      "bert_model": "GroNLP/bert-base-dutch-cased"
                                                                                    },
                                                                                    "trainer": {
                                                                                      "optimizer": {
                                                                                        "type": "huggingface_adamw",
                                                                                        "lr": 5e-5,
                                                                                        "correct_bias": false,
                                                                                        "weight_decay": 0.01,
                                                                                        "parameter_groups": [
                                                                                          [
                                                                                            [
                                                                                              "bias",
                                                                                              "LayerNorm.bias",
                                                                                              "LayerNorm.weight",
                                                                                              "layer_norm.weight"
                                                                                            ],
                                                                                            {
                                                                                              "weight_decay": 0.0
                                                                                            }
                                                                                          ]
                                                                                        ]
                                                                                      },
                                                                                      "learning_rate_scheduler": {
                                                                                        "type": "slanted_triangular"
                                                                                      },
                                                                                      "checkpointer": {
                                                                                        "keep_most_recent_by_count": 2
                                                                                      },
                                                                                      "grad_norm": 1.0,
                                                                                      "num_epochs": 3,
                                                                                      "validation_metric": "+f1-measure-overall"
                                                                                    }
                                                                                  }
                                                                                  

                                                                                   Swapping the values of the bert_model_name and bert_model parameters from GroNLP/bert-base-dutch-cased to roberta-base doesn't work out of the box, since the SRL dataset reader only supports the BertTokenizer and not the RobertaTokenizer. So I changed the config file to the following:

                                                                                  {
                                                                                    "dataset_reader": {
                                                                                      "type": "srl_custom",
                                                                                      "token_indexers": {
                                                                                        "tokens": {
                                                                                          "type": "pretrained_transformer",
                                                                                          "model_name": "roberta-base"
                                                                                        }
                                                                                      }
                                                                                    },
                                                                                    "data_loader": {
                                                                                      "batch_sampler": {
                                                                                        "type": "bucket",
                                                                                        "batch_size": 32
                                                                                      }
                                                                                    },
                                                                                    "train_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
                                                                                    "validation_data_path": "./data/SRL/SONAR_1_SRL/MANUAL500/",
                                                                                    "model": {
                                                                                      "type": "srl_bert",
                                                                                      "embedding_dropout": 0.1,
                                                                                      "bert_model": "roberta-base"
                                                                                    },
                                                                                    "trainer": {
                                                                                      "optimizer": {
                                                                                        "type": "huggingface_adamw",
                                                                                        "lr": 5e-5,
                                                                                        "correct_bias": false,
                                                                                        "weight_decay": 0.01,
                                                                                        "parameter_groups": [
                                                                                          [
                                                                                            [
                                                                                              "bias",
                                                                                              "LayerNorm.bias",
                                                                                              "LayerNorm.weight",
                                                                                              "layer_norm.weight"
                                                                                            ],
                                                                                            {
                                                                                              "weight_decay": 0.0
                                                                                            }
                                                                                          ]
                                                                                        ]
                                                                                      },
                                                                                      "learning_rate_scheduler": {
                                                                                        "type": "slanted_triangular"
                                                                                      },
                                                                                      "checkpointer": {
                                                                                        "keep_most_recent_by_count": 2
                                                                                      },
                                                                                      "grad_norm": 1.0,
                                                                                      "num_epochs": 15,
                                                                                      "validation_metric": "+f1-measure-overall"
                                                                                    }
                                                                                  }
                                                                                  

                                                                                  However, this is still not working. I'm receiving the following error:

                                                                                  2022-02-22 16:19:34,122 - INFO - allennlp.training.gradient_descent_trainer - Training
                                                                                     0%|          | 0/1546 [00:00<?, ?it/s]
                                                                                   Traceback (most recent call last):
                                                                                       sys.exit(run())
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\__main__.py", line 39, in run
                                                                                      main(prog="allennlp")
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\__init__.py", line 119, in main
                                                                                      args.func(args)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 111, in train_model_from_args
                                                                                      train_model_from_file(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 177, in train_model_from_file
                                                                                      return train_model(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 258, in train_model
                                                                                      model = _train_worker(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 508, in _train_worker
                                                                                      metrics = train_loop.run()
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\commands\train.py", line 581, in run
                                                                                      return self.trainer.train()
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 771, in train
                                                                                      metrics, epoch = self._try_train()
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 793, in _try_train
                                                                                      train_metrics = self._train_epoch(epoch)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 510, in _train_epoch
                                                                                      batch_outputs = self.batch_outputs(batch, for_training=True)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp\training\gradient_descent_trainer.py", line 403, in batch_outputs
                                                                                      output_dict = self._pytorch_model(**batch)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\models\srl_bert.py", line 141, in forward
                                                                                      bert_embeddings, _ = self.bert_model(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\transformers\models\bert\modeling_bert.py", line 989, in forward
                                                                                      embedding_output = self.embeddings(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\transformers\models\bert\modeling_bert.py", line 215, in forward
                                                                                      token_type_embeddings = self.token_type_embeddings(token_type_ids)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
                                                                                      result = self.forward(*input, **kwargs)
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\sparse.py", line 156, in forward
                                                                                      return F.embedding(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\torch\nn\functional.py", line 1916, in embedding
                                                                                      return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
                                                                                  IndexError: index out of range in self
                                                                                  

                                                                                   I don't fully understand what's going wrong and couldn't find any documentation on how to change the config file to load a 'custom' BERT/RoBERTa model (one that's not mentioned here). I'm running the default allennlp train config.jsonnet command to start training. However, allennlp train config.jsonnet --dry-run produces no errors.

                                                                                  Thanks in advance! Thijs

                                                                                   EDIT: I've now swapped out "srl_bert" for a custom "srl_roberta" class that inherits from it, to make use of the RobertaModel. This, however, still produces the same error.

                                                                                   EDIT2: I'm now using the AutoTokenizer, as suggested by Dirk Groeneveld. It looks like changing the SrlReader class to support RoBERTa-based models involves more changes, such as swapping BERT's wordpiece tokenizer for RoBERTa's BPE tokenizer. Is there an easy way to adapt the SrlReader class, or is it better to write a new RobertaSrlReader from scratch?

                                                                                  I've inherited the SrlReader class and changed this line to the following:

                                                                                  self.bert_tokenizer = AutoTokenizer.from_pretrained(bert_model_name)
                                                                                  

                                                                                   It produces the following error, since RoBERTa's tokenization differs from BERT's:

                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\dataset_readers\srl.py", line 255, in text_to_instance
                                                                                      wordpieces, offsets, start_offsets = self._wordpiece_tokenize_input(
                                                                                    File "C:\Users\denbe\AppData\Roaming\Python\Python39\site-packages\allennlp_models\structured_prediction\dataset_readers\srl.py", line 196, in _wordpiece_tokenize_input
                                                                                      word_pieces = self.bert_tokenizer.wordpiece_tokenizer.tokenize(token)
                                                                                  AttributeError: 'RobertaTokenizerFast' object has no attribute 'wordpiece_tokenizer'
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-24 at 02:14

                                                                                  The easiest way to resolve this is to patch SrlReader so that it uses PretrainedTransformerTokenizer (from AllenNLP) or AutoTokenizer (from Huggingface) instead of BertTokenizer. SrlReader is an old class, and was written against an old version of the Huggingface tokenizer API, so it's not so easy to upgrade.

                                                                                  If you want to submit a pull request in the AllenNLP project, I'd be happy to help you get it merged into AllenNLP!
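                                                                                   As a rough illustration of the piece that would need to change (a sketch against the generic Huggingface tokenizer API, not a drop-in replacement; offset handling and special tokens are model-specific, and RoBERTa's byte-level BPE treats leading spaces differently from BERT's wordpieces):

                                                                                   from typing import List, Tuple
                                                                                   from transformers import AutoTokenizer

                                                                                   tokenizer = AutoTokenizer.from_pretrained("roberta-base")

                                                                                   def wordpiece_tokenize_input(tokens: List[str]) -> Tuple[List[str], List[int], List[int]]:
                                                                                       # Mirrors the bookkeeping in SrlReader._wordpiece_tokenize_input, but uses
                                                                                       # tokenizer.tokenize() instead of the BERT-only wordpiece_tokenizer.
                                                                                       pieces: List[str] = []
                                                                                       end_offsets: List[int] = []
                                                                                       start_offsets: List[int] = []
                                                                                       cumulative = 0
                                                                                       for token in tokens:
                                                                                           sub_tokens = tokenizer.tokenize(token)
                                                                                           start_offsets.append(cumulative + 1)  # +1 for the leading special token
                                                                                           cumulative += len(sub_tokens)
                                                                                           end_offsets.append(cumulative)
                                                                                           pieces.extend(sub_tokens)
                                                                                       # <s> / </s> for RoBERTa instead of [CLS] / [SEP] for BERT.
                                                                                       wordpieces = [tokenizer.cls_token] + pieces + [tokenizer.sep_token]
                                                                                       return wordpieces, end_offsets, start_offsets

                                                                                   print(wordpiece_tokenize_input(["The", "keys", "were", "locked", "in", "the", "car", "."]))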

                                                                                  Source https://stackoverflow.com/questions/71223907

                                                                                  QUESTION

                                                                                  How to interpret Allen NLP Coreference resolution model output?
                                                                                  Asked 2022-Feb-10 at 16:15

                                                                                   I am working on extracting people and tasks from texts (multiple sentences) and need a way to resolve coreferences. I found this model, and it seems very promising, but once I installed the required libraries allennlp and allennlp_models and tested the model out for myself, I got:

                                                                                  Script:

                                                                                  predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz")
                                                                                  prediction = predictor.predict(
                                                                                      document="Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen. Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.")
                                                                                  print(prediction)
                                                                                  

                                                                                  Output:

                                                                                  {'top_spans': [[0, 1], [3, 3], [5, 8], [5, 14], [8, 8], [11, 13], [11, 14], [13, 13], [16, 18], [16, 22], [20, 22], [24, 24], [26, 52], [33, 33], [36, 36], [37, 37], [38, 52], [41, 42], [47, 47], [48, 48], [49, 52]], 
                                                                                   'antecedent_indices': [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]], 
                                                                                   'predicted_antecedents': [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, -1, 5, 11, -1, -1, -1, 11, -1, -1], 
                                                                                   'document': ['Paul', 'Allen', 'was', 'born', 'on', 'January', '21', ',', '1953', ',', 'in', 'Seattle', ',', 'Washington', ',', 'to', 'Kenneth', 'Sam', 'Allen', 'and', 'Edna', 'Faye', 'Allen', '.', 'Allen', 'attended', 'Lakeside', 'School', ',', 'a', 'private', 'school', 'in', 'Seattle', ',', 'where', 'he', 'befriended', 'Bill', 'Gates', ',', 'two', 'years', 'younger', ',', 'with', 'whom', 'he', 'shared', 'an', 'enthusiasm', 'for', 'computers', '.'], 
                                                                                   'clusters': [[[0, 1], [24, 24], [36, 36], [47, 47]], [[11, 13], [33, 33]]]}
                                                                                  
                                                                                  

                                                                                  I'm having trouble interpreting the format of this output. I was expecting something like

                                                                                  {entity_0_spans: [LIST_OF_INDEX_TUPLES],  # Paul Allen in this example
                                                                                   entity_1_spans: [LIST_OF_INDEX_TUPLES],  # Seattle in this example
                                                                                   ...}
                                                                                  

or something that more closely resembles the visualisation available on the demo page.

                                                                                  I've looked through https://demo.allennlp.org/coreference-resolution but couldn't find a breakdown of how to use the model output yet - can anyone suggest some resources that will help me? Any pointers are much appreciated!

                                                                                  ANSWER

                                                                                  Answered 2022-Feb-10 at 16:15

The information you are looking for is in 'clusters', where each list corresponds to an entity. Within each entity list, you will find the mentions referring to the same entity. The numbers are inclusive token indices into 'document' that mark the beginning and end of each coreferential mention, e.g. Paul Allen [0, 1] and Allen [24, 24].
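For example, a small snippet (assuming the prediction dict printed in the question is stored in prediction) that turns 'clusters' into readable mention strings:

# Map each cluster's inclusive [start, end] token indices back into text.
doc = prediction["document"]
for cluster_id, cluster in enumerate(prediction["clusters"]):
    mentions = [" ".join(doc[start:end + 1]) for start, end in cluster]
    print(cluster_id, mentions)

# For the document above this prints:
# 0 ['Paul Allen', 'Allen', 'he', 'he']
# 1 ['Seattle , Washington', 'Seattle']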

                                                                                  Source https://stackoverflow.com/questions/70786812

                                                                                  QUESTION

Google mT5-small configuration error because the number of attention heads is not a divisor of the model dimension
                                                                                  Asked 2022-Jan-20 at 09:48

The configuration file for the HuggingFace google/mt5-small model (https://huggingface.co/google/mt5-small) defines

                                                                                  {
                                                                                  ...
                                                                                    "d_model": 512,
                                                                                  ...
                                                                                    "num_heads": 6,
                                                                                  ...
                                                                                  }
                                                                                  

                                                                                  Link to the config file: https://huggingface.co/google/mt5-small/resolve/main/config.json

                                                                                  Question:

As far as I understand, the number of attention heads should be a divisor of the model dimension. This is clearly not the case in this config file.

                                                                                  Do I misunderstand how self-attention is applied in mT5?

When I use the AllenNLP model (https://github.com/allenai/allennlp-models/blob/main/allennlp_models/generation/models/t5.py) as a sequence-to-sequence model, I receive an error message.

                                                                                  Summary:

                                                                                  allennlp.common.checks.ConfigurationError: The hidden size (512) is not a multiple of the number of attention heads (6)
                                                                                  

Full traceback:

                                                                                  Traceback (most recent call last):
                                                                                    File "/snap/pycharm-professional/269/plugins/python/helpers/pydev/pydevd.py", line 1500, in _exec
                                                                                      runpy._run_module_as_main(module_name, alter_argv=False)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
                                                                                      return _run_code(code, main_globals, None,
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/runpy.py", line 87, in _run_code
                                                                                      exec(code, run_globals)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/__main__.py", line 50, in 
                                                                                      run()
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/__main__.py", line 46, in run
                                                                                      main(prog="allennlp")
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/__init__.py", line 123, in main
                                                                                      args.func(args)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 112, in train_model_from_args
                                                                                      train_model_from_file(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 178, in train_model_from_file
                                                                                      return train_model(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 254, in train_model
                                                                                      model = _train_worker(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 490, in _train_worker
                                                                                      train_loop = TrainModel.from_params(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 652, in from_params
                                                                                      return retyped_subclass.from_params(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
                                                                                      return constructor_to_call(**kwargs)  # type: ignore
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/commands/train.py", line 766, in from_partial_objects
                                                                                      model_ = model.construct(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
                                                                                      return self.constructor(**contructor_kwargs)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
                                                                                      return self._constructor.from_params(  # type: ignore[union-attr]
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 652, in from_params
                                                                                      return retyped_subclass.from_params(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
                                                                                      return constructor_to_call(**kwargs)  # type: ignore
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp_models/generation/models/t5.py", line 32, in __init__
                                                                                      self.t5 = T5Module.from_pretrained_module(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/transformer_module.py", line 251, in from_pretrained_module
                                                                                      model = cls._from_config(config, **kwargs)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 852, in _from_config
                                                                                      return cls(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 783, in __init__
                                                                                      self.encoder: T5EncoderStack = encoder.construct(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
                                                                                      return self.constructor(**contructor_kwargs)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/t5.py", line 600, in basic_encoder
                                                                                      self_attention=block_self_attention.construct(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 82, in construct
                                                                                      return self.constructor(**contructor_kwargs)
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
                                                                                      return self._constructor.from_params(  # type: ignore[union-attr]
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/common/from_params.py", line 686, in from_params
                                                                                      return constructor_to_call(**kwargs)  # type: ignore
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/attention_module.py", line 471, in __init__
                                                                                      super().__init__(
                                                                                    File "/home/lars/anaconda3/envs/mare2/lib/python3.9/site-packages/allennlp/modules/transformer/attention_module.py", line 91, in __init__
                                                                                      raise ConfigurationError(
                                                                                  allennlp.common.checks.ConfigurationError: The hidden size (512) is not a multiple of the number of attention heads (6)
                                                                                  

                                                                                  ANSWER

                                                                                  Answered 2022-Jan-20 at 09:48

                                                                                  This is a very good question, and shows a common misconception about Transformers, stemming from an (unfortunate) formulation in the original Transformers paper. In particular, the authors write the following in Section 3.2.2:

In this work, we employ h = 8 parallel attention layers, or heads. For each of these we use d_k = d_v = d_model / h = 64. [...]

Note that the constraint d_k = d_v = d_model / h is not strictly necessary; all that matters is that the attention output is projected back to the final hidden size (d_model) before the feed-forward portion of each layer. Specifically for mt5-small, the authors use an internal attention dimension of 384, which is simply the product d_kv * num_heads = 64 * 6, while d_model stays at 512.

Now, the problem is that many libraries make a similar assumption and enforce the relation between d_kv and d_model (i.e., d_kv = d_model / num_heads), because it saves some implementation effort that most people won't use anyway. I suspect (I'm not super familiar with AllenNLP) that they have made a similar assumption here, which is why you cannot load the model.

                                                                                  Also, to clarify this, here is a peek at the modules of a loaded mt5-small:

                                                                                  T5Block(
                                                                                      (layer): ModuleList(
                                                                                          (0): T5LayerSelfAttention(
                                                                                          (SelfAttention): T5Attention(
                                                                                              (q): Linear(in_features=512, out_features=384, bias=False)
                                                                                              (k): Linear(in_features=512, out_features=384, bias=False)
                                                                                              (v): Linear(in_features=512, out_features=384, bias=False)
                                                                                              (o): Linear(in_features=384, out_features=512, bias=False)
                                                                                          )
                                                                                          (layer_norm): T5LayerNorm()
                                                                                          (dropout): Dropout(p=0.1, inplace=False)
                                                                                          )
                                                                                          (1): T5LayerFF(
                                                                                          (DenseReluDense): T5DenseGatedGeluDense(
                                                                                              (wi_0): Linear(in_features=512, out_features=1024, bias=False)
                                                                                              (wi_1): Linear(in_features=512, out_features=1024, bias=False)
                                                                                              (wo): Linear(in_features=1024, out_features=512, bias=False)
                                                                                              (dropout): Dropout(p=0.1, inplace=False)
                                                                                          )
                                                                                          (layer_norm): T5LayerNorm()
                                                                                          (dropout): Dropout(p=0.1, inplace=False)
                                                                                          )
                                                                                      )
                                                                                  )
                                                                                  
                                                                                  

You can get the full model layout by simply calling list(model.modules()).
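For example, a quick way to check these shapes yourself with the Huggingface transformers library (this assumes a recent transformers version; the module path follows the printout above):

# Inspect the attention projections of mt5-small: q/k/v project 512 -> 384
# (d_kv * num_heads = 64 * 6) and o projects 384 back to d_model = 512.
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
attention = model.encoder.block[0].layer[0].SelfAttention
print(attention.q)   # Linear(in_features=512, out_features=384, bias=False)
print(attention.o)   # Linear(in_features=384, out_features=512, bias=False)
print(model.config.d_model, model.config.d_kv, model.config.num_heads)  # 512 64 6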

                                                                                  Source https://stackoverflow.com/questions/70769151

                                                                                  QUESTION

                                                                                  Writing custom metrics in allennlp
                                                                                  Asked 2021-Dec-10 at 02:12

I'm writing my first allennlp project to detect specific spans in newspaper articles. I was able to train it on my dataset. The loss, computed with cross entropy, seems to decrease correctly, but I'm having some issues with my metric. I wrote a custom metric that is supposed to estimate how accurately my model predicts spans relative to some ground-truth spans. The problem is that right now the metric doesn't seem to update correctly, even though the loss is decreasing.

I'm not sure how to tackle the problem, and I guess my questions are the following:

1. What is the exact use of the reset() function in the Metric class?
2. Apart from writing the __call__(), get_metric() and reset() functions, are there other things to watch out for?

                                                                                  Below is a snapshot of my custom Metric class in case you need it.

                                                                                  class SpanIdenficationMetric(Metric):
                                                                                      def __init__(self) -> None:
                                                                                          self._s_cardinality = 0 # S: model predicted spans
                                                                                          self._t_cardinality = 0 # T: article gold spans
                                                                                          self._s_sum = 0
                                                                                          self._t_sum = 0
                                                                                          
                                                                                      def reset(self) -> None:
                                                                                          self._s_cardinality = 0
                                                                                          self._t_cardinality = 0
                                                                                          self._s_sum = 0
                                                                                          self._t_sum = 0
                                                                                              
                                                                                      def __call__(self, prop_spans: torch.Tensor, gold_spans: torch.Tensor, mask: Optional[torch.BoolTensor] = None):
                                                                                          for i, article_spans in enumerate(prop_spans):
                                                                                              if article_spans.numel() == 0:
                                                                                                  continue
                                                                                              article_gold_spans = gold_spans[i]
                                                                                              merged_prop_spans = self._merge_intervals(article_spans)
                                                                                              self._s_cardinality += merged_prop_spans.size(dim=0)
                                                                                              self._t_cardinality += article_gold_spans.size(dim=0)
                                                                                              for combination in itertools.product(merged_prop_spans, article_gold_spans):
                                                                                                  sspan = combination[0]
                                                                                                  tspan = combination[1]
                                                                                                  self._s_sum += self._c_function(sspan, tspan, sspan[1].item() - sspan[0].item() + 1)
                                                                                                  self._t_sum += self._c_function(sspan, tspan, tspan[1].item() - tspan[0].item() + 1)
                                                                                  
                                                                                      def get_metric(self, reset: bool = False):
                                                                                          precision = 0
                                                                                          recall = 0
                                                                                          if self._s_cardinality != 0:
                                                                                              precision = self._s_sum / self._s_cardinality
                                                                                          if self._t_cardinality != 0:
                                                                                              recall = self._t_sum / self._t_cardinality
                                                                                          if reset:
                                                                                              self.reset()
                                                                                          return { "si-metric" : (2 * precision * recall) / (precision + recall) if precision + recall > 0 else 0 }
                                                                                  
    def _c_function(self, s, t, h): ...
    def _intersect(self, s, t): ...
    def _merge_intervals(self, prop_spans): ...
                                                                                  

                                                                                  Thank you in advance. Cheers.

                                                                                  ANSWER

                                                                                  Answered 2021-Dec-10 at 02:12

                                                                                  During training, the trainer will call the metric (using Metric.__call__()) with the results from every batch. The metric is supposed to update its internal state when this happens. The trainer expects to get the current value(s) of the metric when it calls Metric.get_metric(). Metric.reset() has to reset the metric into a state as if it had never been called before. When get_metric() gets called with reset = True, it's expected to reset the metric as well.

                                                                                  From what I can tell, your code does all these things correctly. Your code will not run correctly in a distributed setting, but if you are not training on multiple GPUs, that's not a problem.

                                                                                  What you're doing is similar to the SQuAD metric: https://github.com/allenai/allennlp-models/blob/main/allennlp_models/rc/metrics/squad_em_and_f1.py The SQuAD metric goes out of its way to call the original SQuAD evaluation code, so it's a little more complicated than what you would want, but maybe you can adapt it? The main difference would be that you are calculating F scores across the whole dataset, while SQuAD calculates them per-document, and then averages across documents.

                                                                                  Finally, you can write a simple test for your metric, similar to the SQuAD test: https://github.com/allenai/allennlp-models/blob/main/tests/rc/metrics/squad_em_and_f1_test.py That might help narrow down where the problem is.
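Along those lines, here is a minimal sketch of what such a test could look like; the import path and the tensors are placeholders, and since _merge_intervals and _c_function are stubbed out in the snippet above, treat it as a shape for the test rather than something that runs as-is. It exercises the __call__ / get_metric / reset cycle described above.

import torch

from my_project.metrics import SpanIdenficationMetric  # placeholder import path

def test_span_identification_metric():
    metric = SpanIdenficationMetric()

    # One article in the batch, two predicted spans matching two gold spans.
    predicted_spans = torch.tensor([[[0, 1], [3, 4]]])
    gold_spans = torch.tensor([[[0, 1], [3, 4]]])

    metric(predicted_spans, gold_spans)
    scores = metric.get_metric(reset=True)
    assert 0.0 <= scores["si-metric"] <= 1.0

    # After the reset above, the metric should report as if it was never called.
    assert metric.get_metric()["si-metric"] == 0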

                                                                                  Source https://stackoverflow.com/questions/70273872

                                                                                  QUESTION

                                                                                  Using multiprocessing with AllenNLP decoding is sluggish compared to non-multiprocessing case
                                                                                  Asked 2021-Dec-01 at 15:18

                                                                                  I'm using the AllenNLP (version 2.6) semantic role labeling model to process a large pile of sentences. My Python version is 3.7.9. I'm on MacOS 11.6.1. My goal is to use multiprocessing.Pool to parallelize the work, but the calls via the pool are taking longer than they do in the parent process, sometimes substantially so.

                                                                                  In the parent process, I have explicitly placed the model in shared memory as follows:

                                                                                  from allennlp.predictors import Predictor            
                                                                                  from allennlp.models.archival import load_archive
                                                                                  import allennlp_models.structured_prediction.predictors.srl
                                                                                  PREDICTOR_PATH = "......"
                                                                                  
                                                                                  archive = load_archive(PREDICTOR_PATH)
                                                                                  archive.model.share_memory()
                                                                                  PREDICTOR = Predictor.from_archive(archive)
                                                                                  

                                                                                  I know the model is only being loaded once, in the parent process. And I place the model in shared memory whether or not I'm going to make use of the pool. I'm using torch.multiprocessing, as many recommend, and I'm using the spawn start method.

                                                                                  I'm calling the predictor in the pool using Pool.apply_async, and I'm timing the calls within the child processes. I know that the pool is using the available CPUs (I have six cores), and I'm nowhere near running out of physical memory, so there's no reason for the child processes to be swapped to disk.

                                                                                  Here's what happens, for a batch of 395 sentences:

                                                                                  • Without multiprocessing: 638 total processing seconds (and elapsed time).
                                                                                  • With a 4-process pool: 293 seconds elapsed time, 915 total processing seconds.
                                                                                  • With a 12-process pool: 263 seconds elapsed time, 2024 total processing seconds.

                                                                                  The more processes, the worse the total AllenNLP processing time - even though the model is explicitly in shared memory, and the only thing that crosses the process boundary during the invocation is the input text and the output JSON.

                                                                                  I've done some profiling, and the first thing that leaps out at me is that the function torch._C._nn.linear is taking significantly longer in the multiprocessing cases. This function takes two tensors as arguments - but there are no tensors being passed across the process boundary, and I'm decoding, not training, so the model should be entirely read-only. It seems like it has to be a problem with locking or competition for the shared model resource, but I don't understand at all why that would be the case. And I'm not a torch programmer, so my understanding of what's happening is limited.

                                                                                  Any pointers or suggestions would be appreciated.

                                                                                  ANSWER

                                                                                  Answered 2021-Dec-01 at 15:18

Turns out that I wasn't comparing exactly the right things. This thread goes into all the detail: https://github.com/allenai/allennlp/discussions/5471. Briefly: because PyTorch can use additional resources under the hood, my baseline test without multiprocessing wasn't taxing my computer enough; even two non-multiprocessing instances running in parallel showed no penalty. I had to run 4 instances to see it, and in that case the total processing time was essentially the same for 4 parallel non-multiprocessing invocations as for one multiprocessing run with 4 subprocesses.

                                                                                  Source https://stackoverflow.com/questions/70008621

                                                                                  QUESTION

                                                                                  How to incorporate ELMo into the simple classification of AllenNLP Guide
                                                                                  Asked 2021-Nov-12 at 23:10

I am a beginner and not a native English speaker, so I may ask poor questions. Sorry!

I recently finished the official AllenNLP tutorial (https://guide.allennlp.org/training-and-prediction) and want to change the simple classifier's word embedding to ELMo.

                                                                                  Also, I want to make the architecture of the simple classifier more complex to increase its accuracy. I think I'm done with the implementation of the model.

                                                                                  simple_classifier.py

                                                                                  @Model.register("simple_classifier")
                                                                                  class SimpleClassifier(Model):
                                                                                      def __init__(
                                                                                          self, vocab: Vocabulary, embedder: TextFieldEmbedder, encoder: Seq2VecEncoder
                                                                                      ):
                                                                                          super().__init__(vocab)
                                                                                          self.embedder = embedder
                                                                                          self.encoder = encoder
                                                                                          num_labels = vocab.get_vocab_size("labels")
                                                                                          self.dropout = torch.nn.Dropout(p=0.2)
                                                                                          self.relu = torch.nn.ReLU()
                                                                                          self.layer1=torch.nn.Linear(encoder.get_output_dim(),512)
                                                                                          self.layer2 = torch.nn.Linear(512, 128)
                                                                                          self.layer3 = torch.nn.Linear(128, 50)
                                                                                          self.layer4 = torch.nn.Linear(50, 10)
                                                                                          self.classifier = torch.nn.Linear(10, num_labels)
                                                                                          self.accuracy = CategoricalAccuracy()
                                                                                  
                                                                                      def forward(
                                                                                          self, text: TextFieldTensors, label: torch.Tensor = None
                                                                                      ) -> Dict[str, torch.Tensor]:
                                                                                          # Shape: (batch_size, num_tokens, embedding_dim)
                                                                                          embedded_text = self.embedder(text)
                                                                                          # Shape: (batch_size, num_tokens)
                                                                                          mask = util.get_text_field_mask(text)
                                                                                          # Shape: (batch_size, encoding_dim)
                                                                                          encoded_text = self.encoder(embedded_text, mask)
                                                                                          x=self.relu(self.layer1(encoded_text))
                                                                                          x=self.relu(self.layer2(x))
                                                                                          x=self.relu(self.layer3(x))
                                                                                          x=self.relu(self.layer4(x))
                                                                                          # Shape: (batch_size, num_labels)
                                                                                          logits = self.classifier(x)
                                                                                          # Shape: (batch_size, num_labels)
        probs = torch.nn.functional.softmax(logits, dim=-1)
        output = {"probs": probs}
        if label is not None:
            self.accuracy(logits, label)
            # Shape: (1,)
            output["loss"] = torch.nn.functional.cross_entropy(logits, label)
                                                                                          return output
                                                                                  
                                                                                      def get_metrics(self, reset: bool = False) -> Dict[str, float]:
                                                                                          return {"accuracy": self.accuracy.get_metric(reset)}
                                                                                  

                                                                                  But I have no idea how to change the configuration file. How do I change the following configuration file in the official tutorial to use ELMo?

                                                                                  my_text_classifier.jsonnet

                                                                                  {
                                                                                      "dataset_reader" : {
                                                                                          "type": "classification-tsv",
                                                                                          "token_indexers": {
                                                                                              "tokens": {
                                                                                                  "type": "single_id"
                                                                                              }
                                                                                          }
                                                                                      },
                                                                                      "train_data_path": "data/movie_review/train.tsv",
                                                                                      "validation_data_path": "data/movie_review/dev.tsv",
                                                                                      "model": {
                                                                                          "type": "simple_classifier",
                                                                                          "embedder": {
                                                                                              "token_embedders": {
                                                                                                  "tokens": {
                                                                                                      "type": "embedding",
                                                                                                      "embedding_dim": 10
                                                                                                  }
                                                                                              }
                                                                                          },
                                                                                          "encoder": {
                                                                                              "type": "bag_of_embeddings",
                                                                                              "embedding_dim": 10
                                                                                          }
                                                                                      },
                                                                                      "data_loader": {
                                                                                          "batch_size": 8,
                                                                                          "shuffle": true
                                                                                      },
                                                                                      "trainer": {
                                                                                          "optimizer": "adam",
                                                                                          "num_epochs": 5
                                                                                      }
                                                                                  }
                                                                                  

I would be very happy if someone could help me.

                                                                                  ANSWER

                                                                                  Answered 2021-Nov-12 at 23:10

                                                                                  Check out the way the BiDAF model uses ELMo: https://raw.githubusercontent.com/allenai/allennlp-models/main/training_config/rc/bidaf_elmo.jsonnet

                                                                                  You can steal some of the components of that config. You will need the token embedder under the name "elmo", and, I believe, both the token indexers under "tokens" and "elmo".

                                                                                  It should work without having to write any code.
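As a rough sketch (written as Python dicts rather than jsonnet), the pieces to borrow look roughly like this; the registered names are taken from the bidaf_elmo config, while the dimensions and the omitted options_file/weight_file values are assumptions you should double-check against that file.

# Token indexers: keep the original "tokens" indexer and add the character-based
# "elmo" indexer.
token_indexers = {
    "tokens": {"type": "single_id"},
    "elmo": {"type": "elmo_characters"},
}

# Embedder: keep the small learned embedding under "tokens" and add ELMo under
# "elmo". The options_file / weight_file entries for the pretrained ELMo
# checkpoint are omitted here; copy them from the bidaf_elmo config.
embedder = {
    "token_embedders": {
        "tokens": {"type": "embedding", "embedding_dim": 10},
        "elmo": {
            "type": "elmo_token_embedder",
            "do_layer_norm": False,
            "dropout": 0.0,
        },
    }
}

# The text-field embedder concatenates its outputs, so the encoder's
# embedding_dim must equal 10 + the ELMo output size (1024 for the standard
# weights), i.e. 1034 here.
encoder = {"type": "bag_of_embeddings", "embedding_dim": 1034}

In the jsonnet file these pieces go under "dataset_reader" -> "token_indexers" and under "model" -> "embedder" / "encoder", mirroring the structure of my_text_classifier.jsonnet above.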

                                                                                  Source https://stackoverflow.com/questions/69879808

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

                                                                                  Vulnerabilities

                                                                                  No vulnerabilities reported

                                                                                  Install allennlp

                                                                                  If you're interested in using AllenNLP for model development, we recommend you check out the AllenNLP Guide for a thorough introduction to the library, followed by our more advanced guides on GitHub Discussions.
• If you want to use allennlp train and config files to specify experiments, use this template. We recommend this approach.
• If you'd prefer to use Python code to configure your experiments and run your training loop, use this template. There are a few things that are currently a little harder in this setup (loading a saved model, and using distributed training), but otherwise it's functionally equivalent to the config-files setup.

There are also several more advanced guides, for example:

• Hyperparameter optimization for AllenNLP using Optuna
• Training with multiple GPUs in AllenNLP
• Training on larger batches with less memory in AllenNLP
• How to upload transformer weights and tokenizers from AllenNLP to HuggingFace
                                                                                  AllenNLP requires Python 3.6.1 or later and PyTorch. We support AllenNLP on Mac and Linux environments. We presently do not support Windows but are open to contributions.

                                                                                  Support

• ↗️ Website
• 🔦 Guide
• 🖼 Gallery
• 💻 Demo
• 📓 Documentation (latest | stable | commit)
• ⬆️ Upgrade Guide from 1.x to 2.0
• ❓ Stack Overflow
• ✋ Contributing Guidelines
• 🤖 Officially Supported Models
• Pretrained Models Documentation (latest | stable | commit)
• ⚙️ Continuous Build
• 🌙 Nightly Releases
                                                                                  Install
                                                                                • PyPI

                                                                                  pip install allennlp

                                                                                • CLONE
                                                                                • HTTPS

                                                                                  https://github.com/allenai/allennlp.git

                                                                                • CLI

                                                                                  gh repo clone allenai/allennlp

• SSH

                                                                                  git@github.com:allenai/allennlp.git



                                                                                  Consider Popular Natural Language Processing Libraries

                                                                                  transformers

                                                                                  by huggingface

                                                                                  funNLP

                                                                                  by fighting41love

                                                                                  bert

                                                                                  by google-research

                                                                                  jieba

                                                                                  by fxsjy

                                                                                  Python

                                                                                  by geekcomputers

                                                                                  Try Top Libraries by allenai

longformer

by allenai (Python)

bilm-tf

by allenai (Python)

bi-att-flow

by allenai (Python)

scispacy

by allenai (Python)

scibert

by allenai (Python)

                                                                                  Compare Natural Language Processing Libraries with Highest Support

                                                                                  transformers

                                                                                  by huggingface

                                                                                  bert

                                                                                  by google-research

                                                                                  allennlp

                                                                                  by allenai

                                                                                  flair

                                                                                  by flairNLP

                                                                                  spaCy

                                                                                  by explosion
