seq2seq | first attempt on a simple project | Translation library
kandi X-RAY | seq2seq Summary
This is my first attempt at a simple project to create a language translation model using the sequence-to-sequence learning approach.
Top functions reviewed by kandi - BETA
- Loads data from source files.
- Creates the model.
- Loads test data.
- Processes a list of sentences.
- Finds the most recently modified checkpoint file.
seq2seq Key Features
seq2seq Examples and Code Snippets
def _create_loss(self):
    print('Creating loss... \nIt might take a couple of minutes depending on how many buckets you have.')
    start = time.time()
    def _seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
        setattr(t
Community Discussions
Trending Discussions on seq2seq
QUESTION
I am trying to follow this guide to implement a seq2seq machine translation model: https://www.tensorflow.org/tutorials/text/nmt_with_attention
The tutorial's Encoder has an initialize_hidden_state() function that is used to generate an all-zeros initial state for the encoder. However, I am a bit confused as to why this is necessary. As far as I can tell, the only times the encoder is called (in train_step and evaluate), it is initialized with the initialize_hidden_state() function. My questions are: 1.) What is the purpose of this initial state? Doesn't the Keras layer automatically initialize LSTM states to begin with? And 2.) Why not always just initialize the encoder with all-zero hidden states, if the encoder is always called with initial states generated by initialize_hidden_state()?
ANSWER
Answered 2021-May-16 at 18:34
You are totally right. The code in the example is a little misleading. The LSTM cells are automatically initialized with zeros. You can just delete the initialize_hidden_state() function.
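To see that Keras recurrent layers already default to an all-zero initial state, here is a minimal sketch (not taken from the tutorial; the layer type, shapes, and variable names are illustrative assumptions) comparing a GRU called with and without an explicit zero state:

import tensorflow as tf

batch_size, timesteps, features, units = 4, 7, 8, 16
gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
x = tf.random.normal((batch_size, timesteps, features))

# Called without an initial state: Keras fills it with zeros internally.
out_default, state_default = gru(x)

# Called with an explicit all-zero initial state, as the tutorial's encoder does.
zero_state = tf.zeros((batch_size, units))
out_explicit, state_explicit = gru(x, initial_state=zero_state)

# Both calls use the same weights and inputs, so the outputs are identical.
print(tf.reduce_max(tf.abs(out_default - out_explicit)).numpy())  # 0.0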
QUESTION
Is there a parameter that I can set in the config file (maybe for the trainer?) that would save the model (archive) after each epoch or after a specific number of steps? I'm using the seq2seq dataloader and "composed_seq2seq" as my model. This is what my trainer currently looks like:
...
ANSWER
Answered 2021-May-06 at 23:03
Can you explain a little more about what you're trying to do with a model from every epoch/some number of steps? I think it already archives the model every time it gets a new best score, so I'm wondering what you want to do that can't be accomplished with that.
Edit:
It looks like AllenNLP already saves a model every epoch, but it only keeps a maximum of 2 by default. I believe you can change that by adding a checkpointer to your training config, e.g.:
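The example config itself is not reproduced here. A minimal sketch of what the relevant trainer section might look like, assuming an AllenNLP 2.x-style jsonnet config and a Checkpointer that accepts keep_most_recent_by_count (older releases use num_serialized_models_to_keep instead; check the Checkpointer documentation for your version):

"trainer": {
    "num_epochs": 10,
    "optimizer": {"type": "adam"},
    "checkpointer": {
        // Keep the last 10 epoch checkpoints instead of the default 2.
        "keep_most_recent_by_count": 10
    }
}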
QUESTION
I would like to use BERT for tokenization and also for indexing in a seq2seq model, and this is what my config file looks like so far:
...
ANSWER
Answered 2021-Apr-29 at 17:28
- Please set add_special_tokens = False.
- Use tokenizer.convert_tokens_to_string (which takes the list of subword tokens as input), where tokenizer refers to the tokenizer used by your DatasetReader.
Please let us know if you have further questions!
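As an illustration of the round trip that convert_tokens_to_string performs, here is a minimal sketch using a Hugging Face tokenizer directly, outside AllenNLP; the model name and input string are arbitrary choices for the example:

from transformers import AutoTokenizer

# Any BERT-style tokenizer works here; "bert-base-uncased" is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encode without the [CLS]/[SEP] special tokens, as suggested above.
ids = tokenizer.encode("sequence to sequence translation", add_special_tokens=False)
subword_tokens = tokenizer.convert_ids_to_tokens(ids)

# Join the subword pieces back into a plain string.
text = tokenizer.convert_tokens_to_string(subword_tokens)
print(subword_tokens, "->", text)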
QUESTION
Like the title says, I require a Seq2SeqTrainer for my project, but the file(s) on GitHub are not available and return a 404. I use this code to try to import it:
...
ANSWER
Answered 2021-Apr-24 at 22:57
I eventually found a solution. The file can be found at: https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/seq2seq_trainer.py
For some reason, when importing the file, Python picks up a commented link and throws an error. To get around this, simply make a copy of the file without the comments at the top. That worked for me.
EDIT: I found a neater solution:
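The neater solution itself is not shown here. One possibility (an assumption on my part, not necessarily the poster's actual fix) is that recent transformers releases ship Seq2SeqTrainer as part of the package, so it can be imported directly instead of copying the legacy script:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Seq2SeqTrainer is configured like the regular Trainer; the arguments below
# are placeholders, not a complete training setup.
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=Seq2SeqTrainingArguments(output_dir="out", predict_with_generate=True),
#     train_dataset=train_ds,
#     eval_dataset=eval_ds,
# )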
QUESTION
For example, if I have a tensor A = [[1,1,1], [2,2,2], [3,3,3]], and B = [1,2,3]. How do I get C = [[1,1,1], [2,2,2], [2,2,2], [3,3,3], [3,3,3], [3,3,3]], and doing this batch-wise?
My current element-wise solution btw (takes forever...):
...
ANSWER
Answered 2021-Apr-20 at 21:28
Use this:
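The answer's snippet is not reproduced on this page. A sketch of one common way to get this result, assuming PyTorch tensors and that torch.repeat_interleave is acceptable:

import torch

A = torch.tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
B = torch.tensor([1, 2, 3])

# Repeat row i of A exactly B[i] times along dimension 0.
C = torch.repeat_interleave(A, B, dim=0)
print(C)
# tensor([[1, 1, 1],
#         [2, 2, 2],
#         [2, 2, 2],
#         [3, 3, 3],
#         [3, 3, 3],
#         [3, 3, 3]])

For batched inputs, the same call can be applied to each batch element (for example in a loop), since the number of repeated rows generally differs per element and the results would need padding before stacking into one tensor.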
QUESTION
Downloaded the T5-small model from the SparkNLP website, and I am using this code (almost entirely from the examples):
...
ANSWER
Answered 2021-Apr-16 at 08:53
The offline model of T5 - t5_base_en_2.7.1_2.4_1610133506835 - was trained on SparkNLP 2.7.1, and there was a breaking change in 2.7.2. Solved by downloading and re-saving the new version with
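The re-saving snippet itself is not included above. A rough sketch of what it might look like, assuming Spark NLP's T5Transformer.pretrained downloader and the standard Spark ML save mechanism (the model name, language code, and output path are illustrative, not the poster's actual code):

import sparknlp
from sparknlp.annotator import T5Transformer

# Start a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

# Download the current t5_base model and write it back out as a local,
# offline copy compatible with the installed Spark NLP version.
t5 = T5Transformer.pretrained("t5_base", "en")
t5.write().overwrite().save("./t5_base_resaved")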
QUESTION
I am trying to train a seq2seq model. I ran the example code in Colab:
...
ANSWER
Answered 2021-Mar-13 at 23:54
The problem is that you cloned the master branch of the repository and tried to run the run_seq2seq.py script with a transformers version (4.3.3) that is behind that master branch. run_seq2seq.py was updated to import is_offline_mode on the 6th of March with this merge.
All you need to do is clone the repository and check out the branch or tag that matches the transformers version you are using:
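A sketch of what that could look like for transformers 4.3.3, assuming the usual vX.Y.Z tag naming and that the example script lives under examples/ in that release:

# Clone the repository and check out the code matching the installed release.
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout v4.3.3
# Then run the example script from this checkout, e.g.:
# python examples/seq2seq/run_seq2seq.py --help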
QUESTION
In a machine translation seq2seq model (using RNN/GRU/LSTM) we provide a sentence in a source language and train the model to map it to a sequence of words in another language (e.g., English to German).
The idea is that the decoder generates a classification vector (which has the size of the target word vocabulary); a softmax is applied to this vector, followed by an argmax to get the index of the most probable word.
My question is: is there an upper limit on how large the target word vocabulary should be, considering that:
- The performance remains reasonable (the softmax takes more time for larger vectors)
- The accuracy/correctness of the predictions is acceptable
ANSWER
Answered 2021-Feb-19 at 10:12
The main technical limitation on the vocabulary size is GPU memory. The word embeddings and the output projection are the biggest parameters in the model. With too large a vocabulary, you would be forced to use small training batches, which would significantly slow down training.
Also, it is not necessarily the case that the bigger the vocabulary, the better the performance. Words in a natural language are distributed according to Zipf's law, which means that a word's frequency falls off roughly as a power of its frequency rank. As the vocabulary grows, you add words that are less and less common in the language. A word embedding gets updated only when that word occurs in the training data, so with a very large vocabulary the embeddings of the less frequent words end up undertrained and the model cannot handle them properly anyway.
MT models typically use a vocabulary of 30k-50k tokens. These are, however, not words but so-called subwords. The text gets segmented using a statistical heuristic, such that most common words remain as they are and less frequent words get split into subwords, ultimately into single characters.
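To make the subword point concrete, here is a small sketch using a Hugging Face BERT tokenizer (an arbitrary choice of subword vocabulary for illustration): common words survive as single tokens, while rarer words are split into pieces.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# A frequent word stays whole.
print(tok.tokenize("translation"))            # ['translation']

# A rare word is broken into subword pieces (continuation pieces start with '##').
print(tok.tokenize("overparameterization"))   # e.g. ['over', '##para', '##meter', '##ization']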
QUESTION
Can we use a Seq2Seq model with input data that has no temporal relation (i.e., not a time series)? For example, I have a list of image regions that I would like to feed to my seq2seq model, and the model should predict a description or captions (the output is a sequence).
I'm not asking from the technical perspective; I know that if the data is in the correct format then I can do that. My question is rather theoretical: is it OK to use Seq2Seq with non-time-series data? And are there any papers/articles/references on using Seq2Seq in this setting?
...
ANSWER
Answered 2021-Feb-11 at 19:00
No; the data does not have to be temporal. The only requirement is that it is sequence-like.
Klaus Greff, et al., LSTM: A Search Space Odyssey, 2015 : Since LSTMs are effective at capturing long-term temporal dependencies without suffering from the optimization hurdles that plague simple recurrent networks (SRNs), they have been used to advance the state of the art for many difficult problems. This includes handwriting recognition and generation, language modeling and translation, acoustic modeling of speech, speech synthesis, protein secondary structure prediction, analysis of audio, and video data among others.
Felix A. Gers, et al., Learning to Forget: Continual Prediction with LSTM, 2000 : LSTM holds promise for any sequential processing task in which we suspect that a hierarchical decomposition may exist, but do not know in advance what this decomposition is.
QUESTION
I know this question gets asked a lot, but in my case it's a bit weird. I just got an RTX 3080 and tried to install TensorFlow based on a tutorial I found on Reddit. I did everything as described there: install Anaconda --> Python 3.8 --> TF-nightly v2.5.0 --> Visual Studio C++ --> CUDA 11.1.0 --> cuDNN 8.0.4 --> add path --> restart PC. Everything seemed to work at first. I tried the following command:
...
ANSWER
Answered 2021-Feb-03 at 08:21
You can upgrade TensorFlow to the latest stable version, since TensorFlow 2.4 supports Nvidia's new Ampere architecture (the RTX 30 series) and also supports CUDA 11.
You can check this chart for details and follow the guide to install it:
https://www.tensorflow.org/install/source_windows#tested_build_configurations
Regarding memory usage on the GPU, you can always enable memory growth at the start of your code, as mentioned here.
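A short sketch of enabling GPU memory growth in TensorFlow 2.x, placed before any tensors are allocated on the GPU:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving all of it upfront.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)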
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install seq2seq
You can use seq2seq like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
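As a hypothetical sketch only (the repository URL and packaging files are not shown on this page, so the placeholder and the final install command are assumptions), a typical virtual-environment install would look something like this:

# <repository-url> is a placeholder; substitute the project's actual Git URL.
git clone <repository-url> seq2seq
cd seq2seq
python -m venv .venv
source .venv/bin/activate            # on Windows: .venv\Scripts\activate
python -m pip install --upgrade pip setuptools wheel
pip install .                        # assumes the project ships a setup.py or pyproject.toml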