Opus-MT | Open neural machine translation models and web services | Translation library
kandi X-RAY | Opus-MT Summary
Tools and resources for open translation services.
Top functions reviewed by kandi - BETA
- Handle a message
- Segment a list of tokens
- Process a line
- Segment a sentence
- Creates argument parser
- Create a language element
- Create an etree element
- Post a translation
- Process line
- Create a WSGI application
- Setup logger
- Read a vocabulary file
- Process translation
- Get git user info
- Process a message
- Translate text
- Fill keyboard buttons
Opus-MT Key Features
Opus-MT Examples and Code Snippets
Community Discussions
Trending Discussions on Opus-MT
QUESTION
I'm currently comparing various pre-trained NMT models and can't help but wonder what the difference between MarianMT and OpusMT is. According to OpusMT's GitHub page, it is based on MarianMT. However, in the Huggingface transformers implementation all pretrained MarianMT models start with "Helsinki-NLP/opus-mt". So I thought they were the same, but even though they're roughly the same size, they yield different translation results.
If someone could please shed some light on what the differences are I would be very thankful.
...ANSWER
Answered 2021-Dec-18 at 14:43
Marian is an open-source tool for training and serving neural machine translation, mostly developed at the University of Edinburgh, Adam Mickiewicz University in Poznań and at Microsoft. It is implemented in C++ and is heavily optimized for MT, unlike PyTorch-based Huggingface Transformers that aim for generality rather than efficiency in a specific use case.
The NLP group at the University of Helsinki trained many translation models using Marian on parallel data collected in OPUS, and open-sourced those models. Later, they also converted the trained models into the Huggingface Transformers format and made them available via the Huggingface Hub.
MarianMT is a class in Huggingface Transformers for imported Marian models. You can train a model in Marian and convert it yourself. OpusMT models are Marian models trained on the OPUS data in Helsinki and converted to PyTorch. If you search the Huggingface Hub for Marian, you will find MarianMT models other than those from Helsinki.
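As a quick illustration of this relationship (a sketch, not part of the original answer; the checkpoint name is one real example from the Hub): loading a "Helsinki-NLP/opus-mt-*" checkpoint through the generic Auto classes resolves to the MarianMT implementation in Transformers.

```python
# Sketch: the "Helsinki-NLP/opus-mt-*" checkpoints on the Huggingface Hub
# are imported Marian models, so the generic Auto classes resolve them to
# the MarianMT classes in Transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-de-en"  # German -> English OPUS-MT model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

print(type(model).__name__)  # resolves to the MarianMT model class
```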
QUESTION
I searched a lot for this but still haven't got a clear idea, so I hope you can help me out:
I am trying to translate German texts to English! I used this code:
...ANSWER
Answered 2021-Aug-17 at 13:27
I think one possible answer to your dilemma is provided in this question: https://stackoverflow.com/questions/61523829/how-can-i-use-bert-fo-machine-translation#:~:text=BERT%20is%20not%20a%20machine%20translation%20model%2C%20BERT,there%20are%20doubts%20if%20it%20really%20pays%20off.
Practically, with the output of BERT you get a vectorized representation for each of your words. In essence, the output is easy to use for other tasks, but trickier in the case of machine translation. A good starting point for using a seq2seq model from the transformers library in the context of machine translation is the following notebook: https://github.com/huggingface/notebooks/blob/master/examples/translation.ipynb. The notebook shows how to translate from English to Romanian.
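For the German-to-English case from the question, a minimal sketch (the checkpoint name is an assumption taken from the Helsinki-NLP Hub models, not from the question's code) is the high-level translation pipeline:

```python
# Sketch: German -> English translation with an OPUS-MT checkpoint
# via the high-level pipeline API.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
result = translator("Ich trinke gerne Kaffee am Morgen.")
print(result[0]["translation_text"])
```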
QUESTION
I am getting desperate as I have no clue what the problem is here. I want to translate a list of sentences from German to English. This is my code:
...ANSWER
Answered 2021-Aug-17 at 09:18
In the issue described here (credits to LysandreJik): https://github.com/huggingface/transformers/issues/5480, the problem appears to be that the data is passed as a dict instead of as tensors. It might be the case that you need to change the tokenizer output from:
QUESTION
I am trying to use Huggingface to translate text from English to Hindi. This is the code snippet:
...ANSWER
Answered 2021-Mar-14 at 16:21
The model requires PyTorch tensors and not a Python list. Simply add return_tensors='pt' to prepare_seq2seq:
QUESTION
Currently the Helsinki-NLP/opus-mt-es-en model takes around 1.5 seconds for inference with Transformers. How can that be reduced? Also, when trying to convert it to ONNX Runtime, I get this error:
ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModel. Model type should be one of RetriBertConfig, MT5Config, T5Config, DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, LayoutLMConfig, SqueezeBertConfig, BertConfig, OpenAIGPTConfig, GPT2Config, MobileBertConfig, TransfoXLConfig, XLNetConfig, FlaubertConfig, FSMTConfig, XLMConfig, CTRLConfig, ElectraConfig, ReformerConfig, FunnelConfig, LxmertConfig, BertGenerationConfig, DebertaConfig, DPRConfig, XLMProphetNetConfig, ProphetNetConfig, MPNetConfig, TapasConfig.
Is it possible to convert this to ONNX Runtime?
...ANSWER
Answered 2021-Jan-13 at 10:10
The OPUS models are originally trained with Marian, which is a highly optimized toolkit for machine translation written fully in C++. Unlike PyTorch, it does not have the ambition to be a general deep learning toolkit, so it can focus on MT efficiency. The Marian configurations and instructions on how to download the models are at https://github.com/Helsinki-NLP/OPUS-MT.
The OPUS-MT models for Huggingface's Transformers are converted from the original Marian models and are meant more for prototyping and analyzing the models than for translation in a production-like setup.
Running the models in Marian will certainly be much faster than in Python, and it is certainly much easier than hacking Transformers to run with ONNX Runtime. Marian also offers further tricks to speed up translation, e.g., model quantization, which however comes at the expense of translation quality.
With both Marian and Transformers, you can speed things up if you use a GPU or if you narrow the beam width during decoding (the num_beams attribute of the generate method in Transformers).
QUESTION
I am using a pretrained MarianMT machine translation model from English to German. I also have a large set of high-quality English-to-German sentence pairs that I would like to use to enhance the performance of the model, which is trained on the OPUS corpus, but without making the model forget the OPUS training data. Is there a way to do that? Thanks.
...ANSWER
Answered 2020-Sep-07 at 12:37Have you tried the finetune.sh script shown here? In addition to the short list of CLI flags listed there, you could try adding:
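The specific flags are not reproduced on this page. As a rough, hypothetical alternative sketch in plain Transformers (not the finetune.sh approach from the answer), one common way to limit forgetting is to freeze part of the network before fine-tuning on the new sentence pairs:

```python
# Hypothetical sketch: freeze the encoder so fine-tuning on the new
# sentence pairs only adapts the decoder, which limits drift away from
# what the model learned on the OPUS data.
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

for param in model.get_encoder().parameters():
    param.requires_grad = False  # encoder weights stay at their OPUS values

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

The model would then be trained as usual (e.g., with a seq2seq trainer) at a small learning rate; freezing the embeddings instead of, or in addition to, the encoder is another common choice.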
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Opus-MT
There is another option of setting up translation services using WebSockets and Linux services. Detailed information is available in doc/WebSocketServer.md.