Opus-MT | Open neural machine translation models and web services | Translation library
kandi X-RAY | Opus-MT Summary
Tools and resources for open translation services.
Top functions reviewed by kandi - BETA
- Handle a message
- Segment a list of tokens
- Process a line
- Segment a sentence
- Creates argument parser
- Create a language element
- Create an etree element
- Post a translation
- Process line
- Create a WSGI application
- Setup logger
- Read a vocabulary file
- Process translation
- Get git user info
- Process a message
- Translate text
- Fill keyboard buttons
Opus-MT Key Features
Opus-MT Examples and Code Snippets
Community Discussions
Trending Discussions on Opus-MT
QUESTION
I'm currently comparing various pre-trained NMT models and can't help but wonder what the difference between MarianMT and OpusMT is. According to OpusMT's GitHub page, it is based on MarianMT. However, in the Huggingface transformers implementation all pretrained MarianMT models start with "Helsinki-NLP/opus-mt". So I thought they were the same, but even though they're roughly the same size, they yield different translation results.
If someone could please shed some light on what the differences are I would be very thankful.
...ANSWER
Answered 2021-Dec-18 at 14:43
Marian is an open-source tool for training and serving neural machine translation, mostly developed at the University of Edinburgh, Adam Mickiewicz University in Poznań and at Microsoft. It is implemented in C++ and is heavily optimized for MT, unlike PyTorch-based Huggingface Transformers that aim for generality rather than efficiency in a specific use case.
The NLP group at the University of Helsinki trained many translation models using Marian on parallel data collected in OPUS, and open-sourced those models. Later, they also converted the trained models into the Huggingface Transformers format and made them available via the Huggingface Hub.
MarianMT is a class in Huggingface Transformers for imported Marian models. You can train a model in Marian and convert it yourself. OpusMT models are Marian models trained on the OPUS data in Helsinki and converted to PyTorch. If you search the Huggingface Hub for Marian, you will find MarianMT models other than those from Helsinki.
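As a quick illustration of this relationship (a sketch, not part of the original answer; the checkpoint name is one real example from the Hub): loading a "Helsinki-NLP/opus-mt-*" checkpoint through the generic Auto classes resolves to the MarianMT implementation in Transformers.

```python
# Sketch: the "Helsinki-NLP/opus-mt-*" checkpoints on the Huggingface Hub
# are imported Marian models, so the generic Auto classes resolve them to
# the MarianMT classes in Transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-de-en"  # German -> English OPUS-MT model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

print(type(model).__name__)  # resolves to the MarianMT model class
```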
QUESTION
I searched a lot for this but still haven't got a clear idea, so I hope you can help me out:
I am trying to translate German texts to English! I used this code:
...ANSWER
Answered 2021-Aug-17 at 13:27
I think one possible answer to your dilemma is provided in this question: https://stackoverflow.com/questions/61523829/how-can-i-use-bert-fo-machine-translation#:~:text=BERT%20is%20not%20a%20machine%20translation%20model%2C%20BERT,there%20are%20doubts%20if%20it%20really%20pays%20off.
Practically, with the output of BERT you get a vectorized representation for each of your words. In essence, the output is easy to use for other tasks, but trickier in the case of machine translation. A good starting point for using a seq2seq model from the transformers library in the context of machine translation is the following notebook: https://github.com/huggingface/notebooks/blob/master/examples/translation.ipynb. The notebook shows how to translate from English to Romanian.
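For the German-to-English case from the question, a minimal sketch (the checkpoint name is an assumption taken from the Helsinki-NLP Hub models, not from the question's code) is the high-level translation pipeline:

```python
# Sketch: German -> English translation with an OPUS-MT checkpoint
# via the high-level pipeline API.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
result = translator("Ich trinke gerne Kaffee am Morgen.")
print(result[0]["translation_text"])
```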
QUESTION
I am getting desperate as I have no clue what the problem is here. I want to translate a list of sentences from German to English. This is my code:
...ANSWER
Answered 2021-Aug-17 at 09:18
In the issue described here (credits to LysandreJik): https://github.com/huggingface/transformers/issues/5480, the problem appears to be that the data is passed as a dict instead of as tensors. It might be the case that you need to change the tokenizer output from:
QUESTION
I am trying to use Huggingface to translate text from English to Hindi. This is the code snippet:
...ANSWER
Answered 2021-Mar-14 at 16:21
The model requires PyTorch tensors and not a Python list. Simply add return_tensors='pt' to prepare_seq2seq:
QUESTION
Currently the Helsinki-NLP/opus-mt-es-en model takes around 1.5 seconds for inference with Transformers. How can that be reduced? Also, when trying to convert it to ONNX Runtime, I get this error:
ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModel. Model type should be one of RetriBertConfig, MT5Config, T5Config, DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, LayoutLMConfig, SqueezeBertConfig, BertConfig, OpenAIGPTConfig, GPT2Config, MobileBertConfig, TransfoXLConfig, XLNetConfig, FlaubertConfig, FSMTConfig, XLMConfig, CTRLConfig, ElectraConfig, ReformerConfig, FunnelConfig, LxmertConfig, BertGenerationConfig, DebertaConfig, DPRConfig, XLMProphetNetConfig, ProphetNetConfig, MPNetConfig, TapasConfig.
Is it possible to convert this to ONNX Runtime?
...ANSWER
Answered 2021-Jan-13 at 10:10
The OPUS models are originally trained with Marian, which is a highly optimized toolkit for machine translation written fully in C++. Unlike PyTorch, it does not have the ambition to be a general deep learning toolkit, so it can focus on MT efficiency. The Marian configurations and instructions on how to download the models are at https://github.com/Helsinki-NLP/OPUS-MT.
The OPUS-MT models for Huggingface's Transformers are converted from the original Marian models and are meant more for prototyping and analyzing the models than for translation in a production-like setup.
Running the models in Marian will certainly be much faster than in Python, and it is certainly much easier than hacking Transformers to run with ONNX Runtime. Marian also offers further tricks to speed up translation, e.g., model quantization, which however comes at the expense of translation quality.
With both Marian and Transformers, you can speed things up if you use a GPU or if you narrow the beam width during decoding (the num_beams attribute of the generate method in Transformers).
QUESTION
I am using a pretrained MarianMT machine translation model from English to German. I also have a large set of high-quality English-to-German sentence pairs that I would like to use to enhance the performance of the model, which is trained on the OPUS corpus, but without making the model forget the OPUS training data. Is there a way to do that? Thanks.
...ANSWER
Answered 2020-Sep-07 at 12:37Have you tried the finetune.sh script shown here? In addition to the short list of CLI flags listed there, you could try adding:
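The specific flags are not reproduced on this page. As a rough, hypothetical alternative sketch in plain Transformers (not the finetune.sh approach from the answer), one common way to limit forgetting is to freeze part of the network before fine-tuning on the new sentence pairs:

```python
# Hypothetical sketch: freeze the encoder so fine-tuning on the new
# sentence pairs only adapts the decoder, which limits drift away from
# what the model learned on the OPUS data.
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

for param in model.get_encoder().parameters():
    param.requires_grad = False  # encoder weights stay at their OPUS values

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

The model would then be trained as usual (e.g., with a seq2seq trainer) at a small learning rate; freezing the embeddings instead of, or in addition to, the encoder is another common choice.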
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Opus-MT
There is another option of setting up translation services using WebSockets and Linux services. Detailed information is available in doc/WebSocketServer.md.