Opus-MT | Open neural machine translation models and web services | Translation library

 by Helsinki-NLP | Python | Version: Current | License: MIT

kandi X-RAY | Opus-MT Summary

Opus-MT is a Python library typically used in Utilities and Translation applications. Opus-MT has no vulnerabilities, has a build file available, has a permissive license, and has low support. However, Opus-MT has 1 bug. You can download it from GitHub.

Tools and resources for open translation services.

            Support

              Opus-MT has a low active ecosystem.
              It has 341 stars, 51 forks, and 15 watchers.
              It had no major release in the last 6 months.
              There are 38 open issues and 23 closed issues. On average, issues are closed in 70 days. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of Opus-MT is current.

            Quality

              Opus-MT has 1 bug (0 blocker, 0 critical, 1 major, 0 minor) and 89 code smells.

            Security

              Opus-MT has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              Opus-MT code analysis shows 0 unresolved vulnerabilities.
              There are 23 security hotspots that need review.

            License

              Opus-MT is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              Opus-MT releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              It has 1825 lines of code, 80 functions and 23 files.
              It has high code complexity, which directly impacts the maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed Opus-MT and discovered the below as its top functions. This is intended to give you an instant insight into Opus-MT implemented functionality, and help decide if they suit your requirements.
            • Handle a message
            • Segment a list of tokens
            • Process a line
            • Segment a sentence
            • Creates argument parser
            • Create a language element
            • Create an etree element
            • Post a translation
            • Process line
            • Create a WSGI application
            • Setup logger
            • Read a vocabulary file
            • Process translation
            • Get git user info
            • Process a message
            • Translate text
            • Fill keyboard buttons

            Opus-MT Key Features

            No Key Features are available at this moment for Opus-MT.

            Opus-MT Examples and Code Snippets

            No Code Snippets are available at this moment for Opus-MT.

            Community Discussions

            QUESTION

            What is the difference between MarianMT and OpusMT?
            Asked 2021-Dec-18 at 14:43

            I'm currently comparing various pre-trained NMT models and can't help but wonder what the difference between MarianMT and OpusMT is. According to OpusMT's Github it is based on MarianMT. However in the Huggingface transformers implementation all pretrained MarianMT models start with "Helsinki-NLP/opus-mt". So I thought it was the same, but even though they're roughly the same size, they yield different translation results.

            If someone could please shed some light on what the differences are I would be very thankful.

            ...

            ANSWER

            Answered 2021-Dec-18 at 14:43

            Marian is an open-source tool for training and serving neural machine translation, mostly developed at the University of Edinburgh, Adam Mickiewicz University in Poznań and at Microsoft. It is implemented in C++ and is heavily optimized for MT, unlike PyTorch-based Huggingface Transformers that aim for generality rather than efficiency in a specific use case.

            The NLP group at the University of Helsinki trained many translation models with Marian on parallel data collected in OPUS and open-sourced those models. Later, they also converted the trained models to the Huggingface Transformers format and made them available via the Huggingface Hub.

            MarianMT is a class in Huggingface Transformers for imported Marian models. You can train a model in Marian and convert it yourself. OpusMT models are Marian models trained on OPUS data in Helsinki and converted to PyTorch models. If you search the Huggingface Hub for Marian, you will find MarianMT models other than those from Helsinki.

            Source https://stackoverflow.com/questions/70367816

            QUESTION

            Bert model output interpretation
            Asked 2021-Aug-17 at 16:04

            I searched a lot for this but still haven't got a clear idea, so I hope you can help me out:

            I am trying to translate German texts to English. I used this code:

            ...

            ANSWER

            Answered 2021-Aug-17 at 13:27

            I think one possible answer to your dilemma is provided in this question: https://stackoverflow.com/questions/61523829/how-can-i-use-bert-fo-machine-translation

            Practically with the output of BERT, you get a vectorized representation for each of your words. In essence, it is easier to use the output for other tasks, but trickier in the case of Machine Translation.

            A good starting point of using a seq2seq model from the transformers library in the context of machine translation is the following: https://github.com/huggingface/notebooks/blob/master/examples/translation.ipynb.

            The linked example shows how to translate from English to Romanian.
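            Along the same lines, a minimal sketch of the German-to-English case with a seq2seq translation checkpoint instead of BERT (the checkpoint name here, Helsinki-NLP/opus-mt-de-en, is one example; the Auto* classes resolve it to the Marian implementation):

```python
# Use a dedicated translation model rather than BERT, which is an encoder only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")

# Encode the German input, generate English token ids, and decode them.
inputs = tokenizer("Guten Morgen!", return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```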

            Source https://stackoverflow.com/questions/68817989

            QUESTION

            Bert Transformer "Size Error" during Machine Translation
            Asked 2021-Aug-17 at 09:18

            I am getting desperate as I have no clue what the problem is here. I want to translate a list of sentences from German to English. This is my code:

            ...

            ANSWER

            Answered 2021-Aug-17 at 09:18

            As described here (credits to LysandreJik): https://github.com/huggingface/transformers/issues/5480, the problem appears to be that the model receives a dict instead of a tensor.

            It might be the case that you need to change the tokenizer output from:

            Source https://stackoverflow.com/questions/68813979

            QUESTION

            AttributeError: 'list' object has no attribute 'size' Hugging-Face transformers
            Asked 2021-Mar-14 at 16:21

            I am trying to use Huggingface to translate text from English to Hindi. This is the code snippet:

            ...

            ANSWER

            Answered 2021-Mar-14 at 16:21

            The model requires PyTorch tensors, not a Python list. Simply add return_tensors='pt' to prepare_seq2seq_batch:

            Source https://stackoverflow.com/questions/66625389

            QUESTION

            How to reduce the inference time of Helsinki-NLP/opus-mt-es-en (translation model) from transformer
            Asked 2021-Jan-13 at 10:10

            Currently, the Helsinki-NLP/opus-mt-es-en model takes around 1.5 seconds per inference with Transformers. How can that be reduced? Also, when trying to convert it to ONNX Runtime, I get this error:

            ValueError: Unrecognized configuration class for this kind of AutoModel: AutoModel. Model type should be one of RetriBertConfig, MT5Config, T5Config, DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, LayoutLMConfig, SqueezeBertConfig, BertConfig, OpenAIGPTConfig, GPT2Config, MobileBertConfig, TransfoXLConfig, XLNetConfig, FlaubertConfig, FSMTConfig, XLMConfig, CTRLConfig, ElectraConfig, ReformerConfig, FunnelConfig, LxmertConfig, BertGenerationConfig, DebertaConfig, DPRConfig, XLMProphetNetConfig, ProphetNetConfig, MPNetConfig, TapasConfig.

            Is it possible to convert this model to ONNX Runtime?

            ...

            ANSWER

            Answered 2021-Jan-13 at 10:10

            The OPUS models are originally trained with Marian, a highly optimized toolkit for machine translation written fully in C++. Unlike PyTorch, it does not aim to be a general deep learning toolkit, so it can focus on MT efficiency. The Marian configurations and instructions on how to download the models are at https://github.com/Helsinki-NLP/OPUS-MT.

            The OPUS-MT models for Huggingface's Transformers are converted from the original Marian models and are meant more for prototyping and analyzing the models than for translation in a production-like setup.

            Running the models in Marian will certainly be much faster than in Python, and it is certainly much easier than hacking Transformers to run with ONNX Runtime. Marian also offers further tricks to speed up translation, e.g. model quantization, which however comes at the expense of translation quality.

            With both Marian and Transformers, you can speed things up by using a GPU or by narrowing the beam width during decoding (the num_beams attribute of the generate method in Transformers).

            Source https://stackoverflow.com/questions/65541788

            QUESTION

            Enhance a MarianMT pretrained model from HuggingFace with more training data
            Asked 2020-Sep-07 at 12:37

            I am using a pretrained MarianMT machine translation model from English to German. I also have a large set of high quality English-to-German sentence pairs that I would like to use to enhance the performance of the model, which is trained on the OPUS corpus, but without making the model forget the OPUS training data. Is there a way to do that? Thanks.

            ...

            ANSWER

            Answered 2020-Sep-07 at 12:37

            Have you tried the finetune.sh script shown here? In addition to the short list of CLI flags listed there, you could try adding:

            Source https://stackoverflow.com/questions/63774619

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install Opus-MT

            Download the latest version from GitHub.
            There is another option of setting up translation services using WebSockets and Linux services. Detailed information is available from doc/WebSocketServer.md.

            Support

          • OPUS-translator: implementation of a simple on-line translation interface
          • OPUS-CAT: an implementation of an NMT plugin for Trados Studio that can run OPUS-MT models
          • fiskmö: a project on the development of resources and tools for translating between Finnish and Swedish
          • The Tatoeba MT Challenge with lots of pre-trained NMT models
          • The NMT map that plots the status of Tatoeba NMT models on a map
          • pre-trained multilingual models trained on OPUS-100 using the zero toolkit
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/Helsinki-NLP/Opus-MT.git

          • CLI

            gh repo clone Helsinki-NLP/Opus-MT

          • sshUrl

            git@github.com:Helsinki-NLP/Opus-MT.git
