OpusFilter | OpusFilter - Parallel corpus processing toolkit

 by Helsinki-NLP | Python | Version: 3.1.0 | License: MIT

kandi X-RAY | OpusFilter Summary

OpusFilter is a Python library for processing parallel corpora. kandi reports no bugs and no vulnerabilities for it; the library has a build file available, a permissive license, and low support. You can install it with 'pip install opusfilter' or download it from GitHub or PyPI.

The main script provided by the package is opusfilter, which takes a configuration file as input. The configuration files are in YAML format and have two sections at the top level; the snippets on this page use common and steps at that level. The script takes one argument, CONFIG, which is the path to the configuration file. It runs the steps one by one and stops after the final step has been processed, unless an exception is raised. The script has options for setting the last step to run (--last) and for running only a single step (--single). In the latter case, the user has to make sure that all input files for the step already exist. The first step has number 1; -1 points to the last step, -2 to the second to last, and so on. By default, existing output files are reused and the steps producing them are skipped. The --overwrite option forces all steps to overwrite their outputs.
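
The step-numbering rules described above can be illustrated with a small Python sketch. It is not OpusFilter's own code: the function names resolve_step and steps_to_run are hypothetical, and rejecting 0 or out-of-range step numbers is an assumption about behavior the text does not describe.

```python
from typing import List, Optional


def resolve_step(number: int, total: int) -> int:
    """Map a 1-based or negative step number to a 0-based list index.

    1 is the first step, -1 the last, -2 the second to last.
    Rejecting 0 and out-of-range values is an assumption of this sketch.
    """
    if number == 0 or abs(number) > total:
        raise ValueError(f"step {number} is out of range for {total} steps")
    return number - 1 if number > 0 else total + number


def steps_to_run(total: int,
                 last: Optional[int] = None,
                 single: Optional[int] = None) -> List[int]:
    """Return the 0-based indices of the steps that would be run."""
    if single is not None:
        # Only one step is run; its inputs must already exist.
        return [resolve_step(single, total)]
    # Without a "last" limit, every step up to the final one is run.
    end = resolve_step(last, total) if last is not None else total - 1
    return list(range(end + 1))


print(steps_to_run(5, last=2))     # first two steps
print(steps_to_run(5, single=-1))  # only the final step
```

In this sketch, a "last" value of 2 selects the first two steps, and a "single" value of -1 selects only the final step of the configuration.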

            Support

              OpusFilter has a low-activity ecosystem.
              It has 78 stars and 15 forks. There are 9 watchers for this library.
              There was 1 major release in the last 6 months.
              There are 4 open issues, and 23 have been closed. On average, issues are closed in 61 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of OpusFilter is 3.1.0

            Quality

              OpusFilter has 0 bugs and 0 code smells.

            Security

              OpusFilter has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              OpusFilter code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              OpusFilter is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              OpusFilter releases are not available. You will need to build from source code and install.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              It has 5260 lines of code, 482 functions and 27 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed OpusFilter and identified the functions below as its top functions. The list is intended to give quick insight into the functionality OpusFilter implements and to help you decide whether it suits your requirements.
            • Split the input and output files
            • Check for extra parameters
            • Close files
            • Write lines to outfiles
            • Train a classifier
            • List of feature weights
            • Find the best fit for a given criterion
            • Create a LVML object
            • Create a temporary token file
            • Train the model
            • Computes the score of the given pairs
            • Tokenize a sentence
            • Preprocess input files
            • Train a Morfessor segment
            • Compute score for given parameters
            • Fit neighborhood models
            • Compute the scores for each sentence
            • Concatenate input files
            • Read corpus from OPUS
            • Filter data
            • Compute score for the given pairs
            • Train logistic regression model
            • Classify a model
            • Given a set of pairs yield the pairs that are false
            • Computes scores for the given pairs
            • Train a ngram

            OpusFilter Examples and Code Snippets

            OpusFilter, Overview, Examples
            Python | Lines of Code: 89 | License: Permissive (MIT)
            steps:
              - type: opus_read
                parameters:
                  corpus_name: ParaCrawl
                  source_language: fi
                  target_language: en
                  release: v4
                  preprocessing: raw
                  src_output: paracrawl.fi.gz
                  tgt_output: paracrawl.en.gz
            
            steps:
              - typ  
            OpusFilter, Custom filters
            Python | Lines of Code: 34 | License: Permissive (MIT)
            import opusfilter
            
            class UppercaseFilter(opusfilter.FilterABC):
            
                def __init__(self, threshold=0.5, **kwargs):
                    self.threshold = threshold
                    super().__init__(**kwargs)
            
                def uppercase_ratio(self, sentence):
                    length = len(sen  
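
The snippet above is cut off, so the following is a self-contained sketch of how such a filter could work rather than the library's own code. It assumes a filter exposes a score method that yields one score list per sentence pair and an accept method that decides on each score; the stand-in base class FilterBase, its filter method, and the sample sentences are hypothetical and only mimic what opusfilter.FilterABC might provide.

```python
from abc import ABC, abstractmethod


class FilterBase(ABC):
    """Hypothetical, simplified stand-in for opusfilter.FilterABC."""

    def __init__(self, **kwargs):
        # Assumption: unknown keyword arguments are treated as an error.
        if kwargs:
            raise TypeError(f"unexpected parameters: {sorted(kwargs)}")

    @abstractmethod
    def score(self, pairs):
        """Yield one score object per sentence pair."""

    @abstractmethod
    def accept(self, score):
        """Return True if a pair with this score should be kept."""

    def filter(self, pairs):
        """Yield only the pairs whose score is accepted."""
        for pair in pairs:
            if self.accept(next(self.score([pair]))):
                yield pair


class UppercaseFilter(FilterBase):
    """Reject pairs in which any sentence is mostly uppercase."""

    def __init__(self, threshold=0.5, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def uppercase_ratio(self, sentence):
        length = len(sentence)
        if length == 0:
            return 0.0
        return sum(1 for char in sentence if char.isupper()) / length

    def score(self, pairs):
        for pair in pairs:
            yield [self.uppercase_ratio(sentence) for sentence in pair]

    def accept(self, score):
        return all(ratio < self.threshold for ratio in score)


pairs = [("Hello world", "Hei maailma"), ("SHOUTING TEXT", "huutava teksti")]
kept = list(UppercaseFilter(threshold=0.5).filter(pairs))
print(kept)
```

Keeping scoring and acceptance separate means the same scores could be inspected or reused with a different threshold without recomputing them.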
            OpusFilter, Overview, Variables and constants
            Python | Lines of Code: 26 | License: Permissive (MIT)
            common:
              constants:
                source: en
            
            steps:
              - type: concatenate
                parameters:
                  inputs:
                  - !varstr "file1.{source}-{target}.gz"
                  - !varstr "file2.{source}-{target}.gz"
                  output: !varstr "all.{source}-{target}.gz"
                constants:
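
As a rough illustration of what the !varstr entries above express, the following Python sketch fills the placeholders from constants. It assumes that substitution behaves like Python's str.format and that step-level constants are merged over the common ones; the function name expand_varstr and the value "fi" for target are hypothetical, since the snippet is cut off before the step-level constants are given.

```python
def expand_varstr(template: str, common: dict, local: dict) -> str:
    """Fill {name} placeholders from common and step-level constants.

    Assumption: step-level values take precedence over common ones.
    """
    variables = {**common, **local}
    return template.format(**variables)


common_constants = {"source": "en"}   # taken from the snippet above
step_constants = {"target": "fi"}     # hypothetical value, not in the snippet

for template in ("file1.{source}-{target}.gz",
                 "file2.{source}-{target}.gz",
                 "all.{source}-{target}.gz"):
    print(expand_varstr(template, common_constants, step_constants))
```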
              

            Vulnerabilities

            No vulnerabilities reported

            Install OpusFilter

            You can install OpusFilter with 'pip install opusfilter' or download it from GitHub or PyPI.
            You can use OpusFilter like any standard Python library. You need a development environment with a Python distribution (including header files), a compiler, pip, and git. Make sure that pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install opusfilter

          • CLONE
          • HTTPS

            https://github.com/Helsinki-NLP/OpusFilter.git

          • CLI

            gh repo clone Helsinki-NLP/OpusFilter

          • sshUrl

            git@github.com:Helsinki-NLP/OpusFilter.git

            Try Top Libraries by Helsinki-NLP

            • Opus-MT, by Helsinki-NLP (Python)
            • prosody, by Helsinki-NLP (Python)
            • HBMP, by Helsinki-NLP (Python)
            • OpusTools, by Helsinki-NLP (Python)
            • XED, by Helsinki-NLP (Jupyter Notebook)