OpusFilter | OpusFilter - Parallel corpus processing toolkit
kandi X-RAY | OpusFilter Summary
OpusFilter is a Python library. It has no reported bugs or vulnerabilities, a build file is available, it is released under a permissive license, and it has low community support. You can install it using 'pip install OpusFilter' or download it from GitHub or PyPI.
The main script provided by the package is opusfilter, which takes a configuration file as input. The configuration files are in YAML format. At the top level, they have two sections: common, for general settings such as constants, and steps, the list of processing steps to run. The syntax for opusfilter is

opusfilter [--overwrite] [--last LAST] [--single SINGLE] CONFIG

where CONFIG is the path to the configuration file. The script runs the steps one by one and stops when the final step has been processed (if no exceptions were raised). The --last option sets the last step to run, and --single runs only a single step. In the latter case, the user has to make sure that all input files for the step already exist. The first step has number 1, -1 points to the last step, -2 to the second to last, and so on. By default, existing output files are re-used and the steps producing them are skipped; the --overwrite option forces all steps to run and overwrite their outputs.
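The step numbering and skipping rules for the opusfilter script can be made concrete with a small sketch. The functions `resolve_step`, `steps_to_run` and `should_run` are hypothetical helpers written for illustration, not OpusFilter's internals; they only mirror the documented behaviour (1-based numbering, negative numbers counting from the end, re-use of existing outputs unless --overwrite is given), and they assume a step is skipped only when all of its outputs already exist.

```python
from typing import List, Optional


def resolve_step(number: int, total: int) -> int:
    """Map a step number (1 = first, -1 = last, -2 = second to last) to 1..total."""
    if number > 0:
        index = number
    elif number < 0:
        index = total + 1 + number
    else:
        raise ValueError("step 0 is not valid; numbering starts at 1")
    if not 1 <= index <= total:
        raise ValueError(f"step {number} is out of range for {total} steps")
    return index


def steps_to_run(total: int, last: Optional[int] = None,
                 single: Optional[int] = None) -> List[int]:
    """Return the 1-based steps selected by hypothetical --last / --single values."""
    if single is not None:
        return [resolve_step(single, total)]
    end = resolve_step(last, total) if last is not None else total
    return list(range(1, end + 1))


def should_run(all_outputs_exist: bool, overwrite: bool = False) -> bool:
    """Assumed rule: skip a step whose outputs all exist, unless overwriting."""
    return overwrite or not all_outputs_exist


print(steps_to_run(5))             # [1, 2, 3, 4, 5]
print(steps_to_run(5, last=-2))    # [1, 2, 3, 4]
print(steps_to_run(5, single=-1))  # [5]
```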
Support
OpusFilter has a low-activity ecosystem.
It has 78 stars and 15 forks. There are 9 watchers for this library.
There was 1 major release in the last 6 months.
There are 4 open issues and 23 closed issues. On average, issues are closed in 61 days. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of OpusFilter is 3.1.0.
Quality
OpusFilter has 0 bugs and 0 code smells.
Security
OpusFilter has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
OpusFilter code analysis shows 0 unresolved vulnerabilities.
There are 0 security hotspots that need review.
License
OpusFilter is licensed under the MIT License, which is a permissive license.
Permissive licenses impose the fewest restrictions, and you can use them in most projects.
Reuse
No OpusFilter releases are published on GitHub.
A deployable package is available on PyPI.
A build file is available, so you can also build the component from source and install it.
Installation instructions are not available. Examples and code snippets are available.
It has 5260 lines of code, 482 functions and 27 files.
It has medium code complexity. Code complexity directly impacts maintainability of the code.
Top functions reviewed by kandi - BETA
kandi has reviewed OpusFilter and identified the functions below as its top functions. This is intended to give you instant insight into the functionality implemented in OpusFilter and help you decide whether it suits your requirements.
- Split the input and output files
- Check for extra parameters
- Close files
- Write lines to outfiles
- Train a classifier
- List of feature weights
- Find the best fit for a given criterion
- Create a LVML object
- Create a temporary token file
- Train the model
- Computes the score of the given pairs
- Tokenize a sentence
- Preprocess input files
- Train a Morfessor segment
- Compute score for given parameters
- Fit neighborhood models
- Compute the scores for each sentence
- Concatenate input files
- Read corpus from OPUS
- Filter data
- Compute score for the given pairs
- Train logistic regression model
- Classify a model
- Given a set of pairs yield the pairs that are false
- Computes scores for the given pairs
- Train an n-gram model
OpusFilter Key Features
No Key Features are available at this moment for OpusFilter.
OpusFilter Examples and Code Snippets
```yaml
steps:
  - type: opus_read
    parameters:
      corpus_name: ParaCrawl
      source_language: fi
      target_language: en
      release: v4
      preprocessing: raw
      src_output: paracrawl.fi.gz
      tgt_output: paracrawl.en.gz
```
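The opus_read step above writes the Finnish and English sides of the corpus to two separate gzipped files. A minimal sketch for inspecting such output, assuming the two files are UTF-8 encoded and line-aligned (line N of one file is the translation of line N of the other); `read_parallel` is a hypothetical helper, not part of OpusFilter.

```python
import gzip
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield aligned (source, target) segments from two gzipped text files."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(tgt_path, "rt", encoding="utf-8") as tgt:
        for src_line, tgt_line in zip_longest(src, tgt):
            # A missing line on either side means the files are misaligned.
            if src_line is None or tgt_line is None:
                raise ValueError("files have different numbers of lines")
            yield src_line.rstrip("\n"), tgt_line.rstrip("\n")


# Example usage once the step has run:
# for fi, en in read_parallel("paracrawl.fi.gz", "paracrawl.en.gz"):
#     print(fi, "|||", en)
```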
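```python
import opusfilter


class UppercaseFilter(opusfilter.FilterABC):
    """Reject segment pairs that contain too many uppercase characters."""

    def __init__(self, threshold=0.5, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def uppercase_ratio(self, sentence):
        # Share of uppercase characters; an empty sentence scores 0.
        length = len(sentence)
        return sum(char.isupper() for char in sentence) / length if length else 0

    def score(self, pairs):
        # Yield one score object (per-side ratios) for each segment pair.
        for pair in pairs:
            yield [self.uppercase_ratio(sentence) for sentence in pair]

    def accept(self, score):
        # Keep the pair only if every side is below the threshold.
        return all(ratio < self.threshold for ratio in score)
```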
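Because the custom filter above needs OpusFilter installed, the following dependency-free sketch shows how a filter's score and accept outputs can combine to decide which pairs survive. `SimpleUppercaseFilter` and its `filter` method are hypothetical stand-ins written for illustration; the assumption is that OpusFilter's base class applies accept to each score in a similar way, which this sketch does not claim to reproduce exactly.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, ...]


class SimpleUppercaseFilter:
    """Hypothetical stand-in for a FilterABC subclass (no opusfilter import)."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    @staticmethod
    def uppercase_ratio(sentence: str) -> float:
        # Share of characters that are uppercase; empty strings score 0.
        if not sentence:
            return 0.0
        return sum(ch.isupper() for ch in sentence) / len(sentence)

    def score(self, pairs: Iterable[Pair]) -> Iterator[List[float]]:
        # One score object (a list of per-side ratios) per segment pair.
        for pair in pairs:
            yield [self.uppercase_ratio(s) for s in pair]

    def accept(self, score: List[float]) -> bool:
        # Keep the pair only if every side is below the threshold.
        return all(ratio < self.threshold for ratio in score)

    def filter(self, pairs: Iterable[Pair]) -> Iterator[Pair]:
        # Assumed flow: score each pair, yield those whose score is accepted.
        pairs = list(pairs)
        for pair, score in zip(pairs, self.score(pairs)):
            if self.accept(score):
                yield pair


pairs = [
    ("Hyvää huomenta.", "Good morning."),
    ("OSTA NYT!", "BUY NOW!"),
]
kept = list(SimpleUppercaseFilter(threshold=0.5).filter(pairs))
print(kept)  # [('Hyvää huomenta.', 'Good morning.')]
```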
```yaml
common:
  constants:
    source: en

steps:
  - type: concatenate
    parameters:
      inputs:
        - !varstr "file1.{source}-{target}.gz"
        - !varstr "file2.{source}-{target}.gz"
      output: !varstr "all.{source}-{target}.gz"
    constants:
```
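In the configuration above, the !varstr tag marks strings whose {name} placeholders are filled from constants, with source coming from the common section. A minimal sketch of that substitution, assuming it behaves like Python's str.format and that step-level constants take precedence over common ones; `expand_varstr` is a hypothetical helper, and the value fi for target is an illustrative assumption because the step-level constants are cut off in the snippet.

```python
from typing import Dict


def expand_varstr(template: str, common: Dict[str, str],
                  step: Dict[str, str]) -> str:
    """Fill a !varstr-style template; step constants override common ones."""
    constants = {**common, **step}
    return template.format(**constants)


common_constants = {"source": "en"}
step_constants = {"target": "fi"}  # hypothetical value for illustration

for template in ("file1.{source}-{target}.gz",
                 "file2.{source}-{target}.gz",
                 "all.{source}-{target}.gz"):
    print(expand_varstr(template, common_constants, step_constants))
# file1.en-fi.gz
# file2.en-fi.gz
# all.en-fi.gz
```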
Community Discussions
No Community Discussions are available at this moment for OpusFilter. Refer to the Stack Overflow page for discussions.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install OpusFilter
You can install it using 'pip install OpusFilter' or download it from GitHub or PyPI.
You can use OpusFilter like any standard Python library. You will need a development environment consisting of a Python distribution with header files, a compiler, pip, and git. Make sure that pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.
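After installing, you can check from Python which version of the package is present. A small sketch using the standard library's importlib.metadata (Python 3.8 or later); `installed_version` is a hypothetical helper, and the distribution name opusfilter is assumed to match the name used with pip.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "opusfilter") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version or "OpusFilter is not installed")
```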
Support
For new features, suggestions, and bug reports, create an issue on GitHub.
If you have questions, check and ask on the Stack Overflow community page.
Find more information at: