bakta | standardized annotation of bacterial genomes | Genomics library

by oschwengers | Python | Version: 1.9.3 | License: GPL-3.0

kandi X-RAY | bakta Summary

bakta is a Python library typically used in Artificial Intelligence and Genomics applications. bakta has no reported bugs and no reported vulnerabilities, a build file is available, it is released under a Strong Copyleft license, and it has low support. You can install it with 'pip install bakta' or download it from GitHub or PyPI.

Comprehensive & taxonomy-independent database
Bakta provides a large and taxonomy-independent database using UniProt's entire UniRef protein sequence cluster universe. Thus, it achieves favourable annotations in terms of sensitivity and specificity along the broad continuum ranging from well-studied species to unknown genomes from MAGs.

Protein sequence identification
Bakta exactly identifies known identical protein sequences (IPS) from RefSeq and UniProt, allowing the fine-grained annotation of gene alleles (e.g. AMR genes) or closely related but distinct protein families. This is achieved via an alignment-free sequence identification (AFSI) approach using full-length MD5 protein sequence hash digests.

Fast
This AFSI approach substantially accelerates the annotation process by avoiding computationally expensive homology searches for identified genes. Thus, Bakta can annotate a typical bacterial genome in 10 ± 5 min on a laptop, and plasmids in a couple of seconds to minutes.

Database cross-references
Fostering the FAIR principles, Bakta exploits its AFSI approach to annotate CDS with database cross-references (dbxref) to RefSeq (WP_*), UniRef100 (UniRef100_*) and UniParc (UPI*). By doing so, IPS allow the surveillance of distinct gene alleles and streamline comparative analysis as well as posterior (external) annotations of putative & hypothetical protein sequences, which can be mapped back to existing CDS via these exact & stable identifiers (E. coli gene ymiA ...more). Currently, Bakta identifies ~214.8 million, ~199 million and ~161 million distinct protein sequences from UniParc, UniRef100 and RefSeq, respectively. Hence, for certain genomes, up to 99% of all CDS can be identified this way, skipping computationally expensive sequence alignments.
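
The alignment-free lookup described above can be illustrated with a minimal sketch. It is not Bakta's actual implementation: the function names, the in-memory dictionary standing in for the database, the toy sequence, the placeholder identifiers and the normalization step (upper-casing and dropping a trailing stop symbol) are all assumptions made for illustration. Only the core idea, hashing the full-length protein sequence with MD5 and using the digest as an exact-match key, comes from the text.

```python
import hashlib
from typing import Optional


def protein_digest(sequence: str) -> str:
    """Return the MD5 hex digest of a full-length protein sequence.

    The normalization (strip whitespace, upper-case, drop a trailing '*')
    is an assumption of this sketch, not documented Bakta behaviour.
    """
    normalized = sequence.strip().upper().rstrip("*")
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def identify(sequence: str, known: dict) -> Optional[dict]:
    """Look up a sequence by digest; None means 'fall back to a homology search'."""
    return known.get(protein_digest(sequence))


# Hypothetical lookup table: digest -> cross-references (placeholder values).
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
known_sequences = {
    protein_digest(toy_sequence): {"RefSeq": "WP_placeholder", "UniParc": "UPI_placeholder"},
}

hit = identify("mktayiakqrqisfvkshfsrq*", known_sequences)   # same protein, different formatting
miss = identify("MKTAYIAKQRQISFVKSHFSRA", known_sequences)   # one residue differs, so no exact hit
```

Because a single differing residue changes the digest, a lookup of this kind only ever reports exact full-length matches, which is what makes the resulting identifiers usable as stable cross-references.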

FAIR annotations
To provide standardized annotations adhering to FAIR principles, Bakta utilizes a versioned custom annotation database comprising UniProt's UniRef100 & UniRef90 protein clusters (FAIR -> DOI/DOI) enriched with dbxrefs (GO, COG, EC) and annotated by specialized niche databases. For each db version we provide a comprehensive log file of all imported sequences and annotations.

Small proteins / short open reading frames
Bakta detects and annotates small proteins/short open reading frames (sORF) which are not predicted by tools like Prodigal.

Expert annotation systems
To provide high-quality annotations for certain proteins of higher interest, e.g. AMR & VF genes, Bakta includes & merges different expert annotation systems. Currently, Bakta uses NCBI's AMRFinderPlus for AMR gene annotations as well as a generalized protein sequence expert system with distinct coverage, identity and priority values for each sequence, currently comprising the VFDB as well as NCBI's BlastRules.

Comprehensive workflow
Bakta annotates ncRNA cis-regulatory regions, oriC/oriV/oriT and assembly gaps as well as standard feature types: tRNA, tmRNA, rRNA, ncRNA genes, CRISPR, CDS.

GFF3 & INSDC compliant annotations
Bakta writes GFF3 and INSDC-compliant (GenBank & EMBL) annotation files ready for submission (checked via GenomeTools GFF3Validator, table2asn_GFF and ENA Webin-CLI for GFF3 and EMBL file formats, respectively, for representative genomes of all ESKAPE species).

Bacteria & plasmids only
Bakta was designed to annotate bacteria (isolates & MAGs) and plasmids only. This design decision was made in order to tweak the annotation process regarding tools, preferences & databases and to streamline further development & maintenance of the software.
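
To make the GFF3 output mentioned above more concrete, here is a minimal sketch of how a single GFF3 feature line can be assembled. It is not Bakta's writer: the function names, the example feature and the "Bakta" source label are hypothetical. The nine tab-separated columns and the percent-encoding of reserved characters in the attribute column follow the GFF3 specification.

```python
# Characters with a reserved meaning in GFF3 column 9, plus tab/newline/CR and '%',
# mapped to their percent-encoded form.
GFF3_RESERVED = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D",
}


def encode_attribute_value(value: str) -> str:
    """Percent-encode reserved characters one by one so '%' is never double-escaped."""
    return "".join(GFF3_RESERVED.get(ch, ch) for ch in value)


def format_gff3_line(seqid, source, feature_type, start, end, strand,
                     attributes, score=".", phase="."):
    """Join the nine GFF3 columns with tabs; attributes become key=value pairs."""
    attribute_text = ";".join(
        f"{key}={encode_attribute_value(str(value))}" for key, value in attributes.items()
    )
    columns = [seqid, source, feature_type, str(start), str(end),
               score, strand, phase, attribute_text]
    return "\t".join(columns)


# Hypothetical CDS feature; the product text contains a reserved ';'.
line = format_gff3_line(
    "contig_1", "Bakta", "CDS", 1, 300, "+",
    {"ID": "cds_1", "product": "DNA-binding protein; putative"},
    phase="0",
)
```

Escaping matters because an unescaped ';' or '=' inside a product name would otherwise be read as the start of a new attribute by downstream validators.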

Reasoning
By annotating bacterial genomes in a standardized, taxonomy-independent, high-throughput and local manner, Bakta aims at a well-balanced tradeoff between fully featured but computationally demanding pipelines like PGAP and rapid, highly customizable offline tools like Prokka. Indeed, Bakta is heavily inspired by Prokka (kudos to Torsten Seemann), and many command line options are compatible for the sake of interoperability and user convenience. Hence, if Bakta does not fit your needs, please consider trying Prokka.
            kandi-support Support

              bakta has a low active ecosystem.
              It has 296 star(s) with 33 fork(s). There are 11 watchers for this library.
              There were 4 major release(s) in the last 12 months.
              There are 10 open issues and 127 have been closed. On average issues are closed in 51 days. There are 2 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of bakta is 1.9.3

            kandi-Quality Quality

              bakta has no bugs reported.

            kandi-Security Security

              bakta has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              bakta is licensed under the GPL-3.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            kandi-Reuse Reuse

              bakta releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed bakta and discovered the below as its top functions. This is intended to give you an instant insight into bakta implemented functionality, and help decide if they suit your requirements.
            • Write a GFF3 file
            • Encode the annotations
            • Write the signal peptide
            • Encode an attribute
            • Write an insdc output to a genome
            • Extract EC numbers from the given notes
            • Select the ncRNA class from a feature
            • Revise dbxrefs
            • Run diamond search
            • Performs overlap filter on a genome
            • Run hmmsearch
            • Lookup a sequence of features in the database
            • Run prediction on sequences_path
            • Annotate a sequence of AA
            • Check that the database directory exists
            • Test whether the dependencies are met
            • Write the user protein sequences file
            • Update an AMRFinder database
            • Download a file from Zenodo
            • Combine annotation
            • Detect feature overlaps between contigs
            • Sets up NCBI
            • Detect pseudogenes from candidates
            • Run ncRNA prediction on contigs_path
            • Predict ncRNA regions from contigs_path
            • Use blastn to predict features

            Vulnerabilities

            No vulnerabilities reported

            Install bakta

            Bakta can be installed via BioConda, Docker, Singularity and pip. However, we encourage using Conda or Docker/Singularity, which automatically install all required third-party dependencies.
            In all cases, Bakta requires a mandatory database, which is publicly hosted at Zenodo and must be downloaded separately. Further information is provided in the database section of the Bakta documentation.
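
Once Bakta and its database are installed, the two steps (database download, then annotation) could be scripted roughly as follows. This is a sketch under stated assumptions, not an official recipe: the helper function names and all paths are hypothetical, and the command-line flags (`bakta_db download --output`, `bakta --db`, `--output`, `--prefix`, `--threads`) are taken from Bakta's README as the author of this sketch understands it, so verify them against `bakta --help` for your installed version. The helpers only build argument lists; nothing is executed here.

```python
def build_db_download_command(db_dir):
    """Argument list for downloading the mandatory database into db_dir (flags assumed)."""
    return ["bakta_db", "download", "--output", db_dir]


def build_bakta_command(genome_fasta, db_dir, output_dir, prefix=None, threads=None):
    """Argument list for annotating one genome; optional flags are added only if set."""
    command = ["bakta", "--db", db_dir, "--output", output_dir]
    if prefix is not None:
        command += ["--prefix", prefix]
    if threads is not None:
        command += ["--threads", str(threads)]
    command.append(genome_fasta)  # input FASTA goes last as the positional argument
    return command


# Hypothetical paths, for illustration only.
download_cmd = build_db_download_command("bakta_db")
annotate_cmd = build_bakta_command("genome.fasta", "bakta_db/db", "results",
                                   prefix="sample1", threads=8)
```

Either list can be passed to `subprocess.run(command, check=True)` on a machine where Bakta is installed; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.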

            Support

            For new features, suggestions and bug reports, create an issue on GitHub. If you have questions, search or ask on the Stack Overflow community page.
            Install
          • PyPI

            pip install bakta

          • CLONE
          • HTTPS

            https://github.com/oschwengers/bakta.git

          • CLI

            gh repo clone oschwengers/bakta

          • sshUrl

            git@github.com:oschwengers/bakta.git
