FUSAC | FFPE-tissue UMI-based Sequence Artefact Classifier

 by clinical-genomics-uppsala | Python | Version: Current | License: No License

kandi X-RAY | FUSAC Summary

FUSAC is a Python library. It has no reported bugs or vulnerabilities, and its support activity is low. No build file is available; the source code can be downloaded from GitHub.

FFPE tissue UMI-based Sequence Artefact Classifier.

            Support

              FUSAC has a low-activity ecosystem.
              It has 0 stars and 0 forks. There is 1 watcher for this library.
              It had no major release in the last 6 months.
              There are 0 open issues and 16 have been closed. On average issues are closed in 2 days. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of FUSAC is current.

            Quality

              FUSAC has no bugs reported.

            Security

              FUSAC has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              FUSAC does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            Reuse

              FUSAC releases are not available. You will need to build from source code and install.
              FUSAC has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions, examples and code snippets are available.

            Install FUSAC

            FUSAC is a Python-based program for identifying and classifying FFPE artefacts in UMI-tagged sequence data. Using a VCF file and a BAM file (with its BAI index) as input, FUSAC identifies, groups, and collapses all reads aligning to each position called in the VCF. From these reads it generates a consensus sequence for each UMI and identifies its strand of origin before amplification. Each consensus sequence is then compared with its mate, which lets FUSAC identify not only C:G>T:A artefacts left by hydrolytic deamination but also true mutations, deletions, unknowns, and any other type of mismatch. FUSAC requires the user to have a basic understanding of the data being studied, namely where the UMI tag is located and how it is structurally stored within the BAM file. More specifically, the user must define both the location of the tag (the query name or the RX field) and the character separating the UMI, if such a character exists. From this input, FUSAC generates a modified VCF file as output. The output VCF is a copy of the input VCF with a modified FILTER field, in which any classified FFPE artefact is marked "FFPE". The output VCF also has a modified FORMAT field that displays the molecular support for the variant position carrying no mutation, a true mutation, an FFPE artefact, an unknown, or a deletion. This field also contains the molecular support for the reference nucleotide and for the called variant nucleotide from paired reads on str1 and str2, as well as the support from single reads belonging to strand 1 and strand 2.
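
To make the classification idea concrete, here is a minimal sketch, not FUSAC's actual code: reads at one variant position are grouped by UMI family and strand, each group is collapsed to a majority-vote consensus base, and the str1 and str2 consensus bases are compared. The read tuples, the majority-vote rule, and the decision order in the hypothetical `classify` function are all assumptions; FUSAC's real consensus and classification rules may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical read record: (UMI family key, strand label, base at the variant position).
# The strand label is "str1" or "str2"; "-" stands for a deletion at the position.
Read = Tuple[str, str, str]

# (ref, alt) pairs that fit C:G>T:A deamination when read in reference orientation.
DEAMINATION = {("C", "T"), ("G", "A")}


def strand_consensus(reads: Iterable[Read]) -> Dict[str, Dict[str, str]]:
    """Group reads by UMI and strand, then keep the majority base of each group."""
    groups: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for umi, strand, base in reads:
        groups[umi][strand][base] += 1
    return {
        umi: {strand: counts.most_common(1)[0][0] for strand, counts in strands.items()}
        for umi, strands in groups.items()
    }


def classify(consensus: Dict[str, str], ref: str, alt: str) -> Optional[str]:
    """Compare the str1 and str2 consensus bases of one UMI family."""
    if "str1" not in consensus or "str2" not in consensus:
        return None  # only one strand observed: this family cannot be classified
    b1, b2 = consensus["str1"], consensus["str2"]
    if "-" in (b1, b2):
        return "deletion"
    if b1 == b2 == ref:
        return "no mutation"
    if b1 == b2 == alt:
        return "mutation"
    if {b1, b2} == {ref, alt} and (ref, alt) in DEAMINATION:
        return "FFPE"  # strands disagree the way deamination on one strand would produce
    return "unknown"


reads = [
    ("AAAT", "str1", "T"), ("AAAT", "str1", "T"), ("AAAT", "str2", "C"),
    ("CCGA", "str1", "T"), ("CCGA", "str2", "T"),
    ("GGTT", "str1", "C"),
]
for umi, cons in strand_consensus(reads).items():
    print(umi, classify(cons, ref="C", alt="T"))
# AAAT FFPE
# CCGA mutation
# GGTT None
```

The key design point is that a true mutation should appear on both original strands, while a deamination artefact arises on only one, so a str1/str2 disagreement of the C>T or G>A kind is what flags a family as FFPE in this sketch.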
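
The FILTER change can be sketched on a single VCF data line. The hypothetical `mark_ffpe` helper below assumes the classification for the record is already known and follows the VCF convention that FILTER holds PASS, a missing value (.), or semicolon-separated codes of failed filters. How FUSAC itself writes the field is not shown in the source; a complete output VCF would also declare the filter with a `##FILTER=<ID=FFPE,...>` header line.

```python
def mark_ffpe(vcf_line: str, is_ffpe: bool) -> str:
    """Add "FFPE" to the FILTER column (7th tab-separated field) of a VCF data line."""
    if vcf_line.startswith("#") or not is_ffpe:
        return vcf_line  # header lines and unflagged records are copied unchanged
    ending = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    current = fields[6]
    if current in (".", "PASS"):
        fields[6] = "FFPE"  # a failed filter replaces PASS or a missing value
    elif "FFPE" not in current.split(";"):
        fields[6] = current + ";FFPE"  # keep existing filter codes, semicolon-separated
    return "\t".join(fields) + ending


record = "chr1\t100\t.\tC\tT\t50\tPASS\t.\n"
print(mark_ffpe(record, is_ffpe=True).split("\t")[6])  # FFPE
```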
            The required input arguments for running FUSAC are -b and -v, which are the paths to the .bam and .vcf files respectively. A BAM index (.bai) file is also required for extracting the relevant segments of the BAM file. The other input flags are optional, but should be changed if their default values do not suit the desired output. To reduce run time and CPU load, FUSAC can run on multiple threads. Because open file handles cannot be pickled, multiprocessing is not a viable option: it would require the file to be reopened for every read aligning to the variant position. Instead, FUSAC uses Python's threading module with a producer-consumer approach, in which the producer populates a queue and the consumer threads take items from this queue for analysis. The arguments threads (-t) and queueSize (-qs) control this process by setting the number of threads and the size of the queue respectively. By default FUSAC runs one thread with an unbounded (infinite) queue, but both can be set to any integer value. The default FFPE classification mode focuses solely on C:G>T:A artefacts; if desired, the program can instead identify any mismatching consensus nucleotides by setting ffpeBases (-fb) to "all". Finally, FUSAC depends entirely on the UMI tag being extracted correctly, since this is what assigns each read to strand 1 or strand 2 as its origin. The umiPosition (-up) argument therefore specifies whether the UMI tag is located in the query name ("qrn") or in the RX tag ("rx"). The UMI tag also needs to be split in half to be rearranged correctly, which is controlled by splitCharacter (-sc), the character on which to split the tag. For reads whose UMI tag is not separated by any character, use the input "" to split the tag in half.
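
The UMI handling can be sketched as follows. The source does not say where in the query name the UMI sits, so the hypothetical `extract_umi` assumes it is the last colon-separated field. The sketch also assumes the common duplex convention in which the two strands of one molecule carry the same two UMI halves in swapped order; under that assumption, putting the halves in a fixed order gives one family key per molecule, and the original order gives a strand label.

```python
from typing import Optional, Tuple


def extract_umi(query_name: str, rx_tag: Optional[str], umi_position: str) -> str:
    """Return the raw UMI from the read name ("qrn") or from the RX tag ("rx")."""
    if umi_position == "qrn":
        # Assumption: the UMI is the last colon-separated field of the read name.
        return query_name.rsplit(":", 1)[-1]
    if umi_position == "rx":
        if rx_tag is None:
            raise ValueError("read has no RX tag")
        return rx_tag
    raise ValueError(f"unknown umiPosition: {umi_position!r}")


def split_umi(umi: str, split_character: str) -> Tuple[str, str]:
    """Split a duplex UMI into two halves; "" means cut it at the midpoint."""
    if split_character == "":
        mid = len(umi) // 2
        return umi[:mid], umi[mid:]
    first, second = umi.split(split_character, 1)
    return first, second


def family_and_strand(umi: str, split_character: str) -> Tuple[str, str]:
    """Put the halves in a fixed order so both strands of a molecule share one key.

    Reads whose halves were already in that order are labelled "str1", reads whose
    halves were swapped are labelled "str2" (an arbitrary but consistent choice).
    """
    a, b = split_umi(umi, split_character)
    if a <= b:
        return f"{a}+{b}", "str1"
    return f"{b}+{a}", "str2"


print(family_and_strand(extract_umi("M001:1:42:ACGT_TTGA", None, "qrn"), "_"))
# ('ACGT+TTGA', 'str1')
print(family_and_strand(extract_umi("M001:1:43", "TTGAACGT", "rx"), ""))
# ('ACGT+TTGA', 'str2')
```

This is why a wrong -up or -sc setting is so damaging: if the halves are cut in the wrong place, the two strands of one molecule no longer map to the same family key.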
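
The producer-consumer pattern itself can be illustrated with the standard `queue` and `threading` modules. In this sketch the hypothetical `run_producer_consumer` feeds variant records to worker threads; `threads` and `queue_size` play the roles of -t and -qs, and `maxsize=0` is how `queue.Queue` expresses an unbounded queue. The `analyse` callback stands in for FUSAC's per-record work, whose details are not shown here.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel that tells one consumer thread to stop


def run_producer_consumer(records: Iterable[str], analyse: Callable[[str], str],
                          threads: int = 1, queue_size: int = 0) -> List[str]:
    """Feed records through a bounded (or, with queue_size=0, unbounded) queue to worker threads."""
    work: "queue.Queue[object]" = queue.Queue(maxsize=queue_size)
    results: List[str] = []
    lock = threading.Lock()

    def consumer() -> None:
        while True:
            item = work.get()
            if item is _DONE:
                break
            outcome = analyse(item)  # per-record work, e.g. reading the BAM region
            with lock:
                results.append(outcome)

    workers = [threading.Thread(target=consumer) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for record in records:  # producer: put() blocks while a bounded queue is full
        work.put(record)
    for _ in workers:       # one sentinel per consumer so every thread exits
        work.put(_DONE)
    for worker in workers:
        worker.join()
    return results


records = [f"chr1:{pos}" for pos in (100, 200, 300)]
print(sorted(run_producer_consumer(records, str.upper, threads=4, queue_size=9)))
# ['CHR1:100', 'CHR1:200', 'CHR1:300']
```

Results arrive in whatever order the threads finish, hence the `sorted` call. A bounded queue caps how many records are held in memory at once, at the cost of the producer waiting when the consumers fall behind.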
            The final input to consider is csvFile (-cf), which controls whether FUSAC also generates a CSV file based on its output. The CSV contains a separate row for each variant record, with columns for the molecular support for the reference nucleotide, the molecular support for the variant-call nucleotide, the number of FFPE calls, the overall frequency of FFPE artefacts for the variant record, and the type of mismatch for the variant record. The CSV is generated by default; if it is not required, it can be turned off with the input "no".

            Example 1: We wish to classify all mismatches in the file example_bam using the file example_vcf. The reads in example_bam have their UMI tag stored in the query name, separated by the character "_". The program is run on a laptop with 4 cores, and we wish to limit the queue to 9 variant records.

            Example 2: We wish to classify only C:G>T:A artefacts in the file example_bam using the file example_vcf. The reads in example_bam have their UMI tag stored in the RX tag, with no separating character. The program is run on a cluster with 16 cores. We do not wish to limit the queue but rather leave it infinite, and we do not wish to generate a CSV file.
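
The two example runs are described in prose only, without the resulting command lines. The sketch below assembles the argument list for each run from the documented flags; the script or entry point that receives these arguments is not named here, so it is left out. Using one thread per core for -t is an assumption, and the hypothetical `FusacRun` helper simply omits -qs, -fb, and -cf when the documented default is wanted.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FusacRun:
    """Hypothetical helper holding the documented FUSAC options for one run."""
    bam: str                          # -b, path to the BAM file (a .bai index is also required)
    vcf: str                          # -v, path to the VCF file
    umi_position: str                 # -up, "qrn" (query name) or "rx" (RX tag)
    split_character: str              # -sc, "" means split the UMI in half
    threads: int = 1                  # -t, documented default: one thread
    queue_size: Optional[int] = None  # -qs, None leaves the default unbounded queue
    ffpe_bases: Optional[str] = None  # -fb, "all" flags every mismatch; None keeps C:G>T:A
    csv_file: Optional[str] = None    # -cf, "no" disables the CSV; None keeps the default

    def to_argv(self) -> List[str]:
        argv = ["-b", self.bam, "-v", self.vcf,
                "-up", self.umi_position, "-sc", self.split_character,
                "-t", str(self.threads)]
        if self.queue_size is not None:
            argv += ["-qs", str(self.queue_size)]
        if self.ffpe_bases is not None:
            argv += ["-fb", self.ffpe_bases]
        if self.csv_file is not None:
            argv += ["-cf", self.csv_file]
        return argv


# Example 1: all mismatches, UMI in the query name split on "_", 4 cores, queue of 9.
example1 = FusacRun("example_bam", "example_vcf", "qrn", "_",
                    threads=4, queue_size=9, ffpe_bases="all")
# Example 2: C:G>T:A only, UMI in the RX tag with no separator, 16 cores, no CSV.
example2 = FusacRun("example_bam", "example_vcf", "rx", "",
                    threads=16, csv_file="no")

for run in (example1, example2):
    print(shlex.join(run.to_argv()))
# -b example_bam -v example_vcf -up qrn -sc _ -t 4 -qs 9 -fb all
# -b example_bam -v example_vcf -up rx -sc '' -t 16 -cf no
```

Prefix the printed arguments with however FUSAC is launched in your installation. In the second run, `''` is the shell spelling of the empty split character "".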

            Support

            FUSAC can only classify a variant if the UMI it is studying has both a positive-strand and a negative-strand consensus sequence. Thus, if the data contain many singletons, or if only one of the strands aligns to the variant-call position, only a limited number of classifications can be made. Single-strand read pairs and singletons instead contribute unique molecular counts for the variant-call nucleotide and for the reference nucleotide at the variant-call position.
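
A minimal sketch of how such molecular counts could be tallied, treating each UMI family as one molecule. The hypothetical `molecular_support` function takes the strand consensus bases of each family and separates families seen on both strands from single-strand families, mirroring the paired and single-read support described for the FORMAT field; FUSAC's exact counting rules are not given in the source.

```python
from collections import Counter
from typing import Dict


def molecular_support(families: Dict[str, Dict[str, str]], ref: str, alt: str) -> Counter:
    """Count UMI families (molecules) supporting the reference or variant base.

    families maps a family key to its strand consensus bases, e.g. {"str1": "C", "str2": "T"}.
    Result keys are (kind, strand, allele): kind is "paired" when both strands were
    observed and "single" otherwise; allele is "ref" or "alt".
    """
    counts: Counter = Counter()
    for strands in families.values():
        kind = "paired" if ("str1" in strands and "str2" in strands) else "single"
        for strand, base in strands.items():
            if base == ref:
                counts[(kind, strand, "ref")] += 1
            elif base == alt:
                counts[(kind, strand, "alt")] += 1
    return counts


families = {
    "ACGT+TTGA": {"str1": "C", "str2": "T"},  # both strands seen: classifiable
    "AACC+GGTT": {"str1": "T"},               # single-strand family: counted only
    "CCAA+TTGG": {"str2": "C"},               # single-strand family: counted only
}
support = molecular_support(families, ref="C", alt="T")
print(support[("paired", "str1", "ref")], support[("paired", "str2", "alt")])  # 1 1
print(support[("single", "str1", "alt")], support[("single", "str2", "ref")])  # 1 1
```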
            Clone
            • HTTPS: https://github.com/clinical-genomics-uppsala/FUSAC.git
            • GitHub CLI: gh repo clone clinical-genomics-uppsala/FUSAC
            • SSH: git@github.com:clinical-genomics-uppsala/FUSAC.git
