FUSAC | FFPE-tissue UMI-based Sequence Artefact Classifier
kandi X-RAY | FUSAC Summary
FUSAC is a Python library. It has no reported bugs or vulnerabilities, and it has low support. However, no build file is available for FUSAC. You can download it from GitHub.
FFPE tissue UMI-based Sequence Artefact Classifier.
Support
FUSAC has a low active ecosystem.
It has 0 stars and 0 forks. There is 1 watcher for this library.
It had no major release in the last 6 months.
There are 0 open issues and 16 have been closed. On average issues are closed in 2 days. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of FUSAC is current.
Quality
FUSAC has no bugs reported.
Security
FUSAC has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
FUSAC does not have a standard license declared.
Check the repository for any license declaration and review the terms closely.
Without a license, all rights are reserved, and you cannot use the library in your applications.
Reuse
FUSAC releases are not available. You will need to build from source code and install.
FUSAC has no build file. You will need to create the build yourself to build the component from source.
Installation instructions, examples and code snippets are available.
Top functions reviewed by kandi - BETA
kandi's functional review helps you automatically verify the functionalities of the libraries and avoid rework.
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of FUSAC
FUSAC Key Features
No Key Features are available at this moment for FUSAC.
FUSAC Examples and Code Snippets
No Code Snippets are available at this moment for FUSAC.
Community Discussions
No Community Discussions are available at this moment for FUSAC. Refer to the Stack Overflow page for discussions.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install FUSAC
FUSAC is a Python-based program for the identification and classification of FFPE-artefacts in UMI-tagged sequence data. Using a VCF-file and a BAM-file with its BAI index file as input, FUSAC identifies, groups, and collapses all reads aligning to each position called in the VCF. From these reads FUSAC generates a consensus sequence for each UMI and identifies its strand of origin before amplification. Each consensus sequence is then compared with its mate, which lets FUSAC identify not only C:G>T:A artefacts left by hydrolytic deamination, but also true mutations, deletions, unknowns, and any other type of mismatch.
FUSAC requires the user to have a basic understanding of the data they wish to study, namely the location of the UMI-tag and how it is stored within the BAM-file. More specifically, the user must define both the location of the tag (the query name or the RX-field) and the character that separates the UMI, if such a character exists.
From this input, FUSAC generates a modified VCF-file as output. The output VCF is a copy of the input VCF with a modified "FILTER" field, in which any classified FFPE-artefact displays "FFPE". The output VCF will also have a modified "FORMAT" field, which displays the molecular support for the variant position having no mutation, a true mutation, an FFPE-artefact, an unknown, or a deletion. This field also contains the molecular support for the reference genome nucleotide and for the called variant nucleotide, both for paired reads on str1 and str2 and for single reads belonging to strand 1 and strand 2.
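As an illustration of how the modified "FILTER" field might be consumed downstream, the following minimal sketch counts the records flagged "FFPE" in a VCF. It relies only on the standard VCF layout, in which FILTER is the seventh tab-separated column and multiple filter values are separated by semicolons. The function name count_ffpe_records and the example records are hypothetical and not part of FUSAC, and whether FUSAC replaces or appends to an existing FILTER value is not stated above, so the sketch accepts both.

```python
import io


def count_ffpe_records(vcf_lines):
    """Return (total_records, ffpe_records) for an iterable of VCF lines.

    Hypothetical helper, not part of FUSAC. It only assumes the standard
    VCF column order, where FILTER is the 7th tab-separated column.
    """
    total = 0
    ffpe = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information lines (##) and the header (#CHROM).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete variant record
        total += 1
        # FILTER may hold several semicolon-separated values.
        if "FFPE" in fields[6].split(";"):
            ffpe += 1
    return total, ffpe


# Made-up records for illustration only.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    + "\t".join(["chr1", "100", ".", "C", "T", "50", "FFPE", "."]) + "\n"
    + "\t".join(["chr1", "200", ".", "A", "G", "50", "PASS", "."]) + "\n"
    + "\t".join(["chr2", "300", ".", "G", "A", "50", "q10;FFPE", "."]) + "\n"
)
print(count_ffpe_records(example))  # (3, 2)
```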
Required input arguments for running FUSAC are -b and -v, the paths to the .bam and .vcf files respectively. An index file (.bai) for the BAM-file is also required for extracting the desired segments of the BAM-file. The other input flags are optional, but should be changed if their default values do not match the desired output.
To minimize run-time and CPU-load, FUSAC can run on multiple threads. Because pickling cannot deal with open file handles, multiprocessing is not a viable option, as it would require the file to be opened for every read aligning to the variant-position. Instead, FUSAC uses the Python "threading" module with a producer-consumer approach, in which the producer populates a queue and the consumer thread takes items from this queue for analysis. The arguments threads (-t) and queueSize (-qs) set the number of threads to run and the size of the threading queue respectively. The defaults are one active thread and an infinite queue, but both can be set to any desired integer value.
The default FFPE-classification mode focuses solely on C:G>T:A artefacts. If desired, the program can instead identify any mismatching consensus nucleotides by passing the option "all" to the input flag ffpeBases (-fb).
Lastly, FUSAC depends entirely on the UMI-tag being extracted properly, so that reads are assigned to strand 1 or strand 2 as their origin. The user therefore specifies through the umiPosition (-up) flag whether the UMI-tag is located in the query-name ("qrn") or in the RX-tag ("rx"). The UMI-tag also needs to be split in half to be rearranged correctly, which is done with the input splitCharacter (-sc), the character on which to split the tag. For reads where the UMI-tag is not separated by any character, the input "" should be used to split the tag in half.
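The producer-consumer approach described above can be sketched with the standard-library "threading" and "queue" modules. This is a generic illustration of the pattern, not FUSAC's actual code: the function run_producer_consumer and its parameters are hypothetical, and it assumes that threads counts the consumer threads and that a queue size of 0 means an infinite queue (the convention of queue.Queue).

```python
import queue
import threading


def run_producer_consumer(records, analyse, threads=1, queue_size=0):
    """Analyse records with one producer thread and `threads` consumers.

    Hypothetical sketch: queue_size=0 gives an unbounded queue, and any
    positive value makes the producer block while the queue is full.
    """
    work = queue.Queue(maxsize=queue_size)
    results = []
    results_lock = threading.Lock()
    sentinel = object()  # tells a consumer that no more work will arrive

    def producer():
        for record in records:
            work.put(record)  # blocks while a bounded queue is full
        for _ in range(threads):
            work.put(sentinel)  # one stop signal per consumer

    def consumer():
        while True:
            item = work.get()
            if item is sentinel:
                break
            outcome = analyse(item)
            with results_lock:
                results.append(outcome)

    consumers = [threading.Thread(target=consumer) for _ in range(threads)]
    producer_thread = threading.Thread(target=producer)
    for thread in consumers:
        thread.start()
    producer_thread.start()
    producer_thread.join()
    for thread in consumers:
        thread.join()
    return results  # completion order, not input order


# Usage: 4 consumer threads and a queue limited to 9 items.
squares = run_producer_consumer(range(20), lambda x: x * x, threads=4, queue_size=9)
print(sorted(squares) == [x * x for x in range(20)])  # True
```

Because all threads live in one process, an open file handle can be shared between them without pickling, which is the constraint that rules out multiprocessing in the description above.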
The final input to consider is csvFile (-cf), which controls whether or not FUSAC generates an output CSV file based on the FUSAC output. This CSV contains a separate row for each variant-record, with columns for the molecular support for the reference genome nucleotide, the molecular support for the variant-call nucleotide, the number of FFPE-calls, the overall frequency of FFPE-artefacts for the variant-record, and the type of mismatch for the variant-record. The default setting is to generate the CSV, but if this is not required it can be turned off using the input "no".
We wish to classify all mismatches belonging to the file example_bam using the example_vcf file. The reads in the example_bam file have their UMI-tag stored in the query-name, separated by the character "_". The program is being run on a laptop with 4 cores, and we wish to limit the queue to 9 variant-records.
We wish to classify only C:G>T:A artefacts belonging to the file example_bam using the example_vcf file. The reads in the example_bam file have their UMI-tag stored in the RX-tag, and the UMI is not separated by any character. The program is being run on a cluster with 16 cores. We do not wish to limit the queue, but rather keep it infinite. Furthermore, we do not wish to generate a CSV file.
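The command lines for these two scenarios are not preserved above, so the following sketch reconstructs plausible invocations from the documented flags only. The script name FUSAC.py is an assumption, as is matching the thread count to the number of cores. The file names are used exactly as written in the text and may need their .bam and .vcf extensions.

```python
import shlex

# Assumption: the entry point is a script called FUSAC.py (name not confirmed).
base = ["python", "FUSAC.py", "-b", "example_bam", "-v", "example_vcf"]

# Scenario 1: all mismatches, UMI in the query-name split on "_",
# 4 threads (assumed to match the 4 cores), queue limited to 9 records.
scenario_1 = base + ["-fb", "all", "-up", "qrn", "-sc", "_", "-t", "4", "-qs", "9"]

# Scenario 2: default C:G>T:A mode (no -fb), UMI in the RX-tag with no
# separator, 16 threads, default infinite queue (no -qs), no CSV output.
scenario_2 = base + ["-up", "rx", "-sc", "", "-t", "16", "-cf", "no"]

# shlex.join quotes the empty separator so that it survives a shell.
print(shlex.join(scenario_1))
print(shlex.join(scenario_2))
```

Building the argument lists in Python keeps the empty splitCharacter explicit; when typing the command by hand, the same value is passed as a pair of quotes, as described above.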
Support
FUSAC can only classify variant types if the UMI it is studying has both a positive and a negative strand consensus sequence. Thus, if the studied data has many singletons, or if only one of the strands aligns to the variant-call position, only a limited number of classifications can be made. The single paired reads and the singletons instead provide unique molecular counts for the variant-call nucleotide and the reference genome nucleotide at the variant-call position.
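To make the role of the two strands concrete, here is a minimal sketch of one way to pair UMIs and compare strand consensus bases. It is not FUSAC's implementation. All function names are hypothetical; the rule that the alphabetically ordered UMI belongs to strand 1 is an assumption made purely for illustration; consensus bases are assumed to be given in reference orientation; and deletions are left out.

```python
from collections import defaultdict


def split_umi(umi, split_character=""):
    """Split a UMI on split_character, or in half when it is empty."""
    if split_character:
        first, second = umi.split(split_character, 1)
    else:
        half = len(umi) // 2
        first, second = umi[:half], umi[half:]
    return first, second


def molecule_key(umi, split_character=""):
    """Return (key shared by both strands of a molecule, strand number)."""
    first, second = split_umi(umi, split_character)
    # Assumption: halves already in sorted order mean strand 1. A UMI with
    # identical halves cannot be told apart and is treated as strand 1.
    if first <= second:
        return (first, second), 1
    return (second, first), 2


def classify_pair(base_1, base_2, ref, alt):
    """Compare the consensus bases of the two strands of one molecule."""
    if base_1 == base_2 == ref:
        return "no mutation"
    if base_1 == base_2 == alt:
        return "true mutation"
    # Deamination damages one strand only, so the strands disagree.
    if {base_1, base_2} == {ref, alt} and (ref, alt) in {("C", "T"), ("G", "A")}:
        return "FFPE-artefact"
    return "unknown"


def classify_position(umi_consensus, ref, alt, split_character=""):
    """Count classifications at one variant position from UMI -> base."""
    molecules = defaultdict(dict)
    for umi, base in umi_consensus.items():
        key, strand = molecule_key(umi, split_character)
        molecules[key][strand] = base
    counts = defaultdict(int)
    for strands in molecules.values():
        if len(strands) < 2:
            counts["singleton"] += 1  # only one strand seen: not classifiable
        else:
            counts[classify_pair(strands[1], strands[2], ref, alt)] += 1
    return dict(counts)


# Made-up consensus bases for a C>T call.
calls = {"AAC_GGT": "T", "GGT_AAC": "C", "ACG_TTA": "T", "TTA_ACG": "T", "CCA_GAT": "C"}
print(classify_position(calls, "C", "T", "_"))
# {'FFPE-artefact': 1, 'true mutation': 1, 'singleton': 1}
```

In this toy input only two of the three molecules have both strands, so only those two receive a classification, which mirrors the limitation described above.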