bwa | Burrows-Wheeler Aligner for short-read alignment | Genomics library

by lh3 | C | Version: v0.7.17 | License: GPL-3.0

kandi X-RAY | bwa Summary

bwa is a C library typically used in artificial intelligence and genomics applications. It has no reported bugs or vulnerabilities, carries a Strong Copyleft license, and has medium support. You can download it from GitHub.

BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to a few megabases. BWA-MEM and BWA-SW share similar features, such as support for long reads and chimeric alignment, but BWA-MEM, which is the latest, is generally recommended as it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100bp Illumina reads. For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.
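
The mapping from algorithm to sub-command described above can be sketched as a small Python helper that only builds the shell command strings and does not run them. The function names, the "backtrack" label and the `.sai` output naming are assumptions made for this illustration; the sub-commands themselves (index, mem, bwasw, aln, samse, sampe) are the ones named in the text.

```python
import shlex
from typing import List, Optional


def bwa_index_cmd(ref: str) -> str:
    """Shell command that builds the FM-index for a reference FASTA."""
    return f"bwa index {shlex.quote(ref)}"


def bwa_align_cmds(algorithm: str, ref: str, reads1: str,
                   reads2: Optional[str] = None) -> List[str]:
    """Return the shell commands for one alignment run.

    algorithm is "mem", "bwasw" or "backtrack" (hypothetical labels).
    Redirecting the final SAM output is left to the caller.
    """
    raw = [r for r in (reads1, reads2) if r]
    ref_q = shlex.quote(ref)
    reads = [shlex.quote(r) for r in raw]

    if algorithm in ("mem", "bwasw"):
        # One sub-command handles single-end and paired-end input.
        return [f"bwa {algorithm} {ref_q} {' '.join(reads)}"]

    if algorithm == "backtrack":
        # Step 1: one `aln` call per FASTQ file, written to a .sai file
        # (the .sai naming scheme here is an assumption).
        sai = [shlex.quote(r + ".sai") for r in raw]
        cmds = [f"bwa aln {ref_q} {fq} > {s}" for fq, s in zip(reads, sai)]
        # Step 2: samse for single-end reads, sampe for paired-end reads.
        pairing = "sampe" if len(raw) == 2 else "samse"
        cmds.append(f"bwa {pairing} {ref_q} {' '.join(sai)} {' '.join(reads)}")
        return cmds

    raise ValueError(f"unknown algorithm: {algorithm!r}")


if __name__ == "__main__":
    print(bwa_index_cmd("ref.fa"))
    for cmd in bwa_align_cmds("backtrack", "ref.fa", "r1.fq", "r2.fq"):
        print(cmd)
```

Building strings rather than executing them keeps the sketch safe to run on a machine without bwa installed; the printed commands can be reviewed before being pasted into a shell.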

            kandi-support Support

              bwa has a medium active ecosystem.
              It has 1308 star(s) with 535 fork(s). There are 112 watchers for this library.
              It had no major release in the last 12 months.
              There are 172 open issues and 98 have been closed. On average issues are closed in 116 days. There are 42 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of bwa is v0.7.17

            kandi-Quality Quality

              bwa has 0 bugs and 0 code smells.

            kandi-Security Security

              bwa has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              bwa code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              bwa is licensed under the GPL-3.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            kandi-Reuse Reuse

              bwa releases are available to install and integrate.
              Installation instructions are not available. Examples and code snippets are available.

            Community Discussions


            Joins using aggregation in Oracle
            Asked 2022-Feb-04 at 11:05

            I have two tables, Main and Sub, that I need to join. Keys that share the same grp_id form a single group. For example, in table Main, (BWA,ST,FD62E015) is one group, (BWA,VI,FD62E015) is another group, and so on. The same goes for table Sub. I want to join these two tables and get the grp_id from Main such that the group with key (BWA,FD62E015) in Sub gets grp_id 1 and 2 from Main, and the group with key (BWA,FM62Q011) gets grp_id 3 and 4.

            Normal joins won't work here, since both groups in the Sub table have the key BWA. Is there a way to aggregate the keys and join them?



            Answered 2022-Feb-04 at 10:45


            Using bash how to iterate through lines in a txt file and sequentially pair up every two lines
            Asked 2022-Jan-20 at 14:56

            Hi I am attempting to use bash to iterate through a .txt file which contains the following lines. This is a smaller subset of the full list of fastq files, but all samples follow the same patterns.



            Answered 2022-Jan-20 at 14:56

            Why not read 2 lines at a time? Remove the echo before bwa... once you are satisfied with the result.
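
The "two lines at a time" idea from the answer can be sketched in Python as well. This is a minimal illustration, not the original bash answer: the function name, the example file names and the `ref.fa` reference are hypothetical, and the commands are only printed, mirroring the dry-run `echo` in the answer.

```python
from typing import Iterable, Iterator, Tuple


def pair_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield consecutive non-empty lines as (first, second) pairs."""
    cleaned = iter(ln.strip() for ln in lines if ln.strip())
    for first in cleaned:
        # Pull the partner line from the same iterator.
        second = next(cleaned, None)
        if second is None:
            raise ValueError(f"unpaired trailing line: {first!r}")
        yield first, second


if __name__ == "__main__":
    # Hypothetical file list; in practice this would come from the .txt file.
    listing = [
        "sampleA_R1.fastq.gz\n",
        "sampleA_R2.fastq.gz\n",
        "sampleB_R1.fastq.gz\n",
        "sampleB_R2.fastq.gz\n",
    ]
    for r1, r2 in pair_lines(listing):
        # Dry run: print the command instead of executing it.
        print(f"bwa mem ref.fa {r1} {r2}")
```

Raising on an odd number of lines is a design choice for this sketch: a missing mate file is more likely a listing error than something to skip silently.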



            Snakemake on cluster error: 'Wildcards' object has no attribute 'output'
            Asked 2022-Jan-11 at 14:29

            I'm running into the error 'Wildcards' object has no attribute 'output', similar to an earlier question with the same title, when I submit Snakemake to my cluster. Do you have any suggestions for how to make this compatible with the cluster?

            While my rule annotate_snps works when I test it locally, I get the following error on the cluster:



            Answered 2022-Jan-11 at 14:29

            The raw rule definition appears to be consistent except for the multiple calls to the contents of config, e.g. config[snpeff].

            One thing to check is whether the config definition is the same on the single machine and on the cluster. If it is not, some content might be confusing Snakemake, e.g. if somehow config[snpeff] == "wildcards.output" (or something similar).



            Snakemake Error: No values given for wildcard
            Asked 2022-Jan-04 at 04:45

            This is a follow-up of a previous question about using a Python dictionary to generate a list of files to include as input for a single step. In this case, I'm interested in merging BAM files for a single sample that have been generated by mapping FASTQ files from multiple runs.

            I am running into an error in my rule combine_bams only for a single sample:



            Answered 2022-Jan-04 at 03:10

            In rule combine_bams, when using a lambda expression you need to provide the values of all {} wildcards. Right now only the run information is provided. One way to fix this is to pass the kwarg allow_missing=True to expand:
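
To make the effect of partial expansion concrete, here is a pure-Python stand-in. It is not Snakemake's implementation and `partial_expand` is a hypothetical helper; it only mimics the idea that wildcards without supplied values are left in place for later resolution. The file pattern and run names are assumptions for illustration.

```python
import itertools
import re
from typing import List, Sequence


def partial_expand(pattern: str, **values: Sequence[str]) -> List[str]:
    """Fill only the wildcards named in `values`; keep other {name} fields."""
    names = list(values)
    results = []
    # One output per combination of the supplied wildcard values.
    for combo in itertools.product(*(values[n] for n in names)):
        mapping = dict(zip(names, combo))
        results.append(
            re.sub(
                r"\{(\w+)\}",
                # Unknown wildcards are returned unchanged, braces included.
                lambda m: mapping.get(m.group(1), m.group(0)),
                pattern,
            )
        )
    return results


if __name__ == "__main__":
    # {sample} is left untouched; only {run} is expanded.
    print(partial_expand("mapped/{sample}.{run}.bam", run=["run1", "run2"]))
```

In a real Snakefile the equivalent call would go through Snakemake's own expand function; this sketch only shows why the remaining `{sample}` field can still be matched afterwards.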



            Fill missing values by group using linear regression in R
            Asked 2021-Dec-02 at 13:40

            I have a dataset with about 50 columns (all indicators I got from World Bank), Country Code and Year. These 50 columns are not all complete, and I would like to fill in the missing values based on an lm fit for the column for that specific country. For example:

            Doing this for a single country and a single column is absolutely fine when following these steps here: Filling NA using linear regression in R

            However, I have over 180 different countries I want to do this to. And I want this to work for each indicator per country (so 50 columns total) So in a way, each country and each column would have its own linear regression model that fills out the missing values.

            Here is how it looked after I did the steps above: This is the expected output for ONE column. I would like to do this for EVERY column by individual country groups.

            However, the data looks like this:

            There are numerous countries and columns that I want to perform this on just like the post above.

            This is for a project I am working on for my data-mining / statistics class. Any help would be appreciated and thanks so much in advance!


            I tried this:



            Answered 2021-Dec-02 at 13:40

            Since you already know how to do this for one dataframe with a single country, you are very close to your solution. But to make this easy on yourself, you need to do a few things.

            1. Create a reproducible example using dput. The janitor library has the clean_names() function to fix columns names.

            2. Write your own interpolation function that takes a dataframe with one country as the input, and returns an interpolated dataframe for one country.

            3. Use pivot_longer to get all the data columns into one parameterized column.

            4. Use the dplyr function group_split to take your large multicountry dataframe, and break it into a list of dataframes, one for each country and parameter.

            5. Use the purrr function map to map each of the dataframes in the list to a new list of interpolated dataframes.

            6. Use dplyr's bind_rows to convert the list of interpolated dataframes back into one dataframe, and pivot_wider to get your original data shape back.
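
The split, fit and recombine pattern in the steps above can be sketched in plain Python. This is not the R answer itself: it handles a single indicator column, the row layout `(country, year, value)` and the function names are assumptions, and groups with fewer than two known values are left unfilled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, int, Optional[float]]  # (country, year, value or None)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # all x identical: fall back to the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_by_group(rows: List[Row]) -> List[Row]:
    """Fill missing values with a per-country linear fit on year."""
    # Split: one list of (year, value) points per country.
    groups: Dict[str, List[Tuple[int, Optional[float]]]] = defaultdict(list)
    for country, year, value in rows:
        groups[country].append((year, value))

    filled: List[Row] = []
    for country, points in groups.items():
        known = [(y, v) for y, v in points if v is not None]
        if len(known) < 2:
            # Not enough data to fit a line; keep the group unchanged.
            filled.extend((country, y, v) for y, v in points)
            continue
        slope, intercept = fit_line([y for y, _ in known], [v for _, v in known])
        # Apply: predict only where the value is missing, then recombine.
        for year, value in points:
            if value is None:
                value = slope * year + intercept
            filled.append((country, year, value))
    return filled


if __name__ == "__main__":
    data = [("A", 2000, 1.0), ("A", 2001, None), ("A", 2002, 3.0)]
    print(fill_by_group(data))
```

Extending this to many indicators would mean grouping on (country, indicator) instead of country alone, which corresponds to the pivot_longer step in the list.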



            snakemake: Ambiguous rule not detected?
            Asked 2021-Aug-26 at 05:45

            The following Snakefile fails with AmbiguousRuleException:



            Answered 2021-Aug-26 at 05:45

            Snakemake performs some checks for cycles, and jobs with the same input and output file(s) are removed from consideration during DAG creation. In your working case, the job from the merge_bam rule has the same input/output file (S1.bam), so it is not considered in the DAG and there is no ambiguity when satisfying the input of the all rule.


            Snakemake starts with the final target file (in this case S1.bam) and works backward to find parameterized rules (jobs) that can be executed to create the target file from existing input files. To do this, it recursively calls snakemake/ and snakemake/ to construct the DAG from the initial target file(s). DAG.update() has the following check to remove jobs from consideration if they produce the same output file that they require for input:



            How to fix this error: variable NOT found as character variable in synth package?
            Asked 2021-Aug-18 at 06:32

            I am using the Synth package in R.

            This is a part of my data frame:



            Answered 2021-Aug-18 at 06:32

            I cannot tell you what's going on behind the scenes, but I think that Synth wants a few things:

            First, turn factor variables into characters;



            Can I put multiple download links in same table cell in shiny?
            Asked 2021-Aug-09 at 19:14

            I'm using shiny and I'm having trouble inserting multiple links in the same table cell. Every link should allow the user to download local files found on the computer. Here is an image of what I mean: table with links

            For columns 2, 3, and 4, whose rows include at most only 1 link, it works perfectly; when I click on the hyperlinks I am able to download the corresponding file from my pc. However, for column 5, which includes multiple hyperlinks in each cell, I am unable to do so. Clicking on the links returns nothing; no file is downloaded (but I don't get an error).

            This is the code I'm using for column 5:



            Answered 2021-Aug-09 at 19:14

            I think this should do it by replacing the for loop with lapply in the renderUI part and generating the downloadLink as a tagList:



            GNU parallel --block and -L clarification
            Asked 2021-Aug-09 at 12:28

            Suppose I have a file of size N; for the sake of example, a 30GB file.

            The file has a proportional number of lines. It is an interleaved FastQ file (not important for the question, but useful for someone).

            File content is paired or interleaved DNA sequence of strings. Each pair is 8 lines long.

            I want to process the interleaved FastQ with GNU parallel in order to speed up the process. The reason for using parallel instead of bwa's native threads feature is that parallel helps to reduce the amount of RAM needed, because of the nature of bwa's memory allocation.

            Given that the interleaved file is 30GB, I want to process it in chunks with --block 500M. The command-line parameters look like --pipe --block 500M -L 8 -j 10; the data is then sent as stdin to bwa, running 10 bwa tasks that each get 500M chunks made of 8-line records.

            Is my assumption correct that --block 500M and -L 8 will be managed by parallel, and that I can be certain my bwa tool will always get 8 lines times N MB of data?

            What I am not clear on is whether parallel will "repeat" the last "chunk" if 8 lines are not present. And will it appropriately control the other chunk inputs for the N processes I start with parallel?

            Or does --block 500M "blindly" send a 500M chunk to a single process, regardless of whether the last part of the chunk contains a complete 8-line record?


            After a whole day of reading questions and answers on biostars and seqanswers, I've realised that my testing/"benchmarking" was wrong.

            But this helped me realise that I need to update the question, and I will post a separate question.

            I was testing inside a Docker container, which by default has a very small /dev/shm, so I misled myself into going down a totally different path.



            Answered 2021-Aug-06 at 06:34

            Yes, you can be certain.

            The --block parameter is described here:

            The -L parameter here:

            Quick summary: Parallel will always send full lines to each process until the block/buffer capacity is filled. If you specify that one record requires several lines (8 in your case), it will fill the buffer capacity in chunks of 8 lines each.

            The last block can be smaller than 8 lines, if there are fewer remaining.

            Side note: In the case of properly formatted and interleaved fastq files, there will always be 8 lines. fastq format specifies that each record is 4 lines and paired-end fastq files must contain the same number of records.
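
The record-aligned chunking described in the answer can be illustrated with a short Python sketch. It is not GNU parallel's implementation, and its exact block boundaries may differ from what parallel produces; `blocks_of_pairs` is a hypothetical name. It only shows the principle: a block is filled with whole 8-line records (one interleaved read pair) and never cut mid-record.

```python
from typing import Iterable, Iterator, List

LINES_PER_PAIR = 8  # 2 reads x 4 lines per FASTQ record


def blocks_of_pairs(lines: Iterable[str], block_bytes: int) -> Iterator[List[str]]:
    """Group lines into blocks of whole 8-line records, up to block_bytes each.

    A single record larger than block_bytes still forms its own block.
    """
    block: List[str] = []
    size = 0
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == LINES_PER_PAIR:
            rec_size = sum(len(part.encode()) for part in record)
            # Start a new block if this record would overflow the current one.
            if block and size + rec_size > block_bytes:
                yield block
                block, size = [], 0
            block.extend(record)
            size += rec_size
            record = []
    if record:
        # Trailing partial record: kept in the final block rather than dropped.
        block.extend(record)
    if block:
        yield block


if __name__ == "__main__":
    # Three fake read pairs of 16 bytes each, with a 32-byte block size.
    fake = ["x\n"] * (LINES_PER_PAIR * 3)
    print([len(b) for b in blocks_of_pairs(fake, block_bytes=32)])
```

Each yielded block could be handed to a separate worker, which is the role the `-j` option plays in the command line from the question.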



            How do I calculate the right mean?
            Asked 2021-May-15 at 15:19

            I have a dataset that shows bilateral exports for several countries. Because the data fluctuates, I need to calculate the mean of year groups. Not all countries cover exactly the same years. Some start later, some have gaps in between; that is, some years are missing (but without NA entries). I already managed to cut the data into pieces with the help of an amazing community member: year_group.

            Below I am listing two further problems along with my code, the wrong output and on the bottom some sample input data for the dataset total_trade

            Problem 1

            My code does not calculate the right means. When I calculate the results manually, I get different results than my code produces (see below).

            This is my code



            Answered 2021-May-15 at 15:19

            The issue with mean is duplicated rows for any ReporterName in the data.



            Community Discussions, Code Snippets contain sources that include Stack Exchange Network



            Install bwa

            You can download it from GitHub.


            BWA works with a variety of DNA sequence data types, though the optimal algorithm and settings may vary. BWA-MEM is recommended for query sequences longer than ~70bp across a range of error rates (or sequence divergence). In general, BWA-MEM tolerates more errors in longer query sequences, because the chance of missing all seeds is small. With non-default settings, BWA-MEM works with Oxford Nanopore reads with a sequencing error rate over 20%.

            BWA-SW and BWA-MEM perform local alignments. If there is a translocation, a gene fusion or a long deletion, a read bridging the break point may have two hits, occupying two lines in the SAM output. With the default setting of BWA-MEM, one and only one line is primary and is soft clipped; other lines are tagged with the 0x800 SAM flag (supplementary alignment) and are hard clipped.

            Since 0.6.x, all BWA algorithms work with a genome whose total length exceeds 4GB. However, an individual chromosome should not be longer than 2GB.

            Mapping quality is assigned to an individual read, not to a read pair. It is possible that one read can be mapped unambiguously while its mate falls in a tandem repeat, so the mate's accurate position cannot be determined.

            Internally, BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA-backtrack will flag the read as unmapped (0x4), but you will still see the position, CIGAR and all the tags. A similar issue may occur with BWA-SW alignments as well. BWA-MEM does not have this problem.

            Since 0.7.11, BWA-MEM officially supports mapping to GRCh38+ALT. BWA-backtrack and BWA-SW don't properly support ALT mapping as of now. See the ALT-mapping documentation in the BWA repository for details. Briefly, it is recommended to use bwakit, the binary release of BWA, for generating the reference genome and for mapping.

            If you are not interested in hits to ALT contigs, it is okay to run BWA-MEM without post-processing. The alignments produced this way are very close to alignments against GRCh38 without ALT contigs. Nonetheless, post-processing helps to reduce false mappings caused by reads from the diverged part of ALT contigs and also enables HLA typing, so running the post-processing script is recommended.
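
The SAM flag values mentioned above (0x4 for unmapped, 0x800 for supplementary) are bits in the FLAG column of each SAM record. The following Python sketch decodes them; the bit values follow the SAM specification, while the short label strings and function names are choices made for this illustration.

```python
from typing import List

# Bit values from the SAM specification; the labels are informal names.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary(flag: int) -> bool:
    """True if the record is neither secondary nor supplementary."""
    return not flag & (0x100 | 0x800)


if __name__ == "__main__":
    for flag in (0, 4, 2048, 2064):
        print(flag, decode_flag(flag), "primary" if is_primary(flag) else "non-primary")
```

For a read bridging a break point, one would expect one record for which `is_primary` is true and further records carrying the supplementary bit, matching the behaviour described for the default BWA-MEM setting.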