htslib | C library for high-throughput sequencing data formats | Genomics library

by samtools | C | Version: 1.17 | License: Non-SPDX

kandi X-RAY | htslib Summary


htslib is a C library typically used in Artificial Intelligence and Genomics applications. According to kandi's scan, htslib has no bugs and low support; however, it has 1 reported vulnerability and a Non-SPDX License. You can download it from GitHub.

See INSTALL for complete details. Release tarballs contain generated files that are not committed to this repository, so building the code from a Git checkout requires extra steps.

            Support

              htslib has a low-activity ecosystem.
              It has 696 stars and 433 forks, with 70 watchers.
              There was 1 major release in the last 12 months.
              There are 138 open issues and 508 closed ones; on average, issues are closed in 21 days. There are 28 open pull requests and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of htslib is 1.17.

            Quality

              htslib has 0 bugs and 0 code smells.

            Security

              htslib has 1 reported vulnerability issue (1 critical, 0 high, 0 medium, 0 low).
              htslib code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              htslib has a Non-SPDX License.
              A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may not be an open-source license at all, so review it closely before use.

            Reuse

              htslib releases are available to install and integrate.
              kandi lists no installation instructions (the repository's INSTALL file covers building). Examples and code snippets are available.


            Community Discussions


            My Python script for merging overlapping coordinates uses too much memory
            Asked 2022-Mar-15 at 01:12

            I'm trying to choose unique regions for my Non-invasive Prenatal Testing (NIPT) project. I have done the following steps:

            • Create an artificial FASTA file containing 50 bp sequences; on each chromosome, each sequence overlaps the previous one by 40 bp
            • Align them and keep only the sequences with no mismatches

            I have a .sam file of about 40 GB. In the next step, I try to merge all overlapping coordinates into one .bed file for use with samtools. This is my Python script to do that:



            Answered 2022-Mar-14 at 04:13

            Hopefully this does the trick. It's hard to know, since I don't have the input file, don't know what the output should look like, and can't run the code to check for errors, but here's what I've tried to do:

            1. Eliminate raw and go straight to creating treat1; and
            2. Eliminate treat2 and go straight to writing to bedfile. I opened bedfile right off the bat and everything in your program is done with that file open.

            If this does what you need but still crashes, perhaps you can read through the file twice: once to get last_coord at the end of treat1, and then again to "recreate" each line of treat1 individually and use it to decide what needs to be written to the file.

            I don't really know the details of what you're doing (I don't work anywhere close to a field that would use samtools).



            Does Apache Arrow support separately-compressed chunks?
            Asked 2022-Feb-15 at 15:09

            In bioinformatics we have the bgzip file, which is block-compressed, meaning that you can compress a file (let's say a CSV), and then if you want to access some data in the middle of that file, you can decompress only the middle chunk, rather than the entire file.

            As is explained here, Arrow (and therefore Feather v2, its file format) seems to support chunked reads and writes, as well as compression. However, it isn't clear whether the compression applies to the entire file or whether individual chunks can be decompressed. My question is: can we separately compress chunks of an Arrow/Feather v2 file and later decompress a single chunk without decompressing everything?



            Answered 2022-Feb-15 at 15:09

            Compression is applied to the individual buffers in each RecordBatch, so yes, you still get random access to each record batch in the file. This does not seem to be documented in the user docs, but it is part of the format, where compression is specified per RecordBatch.



            Using awk to replace and add text
            Asked 2021-Sep-29 at 20:35

            I have the following .txt file:



            Answered 2021-Sep-29 at 13:30

            I would use GNU AWK the following way. Let file.txt content be



            How to read lines in R starting with certain string
            Asked 2021-Jun-30 at 15:59

            Suppose I have a text file like the one below. How can I read only the lines starting with SN in R? The whole file has over 10k rows, but I only want the lines starting with SN.



            Answered 2021-Jun-30 at 15:38

            Use readLines() and then grep() for the lines starting with SN. Note that this reads the whole file into memory first.



            Iterating over and dereferencing an unaligned pointer causes a segmentation fault. Bug in the GCC optimizer?
            Asked 2021-May-27 at 11:24
            // Compiled with GCC 7.3.0, x86-64, -g -O3 -std=gnu++11
            // Tested on (1) Ubuntu 18.04 LTS, (2) Gentoo
            int Listen(... socket)
            {
                char buffer[INT16_MAX];
                . . .
                . . . = recvfrom(socket, buffer, ....)
            }

            void ParseMsg(uint8_t *const msg)
            {
                . . .
                uint16_t* word_arr = (uint16_t*)(msg + 15); // if I change 15 to 16 (aligned for uint16_t),
                                                            // the program doesn't crash
                for (size_t i = 0; i < 30; ++i)
                    word_arr[i] = 1; // after some iterations (around 13) the program crashes with a segmentation fault;
                                     // if I add a print statement inside the loop, the program doesn't crash
                // word_arr[20] = 1; // if I assign a single value outside a loop, the program doesn't crash
            }


            Answered 2021-May-27 at 11:24

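            The problem lies in the alignment of the uint16_t pointer, which should be 2 bytes. Accessing an object through a misaligned uint16_t pointer is undefined behavior, so the compiler may assume the pointer is aligned. Combined with -O3, the compiler tries to optimize your loop using vectorization (SIMD), and so it tries to load an unaligned argument into SSE registers.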

            This explains why it works when:

            1. you change the offset from 15 to 16, which makes the uint16_t pointer aligned;
            2. you compile with -O2, which disables the vectorization;
            3. you remove the loop or add a print inside it, which also prevents the vectorization.

            Please refer to the following link, which has a question similar to yours and some great, detailed answers:




            Unable to install htslib v1.12 with conda
            Asked 2021-Apr-25 at 10:05

            I could not install htslib v1.12 with conda using either of these commands:



            Answered 2021-Apr-25 at 10:05

            The directives the OP references come from Anaconda Cloud; they are generic and miss the nuances that using specialized channels often entails. Specifically, Bioconda expects the following channel prioritization:



            How to extract genotype information for each sample as a string from a VCF file using htslib?
            Asked 2021-Feb-09 at 04:10

            I am using htslib in C++ to extract all the information contained in a VCF file.

            Currently, thanks to the VCF specification and the documentation in the file vcf.h, I have successfully extracted all the metadata information in the header (Meta-Information Lines), and most of the information contained in each row of the body of the file (Data Lines).

            However, I don't know how to extract the genotype information (sample columns).

            I am using example files from the 1000G project. Below are two rows from one file, showing the FORMAT field and two samples (the file has more than 1000 samples per row, and I would like to extract the data for all of them):



            Answered 2021-Feb-09 at 04:10

            I finally figured it out. There are several functions for this in htslib's vcf.h file; which one to use depends on the type declared in the header for the FORMAT ID:



            Docker container's sshfs mount freezes, but only when mounted by Python
            Asked 2020-Dec-17 at 06:35

            I have a development laptop (Mint 19.3), and a test server (Ubuntu 18.04.4 LTS).

            The laptop runs Docker version 19.03.5, build 633a0ea838; the server runs Docker version 19.03.12, build 48a66213fe.

            I'm running Python 3.6 code inside the container, which uses subprocess (code below) to create an sshfs mount to a third server, after which the Python code walks through the mounted directory.

            Everything works fine on my development laptop. But on the server, although the directory mounts (and shows up in the output of the mount command), cd'ing into it just hangs, and so does the Python code's subsequent walk. (NOTE: The Python code never crashes or errors out. It just hangs forever.)

            HOWEVER, if I manually use the same sshfs command at the container's command line, the directory works fine.

            I'm at a loss as to how to troubleshoot this.

            ===2020-09-25 UPDATE===

            OK. Since the Python code uses subprocess, the sshfs mount is obviously available to any terminal window that wants to use it.

            I have tried accessing the mount from a new terminal window inside the container, but when I cd to the mount - the window just freezes.

            Well, I left everything sitting overnight - and now when I try to cd into the mount ... it works. It's like the mount has to sit for hours before it will work.

            Any ideas?

            Python code



            Answered 2020-Dec-13 at 10:51

            I am assuming you want to mount a server's directory into the container's filesystem using SSHFS. You could add that instruction to the Dockerfile:



            How to use Snakemake container for htslib (bgzip + tabix)
            Asked 2020-Sep-24 at 21:01

            I have a pipeline which uses a global singularity image and rule-based conda wrappers.

            However, some of the tools don't have wrappers (e.g. htslib's bgzip and tabix).

            Now I need to learn how to run jobs in containers.

            In the official documentation link it says:

            "Allowed image urls entail everything supported by singularity (e.g., shub:// and docker://)."

            Now I've tried the following image from Singularity Hub, but I get an error:

            minimal reproducible example: config.yaml ...


            Answered 2020-Sep-24 at 17:38

            Using another container solves the issue. However, the fact that I'm getting errors from BioContainers is troubling, given that these images are very common and are used as examples in the literature, so I will award the top answer to whoever can solve that specific issue.

            As it is, using stackleader/bgzip-utility solves the issue of actually running this rule in a container.



            Target specific variable definition not working when concatenating variables
            Asked 2020-Sep-15 at 02:18

            I want my Makefile to behave differently depending on whether I'm compiling on my own machine or inside a docker container. This is the Makefile:



            Answered 2020-Sep-15 at 02:18

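            Make has two flavors of variable: simply expanded variables and recursively expanded variables. A simply expanded variable (defined with :=) is expanded once, at the point of definition; a recursively expanded variable (defined with =) is expanded every time it is used.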



            Community Discussions and Code Snippets include content from the Stack Exchange Network.


            No vulnerabilities reported

            Install htslib

            You can download it from GitHub.


            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            Clone with the GitHub CLI: gh repo clone samtools/htslib
