htslib | C library for high-throughput sequencing data formats | Genomics library

by samtools | Language: C | Version: 1.17 | License: Non-SPDX

kandi X-RAY | htslib Summary

htslib is a C library typically used in Artificial Intelligence and Genomics applications. htslib has no bugs and low support. However, htslib has 1 vulnerability and a Non-SPDX license. You can download it from GitHub.

See INSTALL for complete details. Release tarballs contain generated files that have not been committed to this repository, so building the code from a Git repository requires extra steps.

Support

htslib has a low active ecosystem.

It has 696 stars and 433 forks. There are 70 watchers for this library.

It had no major release in the last 12 months.

There are 138 open issues and 508 have been closed. On average issues are closed in 21 days. There are 28 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of htslib is 1.17

Quality

htslib has 0 bugs and 0 code smells.

Security

htslib has 1 vulnerability issue reported (1 critical, 0 high, 0 medium, 0 low).

htslib code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

htslib has a Non-SPDX License.

A Non-SPDX license can be an open source license that is not SPDX-compliant, or a non-open-source license, so you need to review it closely before use.

Reuse

htslib releases are available to install and integrate.

Installation instructions are not available. Examples and code snippets are available.

htslib Key Features

No Key Features are available at this moment for htslib.

htslib Examples and Code Snippets

No Code Snippets are available at this moment for htslib.

Community Discussions

Trending Discussions on htslib

My merge overlapping coordinate python script cost too much memory

Does Apache Arrow support separately-compressed chunks?

Using awk to replace and add text

How to read lines in R starting with certain string

Iterating and dereferencing unaligned pointer to memory causing segmentation fault ? Bug in GCC optimizer?

Unable to install htslib v1.12 with conda

How to extract genotype information for each sample as a string from a VCF file using htslib?

Docker container's sshfs mount freezes, but only when mounted by Python

How to use Snakemake container for htslib (bgzip + tabix)

Target specific variable definition not working when concatenating variables

QUESTION

My merge overlapping coordinate python script cost too much memory

Asked 2022-Mar-15 at 01:12

I'm trying to choose unique regions for my Non-invasive Prenatal Testing (NIPT) project. I have done the following steps:

Create an artificial FASTA file containing 50 bp sequences. On each chromosome, each sequence overlaps the previous one by 40 bp.
Align the sequences and keep only those with no mismatches.

I have a .sam file of about 40 GB. As the next step, I'm trying to merge all overlapping coordinates into one .bed file for use with samtools. This is my Python script to do that:

...

ANSWER

Answered 2022-Mar-14 at 04:13

Hopefully this does the trick. Hard to know since I'm working without having the input file or knowing what the output should look like and can't really run the code to check for errors, but here's what I've tried to do:

Eliminate raw and go straight to creating treat1; and
Eliminate treat2 and go straight to writing to bedfile. I opened bedfile right off the bat and everything in your program is done with that file open.

If this does what you need but still crashes, then perhaps you can read through each file twice to get last_coord at the end of the treat1 and then read through it again to "recreate" each line of treat1 individually and apply it to defining what needs to be written into the file.

I wrote this without really knowing the details of what you're doing (I do not work anywhere close to a field that would use samtools).

Source https://stackoverflow.com/questions/71462634

QUESTION

Does Apache Arrow support separately-compressed chunks?

Asked 2022-Feb-15 at 15:09

In bioinformatics we have the bgzip file, which is block-compressed, meaning that you can compress a file (let's say a CSV), and then if you want to access some data in the middle of that file, you can decompress only the middle chunk, rather than the entire file.

As is explained here, Arrow (and therefore Feather v2, the file format) seems to support chunked reads and writes, and also compression. However, it isn't clear whether the compression applies to the entire file or whether individual chunks can be decompressed. This is my question: can we separately compress chunks of an Arrow/Feather v2 file and then later decompress a single chunk without decompressing everything?

...

ANSWER

Answered 2022-Feb-15 at 15:09

The compression is applied to individual buffers in each RecordBatch, i.e. yes, you still get random access to each of the record batches in the file. I see this is not documented in the user docs but it is present in the format where compression is specified for each RecordBatch.

Source https://stackoverflow.com/questions/71128550

QUESTION

Using awk to replace and add text

Asked 2021-Sep-29 at 20:35

I have the following .txt file:

...

ANSWER

Answered 2021-Sep-29 at 13:30

I would use GNU AWK in the following way. Let file.txt content be

Source https://stackoverflow.com/questions/69376355

QUESTION

How to read lines in R starting with certain string

Asked 2021-Jun-30 at 15:59

Suppose I have a text file such as below. My question is how to read the lines starting with SN in R. The whole file has over 10k rows but I just wanted the lines starting with SN.

...

ANSWER

Answered 2021-Jun-30 at 15:38

Use readLines() and then grep for the lines starting with SN. This reads in the whole file first, though.

Source https://stackoverflow.com/questions/68197596

QUESTION

Iterating and dereferencing unaligned pointer to memory causing segmentation fault ? Bug in GCC optimizer?

Asked 2021-May-27 at 11:24

// Compiled with GCC 7.3.0, x86-64 -g -O3 -std=gnu++11
// Tested on OS (1) Ubuntu 18.04 LTS, (2) Gentoo

int Listen(... socket)
{
    char buffer[INT16_MAX];
    . . .
    . . . = recvfrom(socket, buffer, ....)
    ParseMsg(buffer)
}

void ParseMsg(uint8_t *const msg)
{
    . . .
    uint16_t* word_arr = (uint16_t*)(msg+15); // if I changed 15 to 16 (aligned to uint16_t) 
                                              // the program doesn't crash

    for(size_t i = 0 ; i < 30 ; ++i)
    {
        word_arr[i] = 1; // after some iterations (around 13) the program crashes with segmentation fault
                         // if i add a print statement inside the loop the program doesn't crash
    }

    // word_arr[20] = 1; // if I assign value not in loop the program doesn't crash 
}

...

ANSWER

Answered 2021-May-27 at 11:24

The problem does lie in the alignment of the uint16_t pointer (which should be 2 bytes). In combination with -O3, the compiler tries to optimize your loop using vectorization (SIMD), and therefore tries to load an unaligned argument into SSE registers.

This can explain why it works when:

you change the offset from 15 to 16 - this makes the uint16_t pointer aligned
you compile with -O2 - this disables the vectorization
you remove the loop or add a print within the loop - this also disables the vectorization

Please refer to the following link with a question similar to yours and some great elaborate answers:

c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment
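One common way to avoid the misaligned access altogether is to copy the bytes with `std::memcpy` instead of casting the pointer. The sketch below is not taken from the question or the linked answers; the helper names `fill_words` and `read_word` are hypothetical and only show the pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes `count` 16-bit values starting at an arbitrary (possibly odd) byte
// offset. std::memcpy has no alignment requirement on its arguments, so no
// uint16_t* to a misaligned address is ever formed or dereferenced.
void fill_words(std::uint8_t* msg, std::size_t offset, std::size_t count,
                std::uint16_t value) {
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(msg + offset + i * sizeof value, &value, sizeof value);
    }
}

// Reads one 16-bit value back from an arbitrary byte offset the same way.
std::uint16_t read_word(const std::uint8_t* msg, std::size_t offset) {
    std::uint16_t value;
    std::memcpy(&value, msg + offset, sizeof value);
    return value;
}
```

With this approach the loop from the question would become something like `fill_words(msg, 15, 30, 1)`. Compilers typically turn such small fixed-size copies into plain load and store instructions that are valid for unaligned addresses, so the cost is usually negligible.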

Source https://stackoverflow.com/questions/67684698

QUESTION

Unable to install htslib v1.12 with conda

Asked 2021-Apr-25 at 10:05

I could not install htslib v1.12 with conda using either of these commands:

...

ANSWER

Answered 2021-Apr-25 at 10:05

The directives the OP references are from Anaconda Cloud; they are generic and miss the nuances that using specialized channels often entails. Specifically, Bioconda expects the following channel prioritization:

Source https://stackoverflow.com/questions/67239320

QUESTION

How to extract genotype information for each sample as a string from a VCF file using htslib?

Asked 2021-Feb-09 at 04:10

I am using htslib for extracting all the information contained in a VCF file in C++.

Currently, thanks to the VCF specification and the documentation in the file vcf.h, I have successfully extracted all the metadata information in the header (Meta-Information Lines), and most of the information contained in each row of the body of the file (Data Lines).

However, I don't know how to extract the genotype information (sample columns).

I am using example files from the 1000G project. This is an example of two rows of the file; it shows the Format field and two samples (the file has more than 1000 samples in each row, and I would like to extract the data for all of them):

...

ANSWER

Answered 2021-Feb-09 at 04:10

I finally figured it out. There are some functions for doing this, depending on the type specified in the header for the format ID: the functions are inside of the vcf.h file in htslib:
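For the GT field specifically, vcf.h documents `bcf_get_genotypes`, which fills an array of encoded 32-bit integers (one group of `max_ploidy` values per sample) together with macros such as `bcf_gt_allele` and `bcf_gt_is_phased` for decoding them. The sketch below is not the code from this answer. It re-implements that decoding locally so it compiles without htslib, and it assumes the encoding described in vcf.h; `format_genotype` and `kVectorEnd` are names invented for this example.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Local stand-in for htslib's bcf_int32_vector_end (assumed to be INT32_MIN + 1).
// In real code, prefer the constant and the bcf_gt_* macros from htslib/vcf.h.
constexpr std::int32_t kVectorEnd =
    std::numeric_limits<std::int32_t>::min() + 1;

// Turns the encoded genotype values of ONE sample into text such as "0|1".
// `gt` points at that sample's slice of the genotype array and `max_ploidy`
// is the number of values stored per sample.
std::string format_genotype(const std::int32_t* gt, int max_ploidy) {
    std::string text;
    for (int j = 0; j < max_ploidy; ++j) {
        if (gt[j] == kVectorEnd) break;  // this sample has lower ploidy
        if (j > 0) {
            text += (gt[j] & 1) ? '|' : '/';  // low bit: phased (cf. bcf_gt_is_phased)
        }
        const std::int32_t allele = (gt[j] >> 1) - 1;  // cf. bcf_gt_allele
        text += (allele < 0) ? std::string(".") : std::to_string(allele);
    }
    return text.empty() ? "." : text;
}
```

Assuming `ngt` is the value returned by `bcf_get_genotypes` and `nsmpl` the number of samples, `max_ploidy` would be `ngt / nsmpl`, and sample `i` would be formatted with `format_genotype(gt_arr + i * max_ploidy, max_ploidy)`.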

Source https://stackoverflow.com/questions/66083109

QUESTION

Docker container's sshfs mount freezes, but only when mounted by Python

Asked 2020-Dec-17 at 06:35

I have a development laptop (Mint 19.3), and a test server (Ubuntu 18.04.4 LTS).

The laptop runs Docker version 19.03.5, build 633a0ea838, and the server runs Docker version 19.03.12, build 48a66213fe.

I'm running Python 3.6 code inside the container, which uses subprocess (code below) to create an sshfs mount to a third server, after which the python code walks through the mounted directory.

Everything works fine on my development laptop. But on the server, the directory mounts (and is seen with the mount command) however cd'ing into the directory just hangs, and the Python code's subsequent walk just hangs. (NOTE: The python code never crashes or errors out. It just hangs forever.)

HOWEVER, if I manually use the same sshfs command at the container's command line, the directory works fine.

I'm at a loss as to how to troubleshoot this.

===2020-09-25 UPDATE===

OK. Since the Python code uses subprocess, the sshfs mount is obviously available to any terminal window that wants to use it.

I have tried accessing the mount from a new terminal window inside the container, but when I cd to the mount - the window just freezes.

Well, I left everything sitting overnight - and now when I try to cd into the mount ... it works. It's like the mount has to sit for hours before it will work.

Any ideas?

Python code

...

ANSWER

Answered 2020-Dec-13 at 10:51

I am assuming you want to mount some server's directory to container's filesystem using SSHFS. You could add that instruction to the Dockerfile:

Source https://stackoverflow.com/questions/64049240

QUESTION

How to use Snakemake container for htslib (bgzip + tabix)

Asked 2020-Sep-24 at 21:01

I have a pipeline which uses a global singularity image and rule-based conda wrappers.

However, some of the tools don't have wrappers (i.e. htslib's bgzip and tabix).

Now I need to learn how to run jobs in containers.

In the official documentation link it says:

"Allowed image urls entail everything supported by singularity (e.g., shub:// and docker://)."

Now I've tried the following image from singularity hub but I get an error:

minimal reproducible example: config.yaml ...

ANSWER

Answered 2020-Sep-24 at 17:38

Using another container solves the issue; however, the fact I'm getting errors from biocontainers is troubling given that these are both very common and used as examples in the literature so I will award the top-answer to whomever can solve that specific issue.

As it were, the use of stackleader/bgzip-utility solves the issue of actually running this rule in a container.

Source https://stackoverflow.com/questions/64050974

QUESTION

Target specific variable definition not working when concatenating variables

Asked 2020-Sep-15 at 02:18

I want my Makefile to behave differently depending on whether I'm compiling on my own machine or inside a docker container. This is the Makefile:

...

ANSWER

Answered 2020-Sep-15 at 02:18

Make has two flavors of variable: simply expanded variables, and recursively expanded variables.

Here:

Source https://stackoverflow.com/questions/63893724

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install htslib

You can download it from GitHub.

Support

For any new features, suggestions and bugs, create an issue on GitHub. If you have any questions, check and ask them on the Stack Overflow community page.
