gatk | Official code repository for GATK versions | Genomics library

by broadinstitute Java Version: 4.4.0.0 License: Non-SPDX

kandi X-RAY | gatk Summary

gatk is a Java library typically used in artificial intelligence and genomics applications. It has no reported bugs or vulnerabilities, a build file is available, and it has medium support. However, gatk has a Non-SPDX license. You can download it from GitHub or Maven.

This repository contains the next generation of the Genome Analysis Toolkit (GATK). The contents of this repository are 100% open source and released under the Apache 2.0 license (see LICENSE.TXT). GATK4 aims to bring together well-established tools from the GATK and Picard codebases under a streamlined framework, and to enable selected tools to be run in a massively parallel way on local clusters or in the cloud using Apache Spark. It also contains many newly developed tools not present in earlier releases of the toolkit.

Support

gatk has a moderately active ecosystem.

It has 1450 stars and 544 forks. There are 168 watchers for this library.

It had no major release in the last 12 months.

There are 1124 open issues and 3285 have been closed. On average issues are closed in 275 days. There are 127 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of gatk is 4.4.0.0

Quality

gatk has 0 bugs and 0 code smells.

Security

gatk has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

gatk code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

gatk has a Non-SPDX License.

A Non-SPDX license may be an open source license that is not SPDX-compliant, or it may not be an open source license at all, so review it closely before use.

Reuse

gatk releases are available to install and integrate.

A deployable package is available in Maven.

A build file is available. You can build the component from source.

Installation instructions are not available. Examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed gatk and identified the functions below as its top functions. The list is intended to give a quick overview of the functionality gatk implements and to help you decide whether it suits your requirements.

Call a region.
Creates a Sequence comparator for the given variant.
Preprocess the input panel.
Find hmm in local format.
Run the DEST function.
Call Hatch results.
Simple merge method.
Find the best haplotypes.
Calculate Genotypes.
Validates the plugin arguments.

gatk Key Features

No Key Features are available at this moment for gatk.

gatk Examples and Code Snippets

No Code Snippets are available at this moment for gatk.

Community Discussions

Trending Discussions on gatk

R: How do I add columns from one dataframe to another?

Nextflow: Not all items in channel used by process

Bash: Identifying file based on part of filename

GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

Snakemake input rule definition via lambda + Pandas dataframe

Snakemake integrate the multiple command lines in a rule

Snakemake first genotype of a vcf file as wildcard in output

gatk VariantRecalibrator positional argument error

snakemake multiple parameters for multiple input and single output in snakemake. CombineGVCFs gatk problem

Add flag -I before multiple input in snakemake

QUESTION

R: How do I add columns from one dataframe to another?

Asked 2022-Mar-20 at 15:32

The 1st column of the subjects_153_ped dataframe corresponds to the Database_ID column of the ann dataframe. The 5th column of the subjects_153_ped dataframe corresponds to the sex column of the ann dataframe. The 6th column of the subjects_153_ped dataframe corresponds to the Profile column of the ann dataframe.

Here, pheno is the subset where:

(1) 1st column (FID): Database_ID column (from ann clinical table)
(2) 2nd column (IID): 1 (hardcoded)
(3) 3rd column (PAT): 1 (hardcoded)
(4) 4th column (MAT): 1 (hardcoded)
(5) 5th column (SEX): sex column (from ann clinical table)
(6) 6th column (PHENOTYPE): Profile column (from ann clinical table)
(7) Column 7 onwards are info from the original subjects_153_ped dataframe

Desired file formatting: PED file format https://plink.readthedocs.io/en/latest/plink_fmt/ https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format

Expected output

FID   IID  PAT  MAT  SEX  PHENOTYPE  ALL OTHER COLUMNS
AC10  1    1    1    M    Schiz.     ALL OTHER COLUMNS
AC11  1    1    1    M    Schiz.     ALL OTHER COLUMNS
AC12  1    1    1    M    Schiz.     ALL OTHER COLUMNS
AC13  1    1    1    F    BP         ALL OTHER COLUMNS
...

ANSWER

Answered 2022-Mar-20 at 15:32

Perhaps this helps - select and create new columns in ann with transmute, and left_join with the 'subjects_153_ped' using the 'FID' and 'V1' as by

Source https://stackoverflow.com/questions/71547962
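The answer's R code is not reproduced on this page. As a rough illustration of the join it describes, here is a minimal sketch in plain Python (a swapped-in language, not the answer's R/dplyr code). The sample rows and the helper name build_ped_rows are hypothetical; only the column names and the hardcoded 1 values come from the question.

```python
# Hypothetical stand-ins for the two data frames in the question.
# ann: clinical table keyed by Database_ID.
ann = [
    {"Database_ID": "AC10", "sex": "M", "Profile": "Schiz."},
    {"Database_ID": "AC13", "sex": "F", "Profile": "BP"},
]
# subjects_153_ped: first column is the ID, columns 7 onwards are the other data.
subjects_153_ped = [
    ["AC10", "0", "0", "0", "0", "0", "A", "G"],
    ["AC13", "0", "0", "0", "0", "0", "C", "T"],
]


def build_ped_rows(ann_rows, ped_rows):
    """Left-join ann onto the PED rows by ID and rebuild the six leading columns."""
    by_id = {row[0]: row for row in ped_rows}
    out = []
    for a in ann_rows:
        ped = by_id.get(a["Database_ID"])
        # Left join: keep the ann row even when no PED row matches.
        rest = ped[6:] if ped is not None else []
        # IID, PAT and MAT are hardcoded to 1, as in the question.
        out.append([a["Database_ID"], "1", "1", "1", a["sex"], a["Profile"], *rest])
    return out


for row in build_ped_rows(ann, subjects_153_ped):
    print(" ".join(row))
```

The dictionary lookup plays the role of the join key, so each ann row picks up the remaining columns of the PED row with the same ID.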

QUESTION

Nextflow: Not all items in channel used by process

Asked 2022-Mar-05 at 15:20

I've been struggling to identify why a nextflow (v20.10.00) process is not using all the items in a channel. I want the process to run for each sample bam file (10 in total) and for each chromosome (3 in total).

Here is the creation of the channels and the process:

...

ANSWER

Answered 2022-Mar-05 at 15:20

Issues like this almost always involve the use of multiple input channels:

When two or more channels are declared as process inputs, the process stops until there’s a complete input configuration, i.e. it receives an input value from all the channels declared as input.

Your initial assessment was correct. However, the reason only three processes were run (i.e. one sample for each of the three chromosomes), is because this line (probably) returned a list (i.e. a java LinkedList) containing a single element, and lists behave like queue channels:

Source https://stackoverflow.com/questions/71352719
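The pairing behaviour described in this answer can be illustrated outside Nextflow. The sketch below is a Python analogy only, not Nextflow code: zip stands in for two inputs that are consumed in lockstep, and itertools.product stands in for running every sample against every chromosome. The file and chromosome names are hypothetical.

```python
from itertools import product

samples = [f"sample{i}.bam" for i in range(1, 11)]  # 10 hypothetical BAM files
chromosomes = ["chr1", "chr2", "chr3"]              # 3 hypothetical chromosomes

# Two separate inputs are consumed in lockstep, so pairing stops as soon as
# the shorter one is exhausted.
paired = list(zip(samples, chromosomes))

# Running every sample against every chromosome needs the Cartesian product.
combined = list(product(samples, chromosomes))

print(len(paired), len(combined))
```

In Nextflow itself, the Cartesian product of two channels is usually built with the combine operator before the process is called; whether that matches the fix in the original answer is an assumption, since its code is not shown here.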

QUESTION

Bash: Identifying file based on part of filename

Asked 2022-Mar-03 at 19:22

I have a folder containing paired files with names that look like this:

...

ANSWER

Answered 2022-Mar-03 at 17:38

I'm trying to fully understand what you want to do here.

If you want to extract just the first two parts, this should do:

Source https://stackoverflow.com/questions/71340337
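The file names and the answer's shell snippet are elided on this page, so the following is only a sketch of the general idea in Python rather than Bash. It assumes the parts of a name are separated by underscores; the example file name and the helper name first_two_parts are hypothetical.

```python
from pathlib import Path


def first_two_parts(filename, sep="_"):
    """Return the first two separator-delimited parts of a file name, rejoined."""
    name = Path(filename).name  # drop any leading directories
    return sep.join(name.split(sep)[:2])


print(first_two_parts("sampleA_L001_R1.fastq.gz"))
```

Paired files can then be matched by grouping on this shared prefix.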

QUESTION

GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

Asked 2022-Feb-15 at 17:02

I can't seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I'm writing. For whatever reason, I cannot get GATK to see there is more than one thread. I've tried different node types, increasing and decreasing the number of cpus available, providing java arguments such as -XX:ActiveProcessorCount=16, using taskset, but it always just detects 1.

Here is the command from the .command.sh:

...

ANSWER

Answered 2022-Feb-15 at 17:02

In case anyone else has the same problem, it turned out I had to configure the submission as an MPI job.

So on the HPC I use, here is the nextflow process:

Source https://stackoverflow.com/questions/71053941

QUESTION

Snakemake input rule definition via lambda + Pandas dataframe

Asked 2022-Feb-15 at 11:15

Sorry if this is probably a duplicate of other questions, but I couldn't figure out how to debug what's going on in my case. I have a dataframe like this:

...

ANSWER

Answered 2022-Feb-15 at 11:15

The parameters are stored in a dataframe, and there is a handy utility for working with tabulated parameters, Paramspace. Below is a rough take on your specific case, but it will need some adjustments for command syntax and paths.

First step is to reshape the data for easier workflow:

Source https://stackoverflow.com/questions/71116406

QUESTION

Snakemake integrate the multiple command lines in a rule

Asked 2022-Feb-07 at 04:30

The output of my first command line, "bcftools query -l {input.invcf} | head -n 1", prints the name of the first individual in the VCF file (i.e. IND1). I want to use that output in GATK SelectVariants with the -sn IND1 option. How can I integrate the first command into Snakemake so that its output can be used in the next one?

...

ANSWER

Answered 2022-Feb-04 at 12:52

I think I found a solution:

Source https://stackoverflow.com/questions/70985443
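The poster's solution is not shown on this page. One way to get the first sample name without a shell pipeline is to read it from the VCF header, since sample names follow the nine fixed columns of the #CHROM line. The sketch below is an illustration under that assumption, not the poster's code; the helper name first_sample_name and the in-memory example are hypothetical.

```python
import io


def first_sample_name(vcf_lines):
    """Return the first sample name from the #CHROM header line, or None."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-9 are fixed (CHROM through FORMAT); samples start at column 10.
            return fields[9] if len(fields) > 9 else None
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    return None


example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tIND1\tIND2\n"
)
print(first_sample_name(example))
```

Because a Snakefile is Python, a helper like this could be called from a params function and its result passed to the -sn option, although that wiring is not taken from the original answer.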

QUESTION

Snakemake first genotype of a vcf file as wildcard in output

Asked 2022-Feb-03 at 13:32

In the second rule I would like to select, from the VCF file containing bob, clara and tim, only the first genotype in the dictionary (i.e. bob), in order to get bob.dn.vcf as the output of the second rule. Is this possible in Snakemake?

...

ANSWER

Answered 2022-Feb-03 at 13:32

There are at least two options:

explicitly specify output:

Source https://stackoverflow.com/questions/70970716

QUESTION

gatk VariantRecalibrator positional argument error

Asked 2022-Jan-06 at 17:12

I'm trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting an error "Illegal argument value: Positional arguments were provided". But I don't know what this means, or how to correct it!

Here's my call:

...

ANSWER

Answered 2022-Jan-06 at 17:12

The error message in this case is confusing. The --resource arguments aren't formatted correctly and extra whitespace is causing it to interpret the following arguments as positional arguments.

The problem is that --resource blocks should have the meta information attached to the argument name instead of separated with a space.

i.e.

--resource:hapmap filename instead of --resource hapmap: filename
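When the command line is assembled in a script, building the tagged argument in one place helps avoid the stray whitespace described above. The following Python sketch shows the idea; the helper name resource_args, the file name and the attribute values are illustrative assumptions, not taken from the question.

```python
def resource_args(name, path, **attrs):
    """Build a tagged --resource argument as two separate argv tokens."""
    tag = ",".join([name] + [f"{key}={value}" for key, value in attrs.items()])
    # The tag is attached to the argument name with a colon and no whitespace;
    # the file path is the only token that follows it.
    return [f"--resource:{tag}", path]


args = resource_args("hapmap", "hapmap.vcf.gz", known="false", training="true")
print(" ".join(args))
```

Keeping the result as a list of tokens also makes it straightforward to pass to subprocess.run without further quoting.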

The GATK Forum is a good place to get answers to questions like this.

Source https://stackoverflow.com/questions/70310122

QUESTION

snakemake multiple parameters for multiple input and single output in snakemake. CombineGVCFs gatk problem

Asked 2021-Sep-16 at 17:18

I have written a rule for CombineGVCFs in gatk4. The rule is as follows:

...

ANSWER

Answered 2021-Sep-16 at 17:18

Found out the problem. Turns out I can write a lambda function as follows

Source https://stackoverflow.com/questions/69209459

QUESTION

Add flag -I before multiple input in snakemake

Asked 2021-Jun-02 at 08:13

I am using snakemake to describe the GATK pipeline. I need to run the following command:

...

ANSWER

Answered 2021-Jun-02 at 08:13

You could do this:

Source https://stackoverflow.com/questions/67799734
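The answer's snippet is elided on this page. A common pattern for this kind of problem is to build the repeated flags with a join, sketched below in plain Python; the helper name flagged_inputs and the file names are hypothetical.

```python
import shlex


def flagged_inputs(paths, flag="-I"):
    """Prefix every input path with the flag and join into one command fragment."""
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return " ".join(f"{flag} {shlex.quote(str(p))}" for p in paths)


print(flagged_inputs(["a.bam", "b.bam", "c.bam"]))
```

In a Snakefile, a function like this could be called from a params entry, for example a lambda taking wildcards and input, and the resulting string interpolated into the shell command. That usage is an assumption about how it would be wired in, not the original answer.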

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install gatk

You can download it from GitHub or Maven.
You can use gatk like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the gatk component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.

Support

For new features, suggestions and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
