hgvs | HGVS variant name parsing and generation | Genomics library
kandi X-RAY | hgvs Summary
In most next-generation sequencing applications, variants are first discovered and described in terms of their genomic coordinates, such as chromosome 7, position 117,199,563 with reference allele G and alternative allele T. According to the HGVS standard, we can describe this variant as NC_000007.13:g.117199563G>T. The first part of the name is a RefSeq ID, NC_000007.13, for chromosome 7 version 13. The g. denotes that this is a variant described in genomic (i.e. chromosomal) coordinates. Lastly, the chromosomal position, reference allele, and alternative allele are indicated. For simple single nucleotide changes the > character is used.

More commonly, a variant will be described using a cDNA or protein style HGVS name. In the example above, the variant in cDNA style is named NM_000492.3:c.1438G>T. Here again, the first part of the name refers to a RefSeq sequence, this time mRNA transcript NM_000492 version 3. Optionally, the gene name can also be given, as in NM_000492.3(CFTR). The c. indicates that this is a cDNA name, and the coordinate indicates that this mutation occurs at position 1438 along the coding portion of the spliced transcript (i.e. position 1 is the first base of the ATG translation start codon).

Briefly, the protein style of the variant name is NP_000483.3:p.Gly480Cys, which indicates the change in amino-acid coordinates (480) along an amino-acid sequence (NP_000483.3) and gives the reference and alternative amino-acid alleles (Gly and Cys, respectively).

The standard also specifies custom name formats for many mutation categories such as insertions (NM_000492.3:c.1438_1439insA), deletions (NM_000492.3:c.1438_1440delGGT), duplications (NM_000492.3:c.1438_1440dupGGT), and several other more complex genomic rearrangements. While many of these names appear to be simple to parse or generate, there are many corner cases, especially with cDNA HGVS names.
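To make the anatomy of a name concrete, here is a minimal sketch that splits a simple genomic or cDNA substitution into the parts described above. It is not this library's parser: the `Substitution` type and `parse_substitution` function are hypothetical names chosen for illustration, and the pattern deliberately accepts only plain positive integer positions, so it rejects the corner cases.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    """Illustrative container; not a type from the hgvs library."""
    refseq: str          # e.g. "NM_000492.3"
    gene: Optional[str]  # e.g. "CFTR" when given in parentheses
    kind: str            # "g" (genomic) or "c" (cDNA)
    position: int
    ref: str
    alt: str


# RefSeq ID, optional "(GENE)", ":g." or ":c.", integer position, REF>ALT.
_SUBSTITUTION = re.compile(
    r"^(?P<refseq>[^:()]+)(?:\((?P<gene>[^()]+)\))?"
    r":(?P<kind>[gc])\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(name: str) -> Substitution:
    """Parse a simple single-nucleotide g. or c. substitution name."""
    match = _SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a simple g./c. substitution: {name!r}")
    fields = match.groupdict()
    return Substitution(
        refseq=fields["refseq"],
        gene=fields["gene"],
        kind=fields["kind"],
        position=int(fields["position"]),
        ref=fields["ref"],
        alt=fields["alt"],
    )


print(parse_substitution("NC_000007.13:g.117199563G>T"))
print(parse_substitution("NM_000492.3(CFTR):c.1438G>T"))
```

A regular expression this small is enough for the two example names, but it raises `ValueError` on anything with a signed, starred, or offset coordinate, which is where a full parser earns its keep.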
Among the cDNA corner cases, variants before the start codon have negative cDNA coordinates (NM_000492.3:c.-4G>C), and variants after the stop codon have their own format (NM_000492.3:c.*33C>T). Variants within introns are indicated by the closest exonic base with an additional genomic offset, such as NM_000492.3:c.4243-20A>G (the variant is 20 bases in the 5' direction of the cDNA coordinate 4243). Lastly, all coordinates and alleles are specified on the strand of the transcript. This library properly handles all logic necessary to convert genomic coordinates to and from HGVS cDNA coordinates.

Another important consideration of any library that handles HGVS names is variant normalization. The HGVS standard aims to provide "uniform and unequivocal" description of variants. Namely, two people discovering a variant should be able to arrive at the same name for it. Such a property is very useful for checking whether a variant has been seen before and connecting all known relevant information. For SNPs, this property is fairly easy to achieve. However, for insertions and deletions (indels) near repetitive regions, many indels are equivalent (e.g. it doesn’t matter which AT in a run of ATATATAT was deleted).

The VCF file format has chosen to uniquely specify such indels by using the most left-aligned genomic coordinate. Therefore, compliant variant callers that output VCF will have applied this normalization. The HGVS standard also specifies a normalization for such indels. However, it states that indels should use the most 3' position in a transcript. For genes on the positive strand, this is the opposite of the direction used by VCF. This library properly implements both kinds of variant normalization and allows easy conversion between HGVS and VCF style variants. It also handles many other cases of normalization (e.g. the HGVS standard recommends indicating an insertion with the dup notation instead of ins if it can be represented as a tandem duplication).
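The two normalization directions can be illustrated with a small sketch that slides a deletion through a repeat. This is an assumption-laden toy, not the library's implementation: `shift_deletion` and `apply_deletion` are hypothetical helpers, coordinates are 0-based offsets into a plain string, the gene is assumed to lie on the positive strand (so 3' means rightward), and the extra anchor base that VCF records carry is left out.

```python
def apply_deletion(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


def shift_deletion(seq: str, start: int, length: int, direction: str) -> int:
    """Slide a deletion as far as possible without changing its outcome.

    Moving the deleted window one base left is equivalent when the base
    entering the window equals the base leaving it, and likewise for a
    move to the right.
    """
    if direction == "left":      # VCF-style left alignment
        while start > 0 and seq[start - 1] == seq[start + length - 1]:
            start -= 1
    elif direction == "right":   # 3'-most position on a positive-strand gene
        while start + length < len(seq) and seq[start] == seq[start + length]:
            start += 1
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return start


seq = "GGATATATATCC"
left = shift_deletion(seq, 4, 2, "left")
right = shift_deletion(seq, 4, 2, "right")
# Both placements describe the same event: the resulting sequences match.
print(left, right, apply_deletion(seq, left, 2) == apply_deletion(seq, right, 2))
```

Deleting any one AT unit from the run gives the same sequence, so the two conventions differ only in which end of the repeat they report.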
Top functions reviewed by kandi - BETA
- Align the sequence to the right strand
- Replace indel
- Performs a justification of an indel
- Get a sequence from a chromosome
- Pad sequences with 1 - prime bases
- Parse name
- Parse cdna
- Parse an allele
- Validate the coordinates
- Read transcripts from a refgene file
- Parse a refgene file
- Create a transcript from a transcript
- Flip the strand
- Create a sequence from a position
- Get the allele for a given transcript
- Convert cDNA to genomic coordinate
- Return the coordinates of this HGVS name
- Find the codon in the given list of exons
- Returns True iff the reference sequence matches the reference sequence
- Get all exons in a transcript
- Return a BED6Interval object
Community Discussions
Trending Discussions on hgvs
QUESTION
I've got the following data frame:
...
ANSWER
Answered 2022-Mar-03 at 10:21
Here is a potential solution:
QUESTION
For example, one column of the table I have is like this
...
ANSWER
Answered 2022-Feb-13 at 21:05
Here is a base R way with substr.
QUESTION
For example, in the column I have, there is a line written 'Ser25Phe'. And I want to split the column HGVS.Consequence, e.g. as 'Ser 25 Phe'
...
ANSWER
Answered 2022-Feb-12 at 11:29
Using gsub, assuming that e.g. "AsAsp" should also be split into "As Asp".
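The answer's own R code is not reproduced here. As a rough Python analogue of the same idea, the sketch below inserts a space at each boundary between a lowercase letter and an uppercase letter or digit, and between a digit and a letter. The function name `split_consequence` is hypothetical, and the pattern is one possible reading of the stated assumption, not the answerer's expression.

```python
import re

# Zero-width boundaries: lowercase -> uppercase/digit, or digit -> letter.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z0-9])|(?<=[0-9])(?=[A-Za-z])")


def split_consequence(text: str) -> str:
    """Insert spaces between the residue and position parts of a string."""
    return _BOUNDARY.sub(" ", text)


print(split_consequence("Ser25Phe"))
print(split_consequence("AsAsp"))
```

Because the pattern matches positions and not characters, nothing is removed from the input; only spaces are added.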
QUESTION
I want to append listB to listA
input
...
ANSWER
Answered 2021-Nov-24 at 06:07
listA.extend(listB) should work, but it modifies the original listA. If you want sum_list to be a different list, you can copy it first with something like
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported