bioawk | BWK awk modified for biological data | Genomics library
kandi X-RAY | bioawk Summary
Bioawk is an extension to Brian Kernighan's awk that adds support for several common biological data formats, including optionally gzip'd BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly like the original BWK awk. Building the original awk requires a YACC-compatible parser generator (e.g. Byacc or Bison). Bioawk additionally depends on zlib in order to read gzip'd files.
Community Discussions
Trending Discussions on bioawk
QUESTION
I have a FASTA file that contains protein sequences. I'd like to select the sequences that are longer than 300 amino acids and in which cysteine (C) appears more than 4 times.
I've used this command to select sequences with more than 300 aa:
ANSWER
Answered 2018-Oct-08 at 14:59
I do not know bioawk, but I assume it is identical to awk with some initial parsing and constant definitions.
I would proceed as follows. Assuming you want to find the sequences that contain the letter C more than 4 times and are more than 300 residues long, you could do:
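The command from the original answer is not reproduced on this page. As a minimal sketch of the approach described, the following uses plain POSIX awk rather than bioawk, so it parses the FASTA records by hand. The function name filter_fasta and its two arguments are hypothetical names chosen for illustration. It assumes uppercase one-letter residue codes, so a lowercase c is not counted.

```shell
# Hypothetical helper (not part of bioawk): filter_fasta MIN_LEN MIN_C
# Reads FASTA on stdin and prints the records whose sequence is longer
# than MIN_LEN residues and contains more than MIN_C cysteines (C).
filter_fasta() {
    awk -v min_len="$1" -v min_c="$2" '
        # Print the buffered record if it passes both thresholds.
        function flush_record(   tmp, n_c) {
            if (header == "") return
            tmp = seq
            n_c = gsub(/C/, "", tmp)   # gsub returns the number of substitutions
            if (length(seq) > min_len && n_c > min_c) {
                print header
                print seq
            }
        }
        /^>/ { flush_record(); header = $0; seq = ""; next }
        { seq = seq $0 }               # join wrapped sequence lines
        END { flush_record() }
    '
}
```

For the thresholds in the question it would be called as filter_fasta 300 4 < proteins.fasta, where proteins.fasta is a placeholder file name. The count is taken on a copy of the sequence because gsub modifies its target in place. With bioawk itself, the -c fastx mode is documented as exposing the fields $name and $seq, which should make the manual record parsing unnecessary.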
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported