fastq | Fast, in memory work queue
kandi X-RAY | fastq Summary
Fast, in memory work queue.
fastq Key Features
fastq Examples and Code Snippets
import fastq from "fastq";

export async function consecutiveRequest(data, requester, concurrency = 10) {
  const workerFunction = async (args) => {
    try {
      const data = await requester(args.params);
      console.log(args.index,
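The snippet above is truncated. For reference, a minimal, self-contained sketch of fastq's promise API looks like the following; the worker and the tasks are illustrative and not part of the snippet above.

import fastq from "fastq";

// Illustrative worker: receives one task and returns a promise.
async function worker(task) {
  return task * 2; // stand-in for the real per-task work
}

// Promise-based queue that runs at most 2 tasks concurrently.
const queue = fastq.promise(worker, 2);

// push() returns a promise that resolves with the worker's result.
const results = await Promise.all([1, 2, 3].map((n) => queue.push(n)));
console.log(results); // [2, 4, 6]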
#!/usr/bin/env bash
# Path where to store merged fq files
mergePath='/research/merged_fq'
{
# Print header for merged files tsv
printf 'MergedOddFile\tMergedEvenFile\n'
# Read in dummy variable to skip header row of data.tsv
rea
#!/bin/bash
for dir in ./*/
do
    if compgen -G "$dir*.fastq" > /dev/null
    then
        echo 'There are FASTQ files, will not do the operation'
    elif pushd "$dir" > /dev/null
    then
        echo 'There are no FASTQ files, p
for fastq in *.fastq ; do
    perl -pe 'chomp unless 1 == $.' "$fastq" > "${fastq%.fastq}.fq"
done
perl -i~ -pe 'chomp unless 1 == $. || eof; $. = 0 if eof' *.fastq
nextflow.enable.dsl=2
params.num_lines = 40000
params.suffix_length = 5
process split_fastq {

    input:
    tuple val(name), path(fastq)

    output:
    tuple val(name), path("${name}-${/[0-9]/*params.suffix_length}.fastq.gz")

    sh
# Requires Python 3.6+ for f-strings, Snakemake 5.4+ for checkpoints
import pathlib
import random
random.seed(1)
rule make_fastq:
    output:
        fastq = touch("input/{sample}.fastq")

checkpoint split_fastq:
# Python 3.9+: merge the two dicts with the union operator
SAMPLES = FASTQ | GENOMES
# Older Python: copy one dict and update it with the other
SAMPLES = FASTQ.copy()
SAMPLES.update(GENOMES)
import glob
import os
sra_id = ['SRR1234', 'SRR4567']

rule all:
    input:
        expand('{sra_id}.dump.done', sra_id=sra_id),
        expand('{sra_id}.bam', sra_id=sra_id),

rule sra_dump:
    output:
        touch('{sra_id}.dump.done')
---
class: Workflow
cwlVersion: v1.0
id: workflow
inputs:
  - id: paired_end_fastqs
    type:
      type: record
      name: paired_end_fastqs
      fields:
        - name: fastq
          type: File
        - name: fastq2
          type: File
        - name: sample_na
rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
Community Discussions
Trending Discussions on fastq
QUESTION
I want to rename and move my fastq.gz files from these:
ANSWER
Answered 2021-Jun-11 at 23:24
There are several problems in your code. First of all, the {dir} in your output and the {dir} in your input are two different variables. Actually, the {dir} in the output is a wildcard, while the {dir} in the input is a parameter for the expand function (moreover, you even forgot to call this function, and that is the second problem). The third problem is that the shell section should contain only a single command. You may try mv {input.fastq1} {output.fastq1}; mv {input.fastq2} {output.fastq2}, but this is not an idiomatic solution. Much better would be to create a rule that produces a single file, letting Snakemake do the rest of the work. Finally, the S value fully depends on the DIR value, so it becomes a function of {dir}, and that can be solved with a lambda in the input section.
QUESTION
I have a series of files:
...ANSWER
Answered 2021-Jun-11 at 12:34
Loop over the files. In bash, you can use parameter expansion to remove the extension from the file name.
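A bash loop of this kind appears among the snippets near the top of this page. Purely as an illustration in Node.js (the language of the fastq library), an equivalent rename from *.fastq to *.fq might look like the sketch below; the flat-directory layout is an assumption.

// Sketch: rename every *.fastq file in the current directory to *.fq.
import { readdir, rename } from "node:fs/promises";
import path from "node:path";

for (const file of await readdir(".")) {
  if (path.extname(file) === ".fastq") {
    const target = `${path.basename(file, ".fastq")}.fq`;
    await rename(file, target); // e.g. sample1.fastq -> sample1.fq
    console.log(`${file} -> ${target}`);
  }
}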
QUESTION
I am trying to process bulk RNA-seq data using salmon through snakemake in the conda/mamba environment.
I am receiving the following error when running snakemake:
...ANSWER
Answered 2021-Jun-10 at 20:38
I think the Snakefile is OK; SRR3350597_GSM2112330_RA_hip_3_Homo_sapiens_RNA-Seq_1.fastq.gz is simply missing. See your ls output: that file is not in it.
QUESTION
I am writing a snakemake pipeline to eventually identify coronavirus variants.
Below is a minimal example with three steps:
...ANSWER
Answered 2021-Jun-10 at 07:54
I think the problem is that rule catFasta doesn't contain the wildcard barcode. If you think about it, what job name would you expect in {wildcards.barcode}.{rule}.{jobid}?
Maybe a solution could be to add to each rule a jobname parameter that could be {barcode} for guppyplex and minion, and 'all_barcodes' for catFasta. Then use --jobname "{params.jobname}.{rule}.{jobid}".
QUESTION
I am writing a Snakemake pipeline to produce SARS-CoV-2 variants from Nanopore sequencing. The pipeline that I am writing is based on the artic network, so I am using artic guppyplex and artic minion.
The snakemake that I wrote has the following steps:
- zip all the fastq files for all barcodes (rule zipFq)
- perform read filtering with guppyplex (rule guppyplex)
- call the artic minion pipeline (rule minion)
- move the stderr and stdout from qsub to a folder under the working directory (rule mvQsubLogs)
Below is the snakemake that I wrote so far, which works
...ANSWER
Answered 2021-Jun-08 at 15:40
The rule that fails is rule guppyplex, which looks for an input in the form of {FASTQ_PATH}/{{barcode}}.
It looks like the wildcard {barcode} is filled with barcode49/barcode49.consensus.fasta, which I think happened for two reasons:
First (and most important): the workflow does not find a better way to produce the final output. In rule catFasta, you give an input file which is never described as an output in your workflow. The rule minion has the directory as an output, but not the file, and it is not perfectly clear to the workflow where to produce this input file. It therefore infers that the {barcode} wildcard somehow has to contain this .consensus.fasta that it has never seen before. This wildcard is then handed over to the top, where the workflow crashes since it cannot find a matching input file.
Second: this initialisation of the wildcard with something you don't want is only possible because you did not constrain the wildcard properly. You can, for example, forbid the wildcard from containing a . (see wildcard_constraints here).
However, the main problem is that catFasta does not find the desired input. I'd suggest changing the output of minion to "nanopolish/{barcode}/{barcode}.consensus.fasta"; since you already take the OUTDIR from the params, that should not hurt your rule here.
Edit: Dummy test example:
QUESTION
url <- "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR105/056/SRR10503056/SRR10503056.fastq.gz"
for (i in 1:20){
RCurl::getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
}
...ANSWER
Answered 2021-Jun-03 at 10:20
OK, I have a solution that has not failed for me: I create a try/catch with a maximum-attempt iterator, defaulting to 5 attempts with a wait time of 1 second, in addition to a general wait time of 0.05 seconds per accepted URL request.
Let me know if anyone has a safer idea.
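The answer itself is in R. As a rough JavaScript sketch of the same retry pattern (a maximum number of attempts, a fixed wait between failed attempts, and a small delay after each successful request), with the URL taken from the question and everything else assumed:

// Sketch of the retry idea: up to 5 attempts, 1 s between failed attempts,
// plus a 50 ms pause after each successful request (values from the answer).
const url =
  "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR105/056/SRR10503056/SRR10503056.fastq.gz";

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(requestFn, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await requestFn();
      await sleep(50); // general per-request wait
      return result;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await sleep(1000); // wait before the next attempt
    }
  }
}

// Usage: requestFn is a placeholder; fetch() does not speak FTP, so a real
// implementation would need an FTP-capable client for this particular URL.
// const listing = await withRetry(() => someFtpClient.list(url));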
QUESTION
There is an async iterable
...ANSWER
Answered 2021-Jun-03 at 09:15
Here's an async variation of my answer here.
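The referenced answer is not reproduced on this page. As one sketch of processing an async iterable with bounded concurrency using fastq's promise API, where processItem and the items generator are purely illustrative:

import fastq from "fastq";

// Hypothetical per-item worker; replace with real async processing.
async function processItem(item) {
  return item;
}

// Promise queue: at most 4 items are processed at the same time.
const queue = fastq.promise(processItem, 4);

async function consume(iterable) {
  const pending = [];
  for await (const item of iterable) {
    pending.push(queue.push(item)); // enqueue; concurrency is bounded by the queue
  }
  return Promise.all(pending); // wait for every queued item to finish
}

// Stand-in for the async iterable from the question.
async function* items() {
  yield* [1, 2, 3, 4, 5];
}

console.log(await consume(items()));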
QUESTION
I am trying to take data from specific lines in one 32 GB file, put the extracted data in a dictionary, and then read through another 32 GB file, replacing specific lines with the keys and values from the dictionary created earlier. Lastly, I'm trying to put all this new information in a brand-new file.
However, when I ran the program it had been going for over 12 hours and was still running. I implemented a progress bar, and after 2 hours not one percent of progress had been made. I don't get an error message, but I see no progress. Does anybody know why? Maybe it's having trouble reading files this big? Any help would be appreciated. Here is the code I used.
...ANSWER
Answered 2021-May-29 at 16:00
In general, I would always try to run a program on a much smaller input first (1 MB, 10 MB, 100 MB?) to see if the program works correctly and, if so, how long it takes per MB. Then I can calculate approximately how long the full file would take and how much progress to expect at any given point.
Maybe you can even run those small-file tests while leaving it running on the big file, to at least see that the program actually works and will eventually finish (so you don't lose your current progress). Try with a very small file first (maybe the first 1 MB of the large file), then increase the size if that works flawlessly.
Looking at the actual program, though, I would definitely not collect the whole output in memory and only write it at the end. I would write to the output file continuously. That is much more efficient and won't use the likely huge amount of virtual memory that the current program does.
So, something like this (didn't test, as I can't).
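The answer's Python sketch is not shown here. The idea it describes, streaming through the input and writing each transformed line immediately instead of accumulating everything in memory, can be sketched in JavaScript as follows; the file names and the per-line transform are placeholders.

// Sketch: stream a large file line by line and write each result immediately,
// so memory use stays flat regardless of input size.
import { createReadStream, createWriteStream } from "node:fs";
import readline from "node:readline";

const input = readline.createInterface({
  input: createReadStream("input.txt"), // assumed input path
  crlfDelay: Infinity,
});
const output = createWriteStream("output.txt"); // assumed output path

for await (const line of input) {
  const transformed = line; // placeholder for the real per-line replacement
  output.write(transformed + "\n"); // write as we go instead of buffering everything
}
output.end();
// Note: a production version would also honor backpressure (the 'drain' event).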
QUESTION
I have text files in the range of 10-50 GB. I need to edit the first several lines of these files as follows:
Original:
...ANSWER
Answered 2021-May-27 at 15:45
Incorporating @shellter's comments and help, the easiest script I came up with to get rid of the non-ASCII junk and get the desired output was the following.
QUESTION
I want to read a file four lines at a time (it's a fastq file, with DNA sequences). When I read the file one line at a time or two lines at a time there are no issues, but when I read 3 or 4 lines at once my code crashes (the kernel appears to have died in the Jupyter notebook). Uncommenting the last part, or any 3 out of the 4 getline() calls, reproduces the crash.
I tried a double array of char (char**) to store the lines, with the same issue.
Any idea what the cause could be?
Using Python 3.7.3, Cython 0.29, all other libraries updated. The file being read is about 1.3 GB, the machine has 8 GB of RAM, Ubuntu 16.04. Code adapted from https://gist.github.com/pydemo/0b85bd5d1c017f6873422e02aeb9618a
...ANSWER
Answered 2021-Apr-26 at 11:27
The underlying problem was my misunderstanding of getline() (see the getline() C reference).
To store lines in different variables, an associated n is necessary for each line pointer *lineptr.
If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line.
Alternatively, before calling getline(), *lineptr can contain a pointer to a malloc(3)-allocated buffer *n bytes in size. If the buffer is not large enough to hold the line, getline() resizes it with realloc(3), updating *lineptr and *n as necessary.
The n (or seed in my code) holds the size of the buffer allocated for the pointer where getline() puts the incoming line. As I set the same buffer-size variable for different pointers, getline() was given the wrong information about the size of each char* line_xxx.
As fastq files usually store each record in a four-line shape (a header line, the sequence, a + separator line, and the quality string), reading them four lines at a time is the natural access pattern.
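As an illustration in JavaScript (the page's own language rather than the question's Cython), a sketch that groups a file into four-line records might look like this; the file name is an assumption.

// Sketch: read a file and group every four lines into one FASTQ record.
import { createReadStream } from "node:fs";
import readline from "node:readline";

async function* fastqRecords(path) {
  const rl = readline.createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity,
  });
  let buffer = [];
  for await (const line of rl) {
    buffer.push(line);
    if (buffer.length === 4) {
      const [header, sequence, separator, quality] = buffer;
      yield { header, sequence, separator, quality };
      buffer = [];
    }
  }
}

for await (const record of fastqRecords("reads.fastq")) { // assumed file name
  console.log(record.header, record.sequence.length);
}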
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported