snakemake | development home of the workflow management system Snakemake | BPM library
kandi X-RAY | snakemake Summary
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Snakemake is highly popular, with on average more than 6 new citations per week, and over 200k downloads. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment. Copyright (c) 2012-2022 Johannes Köster johannes.koester@uni-due.com (see LICENSE).
snakemake Key Features
snakemake Examples and Code Snippets
x = expand(
    [
        "results/qc/{u.sample}.foo.txt",
        "results/qc/{u.sample}.foo.bar.txt",
    ],
    u=list(samples.itertuples())
)
print(x)

name: multiqc
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.7
  - multiqc=1.12

if not workflow.use_conda:
    sys.stderr.write("Use conda\n")
    sys.exit(1)

rule all:
    input:
        "a.txt",
        "b.txt",

rule a:
    output:
        "a.txt",
    shell:
        """
        echo {params.val} > {output}
        """

rule b:
    output:
        "b.txt",
    shell:
        """

# Snakefile
samples = ['1', '2']
batches = ['b1', 'b2']
print(expand(['{sample}/{batch}/{sample}'], zip, sample=samples, batch=batches))

$ snakemake -nq
['1/b1/1', '2/b2/2']

trim = expand(

localrules: all, make_envs

rule all:
    input:
        # Maybe not needed:
        expand('{env}.done', env=['env1', 'env2'])

rule make_envs:
    conda:
        'workflow/envs/{env}.yaml',
    output:
        touch('{env}.done'),
    ru

expand("{{sample}}/{sample}_phased_illumina_FILT5_{chrom}.vcf", sample=samples, chrom=chroms)

samples = ['12NAE3', '15NOH7']
chroms = ['chr01', 'chr02']

rule ConcatVCF:
    input:
        expand("{{sample}}/{sample

d = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1"

def process_lines(file_name):
    """generates id/run, ignoring non-numeric lines"""
    with open(file_name, "r") as f:
        for line in f:
            detector_id, run_number, *_ = line.split()
            if detector_id.isnumeric() a

rule all:
    input:
        expand("{your_path}.extension", replacements)

rule make_output:
    input: "{input}_{num}.extension"
    output: "{output}_{num}.extension"
    shell:
        # The shell command must be a quoted string.
        "copy_sph_to_wav {input} > {output}"
Community Discussions
Trending Discussions on snakemake
QUESTION
I have a rule in my snakemake pipeline to run multiqc:
...ANSWER
Answered 2022-Apr-11 at 10:56
You most likely want to install Python as well, since according to the docs it's not recommended to use the system-wide Python:
QUESTION
I am using a cluster submission wrapper script with snakemake --cluster "python qsub_script.py". I need to pass a global variable, taken from config['someVar'], and it should be applied to all rules. I could add it to the params of each rule and then access it using job_properties['params']['someVar'], but this is probably not the best solution. Is there a way to access config from the submission wrapper? Simply using config['someVar'] gives me a NameError.
If that's not possible, can you suggest an alternative? I suspect using profiles could be helpful, but couldn't figure out how this interacts with the submission wrapper.
...ANSWER
Answered 2022-Mar-19 at 07:47
The problem statement is a bit broad, so the solution below might not be optimal, but it should achieve what you are looking for. Specifically, the code below allows modifying every rule in the workflow.
Here's a reproducible demo:
QUESTION
I have two rules in my Snakefile: one generates several sets of files using wildcards, and the other merges everything into a single file. This is how I wrote it:
...ANSWER
Answered 2022-Mar-07 at 11:43
In rule generate I think you don't want to escape the {chr} wildcard, otherwise it doesn't get replaced. I.e.:
QUESTION
I am aware that by adding the option --conda-create-envs-only you are able to create the conda environments for the workflow. However, would it be possible to force the creation of all conda environments under workflow/envs/ without knowing the workflow DAG in advance?
The reason is that I am planning to run snakemake on an HPC, and the compute nodes have no internet. As such, I have to set up the environment on a build node with internet. The problem is that I can only access my input data on the compute nodes.
...ANSWER
Answered 2022-Mar-04 at 10:07
Maybe make the creation of the conda environments a target itself? Something like this (not tested):
QUESTION
I am trying to clean up a data pipeline by using snakemake. It looks like wildcards are what I need, but I can't get them to work in params.
My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on sample, which can be either A or B.
I tried the following (my example is more complicated, but this is basically what I am trying to do):
...ANSWER
Answered 2022-Mar-02 at 18:45
I think that you are trying to get the sample wildcard to use as a parameter in your script.
The wc variable is an instance of snakemake.io.Wildcards, which is a snakemake.io.Namedlist. You can call .get(key) on these objects, so we can use a lambda function to generate the params:
samples_from_wc=lambda wc: wc.get("sample")
and use this in the run/shell as params.samples_from_wc.
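The pattern can be tried in plain Python, outside of any Snakefile. This is only a sketch: a dict stands in for the wildcards object because it also offers .get(key), and the lookup table PARAM_BY_SAMPLE with its values is hypothetical.

```python
# Hypothetical lookup table: the parameter value for each sample.
PARAM_BY_SAMPLE = {"A": 10, "B": 20}

# Same shape as a params function in a rule: it receives the wildcards
# and returns the value that would be exposed under params.
samples_from_wc = lambda wc: wc.get("sample")
param_from_wc = lambda wc: PARAM_BY_SAMPLE[wc.get("sample")]

# A dict stands in for the wildcards object (assumption for illustration).
fake_wildcards = {"sample": "A"}
print(samples_from_wc(fake_wildcards))
print(param_from_wc(fake_wildcards))
```

The second lambda shows the case from the question, where the parameter is derived from the wildcard value rather than being the wildcard itself.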
QUESTION
I am trying to make a snakemake workflow for whatshap haplotype caller but I am struggling with MissingInputException errors. This is what I get:
...ANSWER
Answered 2022-Feb-26 at 01:06
Troy Comi has already answered your question in the comments, but I will explain it further.
Indeed, removing the double braces will help. The difference between single and double braces is that double braces escape the symbols '{' and '}'. In other words, whenever Snakemake encounters a string like "{{sample}}/{sample}_phased_illumina_FILT5.vcf.gz" in the output section, it treats {sample} as a wildcard and {{sample}} as the literal string "{sample}". So it tries to find files like {sample}/saturna_phased_illumina_FILT5.vcf.gz, which it fails to find.
The problem is quite different when this string is used in the expand function:
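The effect of double braces inside expand can be mimicked in plain Python with str.format, which follows the same escaping rule. The helper expand_like below is a hypothetical stand-in written for this illustration, not Snakemake's own expand; the sample and chromosome names are taken from the question.

```python
from itertools import product


def expand_like(pattern, **values):
    """Fill every combination of the given values into pattern via str.format."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


samples = ["12NAE3", "15NOH7"]
chroms = ["chr01", "chr02"]

# {{sample}} is an escaped brace pair, so it survives formatting
# as the literal text {sample}; only {sample} and {chrom} are filled in.
paths = expand_like(
    "{{sample}}/{sample}_phased_illumina_FILT5_{chrom}.vcf",
    sample=samples,
    chrom=chroms,
)
for path in paths:
    print(path)
```

Every resulting path still starts with a literal {sample} directory, which a rule would then read as an unresolved wildcard rather than as a concrete sample name.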
QUESTION
I have a json file like so:
...ANSWER
Answered 2022-Feb-22 at 13:11
There is a possibility of using a custom combinatoric function in expand. Most often this function is zip; however, in your case the nested dictionary shape will require designing a custom function. Instead, a simpler solution is to use Python to construct the list of desired files.
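A minimal sketch of building such a list with a plain comprehension. The dictionary below only imitates the shape of the JSON in the question; the "bar2" and "A2" entries, the results/ layout, and the .txt suffix are assumptions made for illustration.

```python
# Hypothetical nested mapping with the same shape as the JSON in the question.
d = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
        },
    },
}

# Walk the three levels and build one target per leaf entry.
# The "results/..." layout and ".txt" suffix are assumed, not from the question.
targets = [
    f"results/{top}/{group}/{entry['name']}.txt"
    for top, groups in d.items()
    for group, entries in groups.items()
    for entry in entries.values()
]
print(targets)
```

A list built this way can be handed to the input of a target rule directly, without going through expand.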
QUESTION
I would like to improve the reproducibility of some Python code I wrote by turning it into a data pipeline. I am used to targets in R and would like to find an equivalent in Python. I have the impression that snakemake is quite close to that.
I don't understand how to use pandas to import an input in a snakemake task, modify it, and then write the output.
Let's take the easiest pipeline I can think of: we take a csv and write a copy somewhere else.
The pipeline works fine when using a bash script:
...ANSWER
Answered 2022-Feb-17 at 17:08
You are very close; the curly braces are not needed within the run directive:
QUESTION
I started to migrate my workflows from Nextflow to Snakemake and am already hitting a wall at the start of my pipelines, which very often begin with a list of numbers (representing a "run number" from our detector).
What I have, for example, is a run-list.txt like
ANSWER
Answered 2022-Feb-16 at 14:08
To make this work you need two ingredients:
- a rule that specifies the logic for generating a single file (defining any file dependencies, if necessary)
- a rule that defines which files should be produced; by convention this rule is called all.
Here is a rough sketch of the code:
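The parsing step can be sketched in plain Python as follows. The run list contents, the rule that both columns must be numeric, and the processed/ naming scheme are assumptions for illustration; the real file layout may differ.

```python
import io


def process_lines(lines):
    """Yield (detector_id, run_number) pairs, skipping non-numeric lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or too-short line
        detector_id, run_number = fields[0], fields[1]
        # Assumption: both columns must be numeric for the line to count.
        if detector_id.isnumeric() and run_number.isnumeric():
            yield detector_id, run_number


# Hypothetical run list contents, held in memory instead of run-list.txt.
run_list = io.StringIO("# detector run\n49 9000\n49 9001\nnot a run\n")

# Hypothetical target naming scheme, used only to show the idea.
targets = [f"processed/{det}_{run}.txt" for det, run in process_lines(run_list)]
print(targets)
```

In a Snakefile, a list like targets would be given as the input of rule all, while a second rule with matching wildcards in its output describes how to produce one such file.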
QUESTION
Sorry if this is a duplicate of other questions, but I couldn't figure out how to debug what's going on in my case. I have a dataframe like this:
...ANSWER
Answered 2022-Feb-15 at 11:15
The parameters are stored in a dataframe, and there is a handy utility for working with tabulated parameters, Paramspace. Below is a rough take on your specific case, but it will need some adjustments for command syntax and paths.
The first step is to reshape the data for an easier workflow:
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported