snakefiles | common RNA-seq data analysis workflows | Genomics library
kandi X-RAY | snakefiles Summary
This repository has Snakefiles for common RNA-seq data analysis workflows. Please feel free to copy them and modify them to suit your needs.
Top functions reviewed by kandi - BETA
- Returns a list of queues to run.
- Entry point for the script.
- Run a bsub command.
Community Discussions
Trending Discussions on snakefiles
QUESTION
I'm trying to expand on two sets of files that have a similar name structure, only one is longer than the other.
ANSWER
Answered 2022-Apr-14 at 19:36
I didn't look into the source code, but my guess is that for each filename you want to insert the variable into, it loops over each element of the variable again. itertuples() returns an iterator, so for the first filename, {u.sample}.foo.txt, the iterator already reaches the end, and for the next filename, {u.sample}.foo.bar.txt, there are no elements remaining in the iterator. A simple solution is to extract all elements of the iterator into a list.
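The exhaustion behaviour described in the answer can be reproduced in plain Python, without pandas or Snakemake. This is a minimal sketch: `Unit`, `fill`, and the sample names `s1` and `s2` are hypothetical stand-ins for the rows that `itertuples()` would yield, and the answer itself only guesses that this is what happens internally.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows yielded by DataFrame.itertuples().
Unit = namedtuple("Unit", ["sample"])


def fill(pattern, units):
    """Fill one filename pattern for every unit (illustrative helper)."""
    return [pattern.format(u=u) for u in units]


patterns = ["{u.sample}.foo.txt", "{u.sample}.foo.bar.txt"]

# A one-shot iterator: the first pattern consumes it, the second sees nothing.
units_iter = iter([Unit("s1"), Unit("s2")])
from_iterator = [fill(p, units_iter) for p in patterns]

# A list can be traversed once per pattern.
units_list = list(iter([Unit("s1"), Unit("s2")]))
from_list = [fill(p, units_list) for p in patterns]

print(from_iterator)
print(from_list)
```

With the iterator, the second pattern produces an empty list; with the list, both patterns are filled for every sample.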
QUESTION
I am trying to clean up a data pipeline by using snakemake. It looks like wildcards are what I need, but I can't manage to make them work in params.
My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on sample, which can be either A or B.
I tried the following (my example is more complicated, but this is basically what I am trying to do):
ANSWER
Answered 2022-Mar-02 at 18:45
I think that you are trying to get the sample wildcard to use as a parameter in your script.
The wc variable is an instance of snakemake.io.Wildcards, which is a snakemake.io.Namedlist.
You can call .get(key) on these objects, so we can use a lambda function to generate the params:
samples_from_wc=lambda wc: wc.get("sample")
and use this in the run/shell as params.samples_from_wc.
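To see how such a params function behaves outside a workflow, here is a small plain-Python sketch. `FakeWildcards` is a hypothetical stand-in that only mimics the `.get(key)` call mentioned above, and `PARAM_BY_SAMPLE` with its values is an invented lookup table for the "parameter depends on sample A or B" case from the question.

```python
# Hypothetical lookup table: which parameter value each sample needs.
PARAM_BY_SAMPLE = {"A": 1, "B": 2}


class FakeWildcards(dict):
    """Stand-in for snakemake.io.Wildcards; only .get(key) is mimicked."""


# Same shape as the lambda in the answer.
samples_from_wc = lambda wc: wc.get("sample")


def param_for_sample(wc):
    """Return the parameter value that depends on the sample wildcard."""
    return PARAM_BY_SAMPLE[wc.get("sample")]


print(samples_from_wc(FakeWildcards(sample="A")))
print(param_for_sample(FakeWildcards(sample="B")))
```

In a real rule, a function like `param_for_sample` would be assigned to a named entry under params, in the same way as the lambda.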
QUESTION
I'm performing parameter exploration using the Paramspace utility as described here. I've read the parameters into a pandas dataframe, and next I wish to pass these as values of options of a shell command, but I can't figure out how.
In the minimal example below, I wish to pass parameter s (read into dataframe df) as the value of option -n for the head command in the shell directive.
ANSWER
Answered 2022-Feb-11 at 09:56
You're almost there; you do not need to quote the dictionary key. Here's a slightly modified working version:
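Independent of the answer's own snippet, the point about not quoting the key can be sketched with Python's built-in `str.format`, on the assumption that Snakemake's shell formatting follows the same item-access rules. The `params` dictionary and `input.txt` are hypothetical.

```python
# Hypothetical parameter instance, like one row of the dataframe df.
params = {"s": 3}

# Inside a format string the key is written bare: {params[s]}.
command = "head -n {params[s]} input.txt".format(params=params)
print(command)

# A quoted key is looked up literally, quotes included, and fails.
try:
    "head -n {params['s']} input.txt".format(params=params)
    quoted_key_works = True
except KeyError:
    quoted_key_works = False
print(quoted_key_works)
```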
QUESTION
I am working in a high performance computing grid environment, where large-scale data transfers are done via Globus. I would like to use Snakemake to pull data from a Globus path, process the data, and then push the processed data to a different Globus path. Globus has a command-line interface.
Pulling the data is no problem, for I'd just create a rule that would run globus transfer to create the requisite local file. But for pushing the data back to Globus, I think I'll need a rule that can "see" that the file is missing at the remote location, and then work backwards to determine what needs to happen to create the file.
I could create local "proxy" files that represent the remote files. For example I could make a rule for creating 'processed_data_1234.tar.gz' output files in a directory. These files would just be created using touch (thus empty), and the same rule will run globus transfer to push the files remotely. But then there's the overhead of making sure that the proxy files don't get out of sync with the real Globus-hosted files.
Is there a more elegant way to do this akin to the Remote File capability? Is it difficult to add a Globus CLI support for Snakemake? Thanks in advance for any advice!
ANSWER
Answered 2022-Jan-11 at 14:19
Would it help to create a utility function that would generate a list of all desired files and compare it against the list of files available on Globus? Something like this (pseudocode):
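One possible shape for such a utility, as a sketch only: `missing_remote_files` is a hypothetical helper, the remote listing is passed in as a plain list rather than fetched from Globus, and the IDs 1235 and 1236 are made up for illustration.

```python
def missing_remote_files(desired, available):
    """Return the desired files not yet present remotely, in original order."""
    available = set(available)
    return [name for name in desired if name not in available]


# All files the workflow should eventually have pushed.
desired = [f"processed_data_{i}.tar.gz" for i in (1234, 1235, 1236)]

# In practice this list would come from a listing of the remote path.
available_on_globus = ["processed_data_1234.tar.gz"]

to_build = missing_remote_files(desired, available_on_globus)
print(to_build)
```

The resulting list could then serve as the set of targets that still need to be produced and transferred.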
QUESTION
Before this, I checked this, snakemake's documentation, this, and this. Maybe they actually answered this question but I just didn't understand it.
In short, in one rule I create a number of files from other files, and both sets conform to a wildcard format. I don't know how many of these I create, since I don't know how many I originally download.
In all of the examples I've read so far, the output is directory("the/path"), while I have "the/path/{id}.txt". So I guess this changes how I call the checkpoints in the function itself, and how I use expand.
The rules in question are:
download_mv
textgrid_to_ctm_txt
get_MV_IDs
merge_ctms
The order of the rules should be:
download_mv (creates {MV_ID}.TEX and .wav, though not necessarily the same number of each)
textgrid_to_ctm_txt (creates from {MV_ID}.TEX matching .txt and .ctm)
get_MV_IDs (should make a list of the .ctm files)
merge_ctms (should concatenate the ctm files)
kaldi_align (from the .wav and .txt directories creates one ctm file)
analyse_align (compares the ctm file from kaldi_align with the one from merge_ctms)
upload_print_results
I have tried with the outputs of download_mv being directories, and then trying to get the IDs, but I had different errors then. Now with snakemake --dryrun I get
ANSWER
Answered 2021-Dec-07 at 05:19
I can see the reason why you got the error:
You use an input function in rule merge_ctms to access the files generated by the checkpoint. But merge_ctms doesn't have a wildcard in its output file name, so snakemake doesn't know which value should be filled into MV_ID in your checkpoint.
I'm also a bit confused about the way you use the checkpoint. Since you are not sure how many .TEX files would be downloaded (I guess), shouldn't you use the directory that stores the .TEX files as the output of the checkpoint, then use glob_wildcards to find out how many .TEX files you downloaded?
An alternative solution I can think of is to let download_mv become your checkpoint and set its output to the directory containing the .TEX files; then, in the input function, replace the .TEX files with .ctm files to do the format conversion.
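The "scan the directory, then swap .TEX for .ctm" step can be sketched in plain Python. This uses `pathlib` as a stand-in for Snakemake's `glob_wildcards`; the helper `ctm_targets`, the directory name `ctm`, and the file names `mv01` and `mv02` are all hypothetical.

```python
from pathlib import Path
import tempfile


def ctm_targets(tex_dir, ctm_dir):
    """Map every <MV_ID>.TEX found in tex_dir to <ctm_dir>/<MV_ID>.ctm."""
    ids = sorted(p.stem for p in Path(tex_dir).glob("*.TEX"))
    return [str(Path(ctm_dir) / f"{mv_id}.ctm") for mv_id in ids]


# Simulate a download directory whose contents are only known afterwards.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mv01.TEX", "mv02.TEX", "mv01.wav"):
        (Path(tmp) / name).touch()
    targets = ctm_targets(tmp, "ctm")

print(targets)
```

Inside a workflow, a function of this kind would run only after the checkpoint has finished, so the directory listing reflects what was actually downloaded.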
QUESTION
I have a snakemake file where one rule produces a file from which I would like to extract the header and use it as wildcards in my rule all. The Snakemake guide provides an example where it creates new folders named like the wildcards, but I would like to avoid that if I can, since in some cases it would need to create 100-200 folders. Any suggestions on how to make it work?
link to snakemake guide: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
ANSWER
Answered 2021-Nov-26 at 20:06
I think you are misreading the snakemake checkpoint example. You only need to create one folder in your case. They have a wildcard (sample) in the folder name, but that part of the output name is known ahead of time.
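As a rough illustration of deriving target names from a header while keeping everything in a single folder, consider the sketch below. The tab delimiter, the `results` folder, the `.txt` suffix, and the column names are assumptions, and `targets_from_header` is a hypothetical helper rather than part of Snakemake.

```python
import io


def targets_from_header(handle, out_dir="results", sep="\t"):
    """Read the first line and build one target path per column name."""
    header = handle.readline().rstrip("\n")
    return [f"{out_dir}/{name}.txt" for name in header.split(sep) if name]


# Stand-in for the file produced by the upstream rule.
table = io.StringIO("geneA\tgeneB\tgeneC\n1\t2\t3\n")
targets = targets_from_header(table)
print(targets)
```

All targets land in one folder, so the number of columns does not translate into a matching number of directories.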
QUESTION
I am using Snakemake to run the defense-finder program. This program creates and overwrites generic temporary files in /tmp/defense-finder, i.e. the file names do not contain unique identifiers. When running my rule across separate cores on different input files, Snakemake crashes due to clashes in /tmp/defense-finder.
It appears that Shadow rules can help when different jobs write to the same files within the working directory. Is there a way to use Shadow rules when a program writes to the /tmp directory?
ANSWER
Answered 2021-Nov-17 at 09:40
Following @Marmaduke's comment that file paths are hard-coded, a temporary workaround is to force snakemake to run the defense-finder jobs one at a time while allowing other jobs to run in parallel. You can do this with the resources directive:
QUESTION
Below is an example of a Snakemake Rmd report with a custom.css.
ANSWER
Answered 2021-Aug-05 at 12:51
I think maybe your syntax is wrong, according to the Rmarkdown Cookbook.
QUESTION
I am trying to run a snakemake rule with an external script that contains a wildcard, as noted in the snakemake Read the Docs documentation. However, I am running into a KeyError when running snakemake.
For example, if we have the following rule:
ANSWER
Answered 2021-May-22 at 09:59
I agree with Dmitry Kuzminov that having a script depending on a wildcard is odd. Maybe there are better solutions.
Anyway, the following works for me on snakemake 6.0.0. Note that in your R script snakemake@output[1] should be snakemake@output[[1]], but that doesn't cause the problem you report.
QUESTION
I have a job that might fail with a specific configuration. What I want to do is let it run once and, if it fails, run a slightly different configuration. I found the attempt parameter, but I can't find a way to access it outside the resources tag.
Do you know how to access it, or any alternative?
ANSWER
Answered 2020-Dec-04 at 09:07
The attempt counter is passed as the argument "'--attempt' 'int'" to the jobscript (in my case a wrapper script in Python), so you can access it, for example, with:
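A wrapper script could read that argument with `argparse`, roughly as follows. This is a sketch under the assumption that the flag arrives as `--attempt <int>` as described above; `parse_attempt`, the trailing `job.sh` argument, and the `default`/`fallback` configuration names are hypothetical.

```python
import argparse


def parse_attempt(argv):
    """Extract the --attempt value, ignoring any other arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--attempt", type=int, default=1)
    args, _unknown = parser.parse_known_args(argv)
    return args.attempt


# Simulated command line for a second try of the same job.
attempt = parse_attempt(["--attempt", "2", "job.sh"])

# Switch to a different configuration after the first failure.
config = "default" if attempt == 1 else "fallback"
print(attempt, config)
```

In a real wrapper, `sys.argv[1:]` would be passed to `parse_attempt` in place of the simulated list.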
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported