snakefiles | common RNA-seq data analysis workflows | Genomics library

by slowkow | Python | Version: Current | License: MIT

kandi X-RAY | snakefiles Summary

snakefiles is a Python library typically used in Artificial Intelligence and Genomics applications. snakefiles has no bugs, no vulnerabilities, a Permissive License, and low support. However, no build file is available for snakefiles. You can download it from GitHub.

This repository has Snakefiles for common RNA-seq data analysis workflows. Please feel free to copy them and modify them to suit your needs.

Support

snakefiles has a low active ecosystem.

It has 68 star(s) with 32 fork(s). There are 3 watchers for this library.

It had no major release in the last 6 months.

There are 0 open issues and 1 has been closed. On average, issues are closed in 164 days. There is 1 open pull request and 0 closed pull requests.

It has a neutral sentiment in the developer community.

The latest version of snakefiles is current.

Quality

snakefiles has 0 bugs and 4 code smells.

Security

snakefiles has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

snakefiles code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

snakefiles is licensed under the MIT License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

snakefiles releases are not available. You will need to build from source code and install.

snakefiles has no build file. You will need to create the build yourself to build the component from source.

Installation instructions, examples and code snippets are available.

It has 79 lines of code, 3 functions and 2 files.

It has low code complexity. Code complexity directly impacts maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed snakefiles and discovered the below as its top functions. This is intended to give you an instant insight into snakefiles implemented functionality, and help decide if they suit your requirements.

Returns a list of queues to run.
Entry point for the script.
Run a bsub command.

snakefiles Key Features

No Key Features are available at this moment for snakefiles.

snakefiles Examples and Code Snippets

No Code Snippets are available at this moment for snakefiles.

Community Discussions

Trending Discussions on snakefiles

Snakemake expand() only applies on the first element of a list

Passing wildcard values in params in snakemake

snakemake parameter exploration: how to pass options to a command in shell directive?

How to make Snakemake recognize Globus remote files using Globus CLI?

Snakemake: Use checkpoint and function to aggregate unknown number of files using wildcards

How do Snakemake checkpoints work when i do not wanna make a folder?

Snakemake shadow rule when program writes to /tmp

Snakemake report with Rmarkdown and custom.css file

Running external scripts with wildcards in snakemake

Accessing the attempt counter for change in job behaviour

QUESTION

Snakemake expand() only applies on the first element of a list

Asked 2022-Apr-14 at 19:36

I'm trying to expand on two sets of files that have a similar name structure; one name is just longer than the other.

...

ANSWER

Answered 2022-Apr-14 at 19:36

I didn't look into the source code, but my guess is that for each filename pattern you pass, expand() loops over the elements of the variable again. itertuples() returns an iterator, so after the first pattern, {u.sample}.foo.txt, the iterator has already reached its end, and for the next pattern, {u.sample}.foo.bar.txt, no elements remain. A simple solution is to extract all elements of the iterator into a list.
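The exhaustion effect described in this answer can be reproduced without Snakemake. The following is a minimal sketch in plain Python: `Row`, `rows()` and `fill()` are hypothetical stand-ins (for a DataFrame row, for `itertuples()`, and for the looping that `expand()` is guessed to do), not Snakemake's actual implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for a row produced by DataFrame.itertuples().
Row = namedtuple("Row", ["sample"])


def rows():
    """Return a one-shot iterator, which is also what itertuples() returns."""
    return iter([Row("A"), Row("B")])


def fill(patterns, units):
    """Toy stand-in for expand(): loop over `units` once per pattern."""
    return [p.format(u=u) for p in patterns for u in units]


patterns = ["{u.sample}.foo.txt", "{u.sample}.foo.bar.txt"]

# The iterator is used up by the first pattern, so the second gets nothing.
exhausted = fill(patterns, rows())

# A list can be traversed once per pattern, so every combination is produced.
complete = fill(patterns, list(rows()))

print(exhausted)
print(complete)
```

Wrapping the iterator in `list(...)` before passing it on is the fix the answer suggests.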

Source https://stackoverflow.com/questions/71875766

QUESTION

Passing wildcard values in params in snakemake

Asked 2022-Mar-02 at 18:45

I am trying to clean up a data pipeline by using snakemake. It looks like wildcards are what I need, but I can't manage to make them work in params.

My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on sample that can either be A or B.

I tried the following (my example is more complicated but this is basically what I am trying to do) :

...

ANSWER

Answered 2022-Mar-02 at 18:45

I think that you are trying to get the sample wildcard to use as a parameter in your script.

The wc variable is an instance of snakemake.io.Wildcards which is a snakemake.io.Namedlist. You can call .get(key) on these objects, so we can use a lambda function to generate the params.

Define samples_from_wc=lambda wc: wc.get("sample") in params, and use it in the run/shell section as params.samples_from_wc.
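As a rough illustration of how such a lambda behaves, the sketch below calls it outside Snakemake. A plain dict stands in for the `snakemake.io.Wildcards` object (an assumption for illustration; both offer `.get(key)`), and `SETTINGS` and `setting_from_wc` are hypothetical names showing a parameter that depends on the wildcard value.

```python
# The lambda from the answer: it receives the wildcards object and
# returns the value of the "sample" wildcard.
samples_from_wc = lambda wc: wc.get("sample")

# Hypothetical per-sample settings, keyed by the wildcard value (A or B).
SETTINGS = {"A": 10, "B": 20}
setting_from_wc = lambda wc: SETTINGS[wc.get("sample")]

# Assumption: a dict stands in for snakemake.io.Wildcards in this demo.
wildcards = {"sample": "A"}

print(samples_from_wc(wildcards))
print(setting_from_wc(wildcards))
```

Inside a rule, Snakemake itself calls these functions with the job's wildcards, so nothing has to be passed by hand.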

Source https://stackoverflow.com/questions/71326692

QUESTION

snakemake parameter exploration: how to pass options to a command in shell directive?

Asked 2022-Feb-11 at 09:57

I'm performing parameter exploration using Paramspace utility as described here. I've read parameters in a pandas dataframe and next I wish to pass these as values of options of a shell command but can't figure out how.

In the below minimal example, I wish to pass parameter s (read in dataframe df) as the value of option -n for head command in the shell directive.

...

ANSWER

Answered 2022-Feb-11 at 09:56

You're almost there, you do not need to quote the dictionary key. Here's a slightly modified working version:
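The quoting rule can be checked with Python's own `str.format`, on the assumption that the shell directive follows the same format-string field syntax. This is only a sketch: the `params` dict and `input.txt` are hypothetical, and the key `s` follows the question.

```python
# Hypothetical parameter set; in the question, `s` comes from the dataframe.
params = {"s": 3}

# Unquoted key: the format field looks up params["s"].
ok = "head -n {params[s]} input.txt".format(params=params)
print(ok)

# Quoted key: the quotes become part of the key, so the lookup fails.
quoted_error = None
try:
    "head -n {params['s']} input.txt".format(params=params)
except KeyError as err:
    quoted_error = err
    print("KeyError:", err)
```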

Source https://stackoverflow.com/questions/71075669

QUESTION

How to make Snakemake recognize Globus remote files using Globus CLI?

Asked 2022-Jan-15 at 01:09

I am working in a high performance computing grid environment, where large-scale data transfers are done via Globus. I would like to use Snakemake to pull data from a Globus path, process the data, and then push the processed data to a different Globus path. Globus has a command-line interface.

Pulling the data is no problem, for I'd just create a rule that would run globus transfer to create the requisite local file. But for pushing the data back to Globus, I think I'll need a rule that can "see" that the file is missing at the remote location, and then work backwards to determine what needs to happen to create the file.

I could create local "proxy" files that represent the remote files. For example I could make a rule for creating 'processed_data_1234.tar.gz' output files in a directory. These files would just be created using touch (thus empty), and the same rule will run globus transfer to push the files remotely. But then there's the overhead of making sure that the proxy files don't get out of sync with the real Globus-hosted files.

Is there a more elegant way to do this akin to the Remote File capability? Is it difficult to add a Globus CLI support for Snakemake? Thanks in advance for any advice!

...

ANSWER

Answered 2022-Jan-11 at 14:19

Would it help to create a utility function that would generate a list of all desired files and compare it against the list of files available on globus? Something like this (pseudocode):
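One possible shape for such a utility, as a minimal Python sketch: `missing_remote_files` is a hypothetical helper, and the remote listing is passed in as a plain list, because how it is fetched from Globus is not covered here.

```python
def missing_remote_files(desired, available):
    """Return the desired file names that are not yet present remotely."""
    present = set(available)
    # Keep the order of `desired` so the result is deterministic.
    return [name for name in desired if name not in present]


# Hypothetical file names, modelled on the example in the question.
desired = [f"processed_data_{i}.tar.gz" for i in (1234, 1235, 1236)]
available = ["processed_data_1234.tar.gz"]  # assumed remote listing

todo = missing_remote_files(desired, available)
print(todo)
```

Only the names in `todo` would then need to be requested as workflow targets.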

Source https://stackoverflow.com/questions/70659821

QUESTION

Snakemake: Use checkpoint and function to aggregate unknown number of files using wildcards

Asked 2021-Dec-07 at 05:19

Before this, I checked this, snakemake's documentation, this, and this. Maybe they actually answered this question but I just didn't understand it.

In short, I create in one rule a number of files from other files, that both conform to a wildcard format. I don't know how many of these I create, since I don't know how many I originally download.

In all of the examples I've read so far, the output is directory("the/path"), while I have "the/path/{id}.txt". So I guess this changes how I call the checkpoints in the function itself, and how I use expand.

The rules in question are:

download_mv

textgrid_to_ctm_txt

get_MV_IDs

merge_ctms

The order of the rules should be:

download_mv (creates {MV_ID}.TEX and .wav, though not necessarily the same amount)

textgrid_to_ctm_txt (creates from {MV_ID}.TEX matching .txt and .ctm)

get_MV_IDs (should make a list of the .ctm files)

merge_ctms (should concatenate the ctm files)

kaldi_align (from the .wav and .txt directories creates one ctm file)

analyse_align (compares the ctm file from kaldi_align to the one from merge_ctms)

upload_print_results

I have tried with the outputs of download_mv being directories, and then trying to get the IDs but I had different errors then. Now with snakemake --dryrun I get

...

ANSWER

Answered 2021-Dec-07 at 05:19

I can see the reason why you got the error is:

You use an input function in rule merge_ctms to access the files generated by the checkpoint. But merge_ctms doesn't have a wildcard in its output file name, so snakemake doesn't know which value should be filled in for MV_ID in your checkpoint.

I'm also a bit confused about the way you use the checkpoint. Since you are not sure how many .TEX files will be downloaded (I guess), shouldn't you use the directory that stores the .TEX files as the output of the checkpoint, then use glob_wildcards to find out how many .TEX files you downloaded?

An alternative solution I can think of is to make download_mv your checkpoint and set its output to the directory containing the .TEX files, then, in the input function, replace the .TEX file names with .ctm file names to request the format conversion.
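The file-name mapping in that alternative can be sketched in plain Python. `ctm_targets` and the directory names are hypothetical; in a Snakefile, the directory would typically come from the checkpoint's output (for example via `checkpoints.download_mv.get(**wildcards).output[0]`) rather than from a temporary directory as in this demo.

```python
import tempfile
from pathlib import Path


def ctm_targets(tex_dir, ctm_dir):
    """Map each downloaded .TEX file to the .ctm file that should be built."""
    ids = sorted(p.stem for p in Path(tex_dir).glob("*.TEX"))
    return [str(Path(ctm_dir) / f"{mv_id}.ctm") for mv_id in ids]


# Demo with a temporary directory standing in for the checkpoint output.
with tempfile.TemporaryDirectory() as tmp:
    for mv_id in ("mv1", "mv2"):
        (Path(tmp) / f"{mv_id}.TEX").touch()
    targets = ctm_targets(tmp, "ctm")

print(targets)
```

Because the list is built only after the directory has been filled, the number of .ctm files does not need to be known in advance.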

Source https://stackoverflow.com/questions/70247422

QUESTION

How do Snakemake checkpoints work when i do not wanna make a folder?

Asked 2021-Nov-26 at 20:06

I have a snakemake file where one rule produces a file from which I would like to extract the header and use its fields as wildcards in my rule all. The Snakemake guide provides an example where it creates new folders named after the wildcards, but it would be nice if I could avoid that, since in some cases it would need to create 100-200 folders. Any suggestions on how to make it work?

link to snakemake guide: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html

...

ANSWER

Answered 2021-Nov-26 at 20:06

I think you are misreading the snakemake checkpoint example. You only need to create one folder in your case. They have a wildcard (sample) in the folder name, but that part of the output name is known ahead of time.

Source https://stackoverflow.com/questions/70095796

QUESTION

Snakemake shadow rule when program writes to /tmp

Asked 2021-Nov-17 at 09:40

I am using Snakemake to run the defense-finder program. This program creates and overwrites generic temporary files in /tmp/defense-finder, i.e. the file names do not contain unique identifiers. When running my rule across separate cores on different input files, Snakemake crashes due to clashes in /tmp/defense-finder.

It appears that Shadow rules can help when different jobs write to the same files within the working directory. Is there a way to use Shadow rules when a program writes to the /tmp directory?

...

ANSWER

Answered 2021-Nov-17 at 09:40

Following @Marmaduke's comment that file paths are hard-coded, a temporary workaround is to force snakemake to run the defense-finder jobs one at a time while allowing other jobs to run in parallel. You can do this with the resources directive:

Source https://stackoverflow.com/questions/69990055

QUESTION

Snakemake report with Rmarkdown and custom.css file

Asked 2021-Oct-28 at 14:07

Below is an example of a Snakemake Rmd report with a custom.css.

...

ANSWER

Answered 2021-Aug-05 at 12:51

I think maybe your syntax is wrong, according to the R Markdown Cookbook.

Source https://stackoverflow.com/questions/68637394

QUESTION

Running external scripts with wildcards in snakemake

Asked 2021-May-24 at 11:50

I am trying to run a snakemake rule with an external script whose path contains a wildcard, as noted in the snakemake Read the Docs documentation. However, I am running into a KeyError when running snakemake.

For example, if we have the following rule:

...

ANSWER

Answered 2021-May-22 at 09:59

I agree with Dmitry Kuzminov that having a script depending on a wildcard is odd. Maybe there are better solutions.

Anyway, the version below works for me on snakemake 6.0.0. Note that in your R script snakemake@output[1] should be snakemake@output[[1]], but that doesn't cause the problem you report.

Source https://stackoverflow.com/questions/67640412

QUESTION

Accessing the attempt counter for change in job behaviour

Asked 2020-Dec-04 at 09:07

I have a job that might fail with a specific configuration. What I want to do is let it run once and, if it fails, run a slightly different configuration. I found the attempt parameter, but I can't find a way to access it outside the resources tag...

Do you know how to access it, or any alternative?

...

ANSWER

Answered 2020-Dec-04 at 09:07

The attempt counter is passed as the argument "'--attempt' 'int'" to the jobscript (in my case, a wrapper script in Python).

Therefore you can access it, for example, with:
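One way a Python wrapper script could read that argument is sketched below with `argparse`. The `--attempt` flag is taken from the answer; `parse_attempt`, the default of 1, and the example argument list are assumptions for illustration.

```python
import argparse


def parse_attempt(argv):
    """Extract the --attempt counter from the wrapper's argument list."""
    parser = argparse.ArgumentParser()
    # Assumption: fall back to 1 when the flag is not passed.
    parser.add_argument("--attempt", type=int, default=1)
    # parse_known_args ignores any other arguments given to the wrapper.
    args, _unknown = parser.parse_known_args(argv)
    return args.attempt


# Hypothetical argument list; a real wrapper would pass sys.argv[1:].
attempt = parse_attempt(["--attempt", "2", "jobscript.sh"])
print(attempt)
```

The wrapper can then branch on `attempt` to choose the fallback configuration.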

Source https://stackoverflow.com/questions/65124807

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install snakefiles

If you are new to Snakemake, you might like to start by walking through my tutorial for beginners. Next, have a look at Johannes Köster's introductory slides, tutorial, documentation, and FAQ.