RNASeq | Download the RNAseq pipeline | Genomics library
kandi X-RAY | RNASeq Summary
kandi X-RAY | RNASeq Summary
Download the RNAseq pipeline. Make sure all dependencies are installed and the right paths are set in the pipeline (RNAseqAnalyse.pl) in the "Get options" section.
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of RNASeq
RNASeq Key Features
RNASeq Examples and Code Snippets
Community Discussions
Trending Discussions on RNASeq
QUESTION
I'd like to generate a PCA of my bulk RNAseq data, coloured by each of my variables in the DESeq2 object "vsd". My current code looks like this (to generate a single plot):
...ANSWER
Answered 2022-Mar-09 at 18:08Let's mock up some data, since the example you provide is too short and ill-formatted to do anything with. I'm assuming you have data of roughly the following structure:
QUESTION
I tried to recreate the correlation matrix that is described in scanpy's tutorial, using my own RNAseq dataset.
The relevant function in scanpy is: sc.pl.correlation_matrix
and the plot looks like this:
The main question here is: how was this Pearson's correlation between different cell types calculated, while the size of the matrix for each cell type is different?
For example: I have 1000 genes as columns, 500 CD34+ cells as rows and only 200 CD19+ B cells. So how is it possible to calculate a correlation between both cell types?
Additionally, calculating a correlation between two matrices, results in another matrix (and not with a single scalar such as displayed in the plot above). For example, numpy's corrcoef()
function that is applied on two matrices of the same size results in another matrix and not with a scalar...
I tried to average the genes expression over the cells for each cell type so the calculation contained equally-sized vectors, and it still didn't match scanpy's results.
I encountered this conversation: https://github.com/theislab/scanpy/pull/425 mentioning that when hierarchical clustering is computed, this correlation matrix above is created (but no code was provided).
I'll be happy with any suggestions, explanations and some possible python / code implementations.
...ANSWER
Answered 2022-Mar-07 at 11:57I managed to solve this problem with some trial and error method.
Indeed, it is impossible to obtain a single scalar from a correlation between two unequal-sized matrices.
In fact, the correlation in this function is not conducted on genes at all! It is calculated on the results of a 50-components-PCA of the dataset!
After conducting PCA, you'll have to average the values of every component for each cell type (because each cell type contains different number of cells). Thus, you'll get a single vector (1*50) for each cell type.
You will finally have a dataframe where each column is a different cell type and each row is the average component of the PCA for each cell type.
The calculation of the pearson correlation can be simply achieved using the corr()
function of pandas on the final averaged PCA dataframe.
The arrangement of the plot, including dendrogram, can be achieved with seaborn's clustermap()
function.
That's it :)
QUESTION
I have one question. Like now i have two files:sampleattributes and genecount.I have filtered sample attributes file and it has a column name sampid and genecount has a column name sampid. I am trying to merge the two files using the common sampid. This is what I have written:
...ANSWER
Answered 2022-Feb-23 at 14:07Looking at youre gene_count
data, it doesn't have a column for the SAMPID, those were imported as row names. We'll convert them to an actual column, replace the "."
with "-"
so they match the braindata format, and then we can join. Your sample data doesn't have any elements in common so I use a full_join
, but you may prefer a left, right, or inner join--I'm not really sure what your use case is.
QUESTION
This is an extension of a question I asked yesterday. I have looked all over StackOverflow and have not found an instance of this specific NameError:
...ANSWER
Answered 2022-Jan-06 at 06:30I think it's due to you used expand
function in a wrong way, expand
only accepts two positional arguments, where the first one is pattern and the second one is function (optional). If you want to supply multiple patterns you should wrap these patterns in list.
After some studying on source code of snakemake, it turns out expand
function doesn't check if user provides < 3 positional arguments, there is a variable combinator
in if-else
that would only be created when there are 1 or 2 positional arguments, the massive amount of positional arguments you provide skip this part and lead to the error when it tries to use combinator
later.
Source code: https://snakemake.readthedocs.io/en/v6.5.4/_modules/snakemake/io.html
QUESTION
I am trying to do QC on RNAseq data that is tarballed. I am using Snakemake as a workflow manager and am aware that Snakemake does not like one-to-many rules. I defining a checkpoint would fix the problem but when I run the script I get this this error message with rule fastqc.
...ANSWER
Answered 2022-Jan-05 at 06:18First, glob_wildcards(INPUTDIR + "{basenames}_R1.fastq.gz")
returns a Wildcards object that contains the key:value pair for each wildcard. If you want to get the basenames, it should be glob_wildcards(INPUTDIR + "{basenames}_R1.fastq.gz").basenames
Second, I assume all fastq.gz files are generated by decompress_h1n1
checkpoint, since you already include the fastqc output in aggregate_decompress_h1n1
function. You shouldn't include those outputs again in rule all
, it leads to snakemake try to do fastqc before checkpoint got executed.
Third, you should also put your trim_qc
outputs in the aggregate_decompress_h1n1
function, basically it's the same issue as fastqc, probably the same for salmon
related rules too.
There might be a potential issue, I noticed you use wrapper for fastqc, I remember that official fastqc wrapper requires the output must have html
and zip
, while you have a raw
prefix. But I didn't see a 0.80.3
release in official document, not sure if you are using some other repository to access wrapper
QUESTION
I have a file that looks like
...ANSWER
Answered 2021-Nov-15 at 21:46Do you want:
QUESTION
I have these two different tables :
...ANSWER
Answered 2021-Jul-26 at 02:41What do you need the third table for when you can get all the required information from Table 1 and Table 2? Use full_join
to merge
Table 1
and Table 2
and get the required result shown below
QUESTION
I understand we have to use collect()
when we run a process that takes as input two channels, where the first channel has one element and then second one has > 1 element:
ANSWER
Answered 2021-Jun-25 at 20:09You need to create the first channel with a value factory which never exhausts the channel.
Your linked example implicitly creates a value channel which is why it works. The same happens when you call .collect()
on A.out.foo
.
Channel.from
(or the more modern Channel.of
) create a sequence channel which can be exhausted which is why both A
and B
only run once.
So
QUESTION
I have a snakemake pipeline that looks like this:
...ANSWER
Answered 2021-Apr-29 at 23:51The expand
function returns a list. By setting the input files to a list instead of a string, you are confusing the script. For defining r1 and r2, you should use something that returns a string instead. I would suggest the string's format()
function or an f-string.
Change:
QUESTION
I am using snakemake to design a RNAseq-data analysis pipeline. While I've managed to do that, I want to make my pipeline to be as adaptable as possible and make it able to deal with single-reads (SE) data or paired-end (PE) data within the same run of analyses, instead of analysing SE data in one run and PE data in another.
My pipeline is supposed to be designed like this :
- dataset download that gives 1 file (SE data) or 2 files (PE data) -->
- set of rules A specific to 1 file OR set of rules B specific to 2 files -->
- rule that takes 1 or 2 input files and merges it/them into a single output -->
- final set of rules.
Note : all rules of A have 1 input and 1 output, all rules of B have 2 inputs and 2 outputs and their respective commands look like :
- 1 input :
somecommand -i {input} -o {output}
- 2 inputs :
somecommand -i1 {input1} -i2 {input2} -o1 {output1} -o2 {output2}
Note 2 : except their differences in inputs/outputs, all rules of sets A and B have the same commands, parameters/etc...
In other words, I want my pipeline to be able to switch between the execution of set of rules A or set of rules B depending on the sample, either by giving it information on the sample in a config file at the start (sample 1 is SE, sample 2 is PE... this is known before-hand) or asking snakemake to counts the number of files after the dataset download to choose the proper next set of rules for each sample. If you see another way to do that, you're welcome to tell be about it.
I thought about using checkpoints, input functions and if/else statement, but I haven't managed to solve my problem with these.
Do you have any hints/advice/ways to make that "switch" happen ?
...ANSWER
Answered 2020-Aug-11 at 13:57If you know the layout beforehand, then the easiest way would be to store it in some variable, something like this (or alternatively you read this from a config file into a dictionary):
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install RNASeq
refFlat file (for using bamMetrics)
Interval list (for using bamMetrics)
Genesizes file (for calculating RPKMs)
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page