qsub | Parallelising lapply by submitting jobs to gridengine | Machine Learning library
kandi X-RAY | qsub Summary
qsub provides the qsub_lapply function, which helps you parallelise lapply calls by submitting them as jobs to gridengine clusters.
qsub Examples and Code Snippets
qsub_lapply(
X = 1:3,
FUN = function(i) {
if (i == 2) stop("Something went wrong!")
i + 1
}
)
## Error in FUN(X[[i]], ...): File: /home/rcannood/Workspace/.r2gridengine/20190213_133822_r2qsub_4XGIFwJTDg/log/log.2.e.txt
## Error in (fun
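If you would rather collect the successful results than abort on the first failure, the configuration can be told not to stop on errors. A hedged sketch, assuming override_qsub_config() accepts a stop_on_error flag as its documentation suggests:

qsub_lapply(
  X = 1:3,
  FUN = function(i) {
    if (i == 2) stop("Something went wrong!")
    i + 1
  },
  qsub_config = override_qsub_config(
    # assumption: stop_on_error = FALSE returns the successful elements
    # instead of raising the error shown above
    stop_on_error = FALSE
  )
)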
qsub_async <- qsub_lapply(
X = 1:3,
FUN = function(i) {
Sys.sleep(10)
i + 1
},
qsub_config = override_qsub_config(
wait = FALSE
)
)
readr::write_rds(qsub_async, "temp_file.rds")
# you can restart your computer / R session and retrieve the results later
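Later, possibly in a fresh R session, you can read the handle back and fetch the output. A short sketch using the package's qsub_retrieve() function:

# read the handle back and retrieve the results of the finished jobs
qsub_async <- readr::read_rds("temp_file.rds")
qsub_retrieve(qsub_async)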
qsub_lapply(
X = 1:3,
FUN = function(i) {
# simulate a very long calculation time
# this might annoy other users of the cluster,
# but sometimes there is no way around it
Sys.sleep(sample.int(3600, 1))
i + 1
},
  qsub_config = override_qsub_config(
    # the original snippet is truncated here; raising the wall time limit
    # is one setting relevant to long jobs (parameter value is an assumption)
    max_wall_time = "24:00:00"
  )
)
Community Discussions
Trending Discussions on qsub
QUESTION
New to NextFlow here, and struggling with some basic concepts. I'm in the process of converting a set of bash scripts from a previous publication into a NextFlow workflow.
I'm converting a simple bash script (included below for convenience) that did some basic prep work and submitted a new job to the cluster scheduler for each iteration.
Ultimate question: What is the most NextFlow-like way to incorporate this script into a NextFlow workflow (preferably using the new DSL2 schema)?
Possible subquestion: Is it possible to emit a list of lists based on bash variables? I've seen ways to pass lists from workflows into processes, but not out of processes. I could print each set of parameters to a file and then emit that file, but that doesn't seem very NextFlow-like.
I would really appreciate any guidance on how to incorporate the following bash script into a NextFlow workflow. I have added comments and indicated the four variables that I need to emit as a set of parameters.
Thanks!
...ANSWER
Answered 2022-Jan-17 at 01:18
What is the most NextFlow-like way to incorporate this script into a NextFlow workflow?
In some cases, it is possible to incorporate third-party scripts that do not need to be compiled "as-is", by making them executable and moving them into a folder called 'bin' in the root directory of your project repository. Nextflow automatically adds this folder to the $PATH in the execution environment.
However, some scripts do not lend themselves to inclusion in this manner. This is especially the case if the objective is to produce a portable and reproducible workflow, which is how I interpret "the most Nextflow-like way". The objective ultimately becomes how to run each process step in isolation. Given your example, below is my take on this:
QUESTION
I use python to do my data analysis and lately I came up with the idea to save the current git hash in a log file so I can later check which code version created my results (in case I find inconsistencies or whatever).
It works fine as long as I do it locally.
...ANSWER
Answered 2021-Oct-14 at 09:42
So after I explained my problem to IT in detail, they could help me solve it.
Apparently the $PBS_O_WORKDIR variable stores the directory from which the job was submitted.
So I adjusted my access to the git hash as follows:
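The adjusted code itself was not captured in this excerpt. As an illustration of the same idea in R (the asker was working in Python; the log path below is a placeholder):

# inside a PBS job, PBS_O_WORKDIR holds the directory the job was submitted from
workdir <- Sys.getenv("PBS_O_WORKDIR")
githash <- system(paste("git -C", shQuote(workdir), "rev-parse HEAD"), intern = TRUE)
writeLines(githash, file.path(workdir, "git_hash.log"))  # placeholder log location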
QUESTION
I was trying image classification using IPEX on DevCloud but it is showing "Illegal instruction" for me.
...ANSWER
Answered 2021-Oct-13 at 12:21
qsub -I -l nodes=1:gpu:ppn=2 -d .
QUESTION
I am trying to construct and submit an array job based on R on my university's HPC. I'm used to submitting array jobs based on Matlab, and I have some doubts about how to translate the overall procedure to R. Let me report a very simple Matlab example and then my questions.
The code is based on 3 files:
- "main" which does some preliminary operations.
- "subf" which should be run by each task and uses some matrices created by "main".
- a bash file which I qsub in the terminal.
1. main:
...ANSWER
Answered 2021-Jul-30 at 17:25
The parallelization is handled by the HPC, right? In that case, I think "no", nothing special is required.
It depends on how they allow/enable R. In an HPC that I use (not your school), the individual nodes do not have direct internet access, so it would require special care; this might be the exception, I don't know.
Recommendation: if there is a shared filesystem that both you and all of the nodes can access, then create an R "library" there that contains the installed packages you need, and use .libPaths(...) in your R scripts to add it to the search path for packages. The only gotcha might be if there are non-R shared library (e.g. .dll, .so, .a) requirements. For those, either "docker" or "ask admins".
If you don't have a shared filesystem, then you might ask the cluster admins whether they use/prefer docker images (you might provide an image or a Dockerfile to create one) or whether they have preferred mechanisms for enabling various packages.
I do not recommend asking them to install the packages, for two reasons. First, think about them needing to do this for every person who has a job to run, in any number of programming languages, and realize that they may have no idea how to do it for that language. Second, package versions are very important, and asking them to install a package may install either a too-new package or overwrite an older version that somebody else is relying on. (See packrat and renv for discussions on reproducible environments.)
Bottom line: the use of a path you control (and using .libPaths) gives you complete control over package versions. If you have not yet been bitten by the unintended consequences of newer-versioned packages, just wait ... congratulations, you've been lucky.
I suggest adding source("main.R") to the beginning of subf.R, which would make your bash file perhaps as simple as a single Rscript call on subf.R.
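A minimal sketch of the shared-library approach described above (the library path and package name are placeholders):

# one-time setup, run from a node with internet access
lib <- "/shared/myuser/R/library"  # placeholder: a path every node can read
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
install.packages("data.table", lib = lib)

# at the top of each R script that runs on the cluster
.libPaths(c("/shared/myuser/R/library", .libPaths()))
library(data.table)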
QUESTION
I am writing a snakemake pipeline to eventually identify corona virus variants.
Below is a minimal example with three steps:
...ANSWER
Answered 2021-Jun-10 at 07:54
I think the problem is that rule catFasta doesn't contain the wildcard barcode. If you think about it, what job name would you expect in {wildcards.barcode}.{rule}.{jobid}?
Maybe a solution could be to add to each rule a jobname parameter, which could be {barcode} for guppyplex and minion, and 'all_barcodes' for catFasta. Then use --jobname "{params.jobname}.{rule}.{jobid}".
QUESTION
I am writing a snakemake to produce Sars-Cov-2 variants from Nanopore sequencing. The pipeline that I am writing is based on the artic network, so I am using artic guppyplex and artic minion.
The snakemake that I wrote has the following steps:
- zip all the fastq files for all barcodes (rule zipFq)
- perform read filtering with guppyplex (rule guppyplex)
- call the artic minion pipeline (rule minion)
- move the stderr and stdout from qsub to a folder under the working directory (rule mvQsubLogs)
Below is the snakemake that I wrote so far, which works
...ANSWER
Answered 2021-Jun-08 at 15:40
The rule that fails is rule guppyplex, which looks for an input in the form of {FASTQ_PATH}/{{barcode}}.
Looks like the wildcard {barcode} is filled with barcode49/barcode49.consensus.fasta, which happened for two reasons, I think:
First (and most important): the workflow does not find a better way to produce the final output. In rule catFasta, you give an input file which is never described as an output in your workflow. The rule minion has the directory as an output, but not the file, and it is not perfectly clear to the workflow where to produce this input file. It therefore infers that the {barcode} wildcard somehow has to contain this .consensus.fasta that it has never seen before. This wildcard is then handed over to the top, where the workflow crashes since it cannot find a matching input file.
Second: this initialisation of the wildcard with something you don't want is only possible because you did not constrain the wildcard properly. You can, for example, forbid the wildcard from containing a . (see wildcard_constraints here).
However, the main problem is that catFasta does not find the desired input. I'd suggest changing the output of minion to "nanopolish/{barcode}/{barcode}.consensus.fasta"; since you already take the OUTDIR from the params, that should not hurt your rule here.
Edit: Dummy test example:
QUESTION
I am using snakemake in cluster mode to submit a simple one-rule workflow to the HPCC, which runs Torque with several compute nodes. The NFSv4 storage is mounted on /data. There is a link /PROJECT_DIR -> /data/PROJECT_DIR/.
I submit the job using:
...ANSWER
Answered 2021-Apr-01 at 11:12
Solved by providing a jobscript (--jobscript SCRIPT) with:
QUESTION
I am running Snakemake with the --use-conda option. Snakemake successfully creates the environment, which should include pysam. I am able to manually activate this created environment and, within it, run my script split_strands.py, which imports the module pysam, with no problems. However, when running the Snakemake pipeline, I get the following error log:
...ANSWER
Answered 2021-Mar-30 at 22:22
Turns out that the newer versions of snakemake (6.0.0+) must have some issue with this. I used snakemake 5.8.2 instead and things work just fine. Not sure exactly what's going on under the hood, but it seems identical to this issue: https://github.com/snakemake/snakemake/issues/883
QUESTION
I have this multi-part HTML form... I know it is a bit messy! I am working on it!
I want the validation to work, but only on the inputs that are not disabled. For the dropdowns, I want it so that when the text inputs are not disabled they get checked, but when they are disabled they are skipped.
How?
...ANSWER
Answered 2021-Feb-10 at 19:02
You need to just change one line, and you can do this with JS itself rather than with your selector:
QUESTION
I have a column of a data frame that has thousands of complicated sample names like this
...ANSWER
Answered 2021-Jan-08 at 12:04
Here is one way you could do it.
It helps to create a data frame with a header column, so that's what I did below; I called the column "cats".
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install qsub
deb: apt-get install libssh-dev (Debian, Ubuntu, etc)
rpm: dnf install libssh-devel (Fedora, EPEL) (if dnf is not installed, try yum)
brew: brew install libssh (OSX)
If you regularly submit jobs to the remote, it will be extremely useful to set up an SSH key configuration, so that you don't need to enter your password every time you execute qsub_lapply. On Windows, you will first need to install Git Bash or use the Windows Subsystem for Linux. The first step is to open bash, create a file containing the following content, and save it to .ssh/config. The second step is to generate an SSH key. You don't need to enter a password; if you do, though, you will be asked for this password every time you use your SSH key.
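Once the SSH setup works, you can tell qsub how to reach your cluster. A configuration sketch along the lines of the package README (host and paths are placeholders):

library(qsub)

qsub_config <- create_qsub_config(
  # placeholders: substitute your own remote host and scratch folders
  remote = "myuser@myserver.mylocation.com",
  local_tmp_path = "/home/myuser/workspace/.r2gridengine",
  remote_tmp_path = "/scratch/myuser/.r2gridengine"
)
set_default_qsub_config(qsub_config, permanent = TRUE)

# from now on, qsub_lapply() will use this configuration by default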