cd-hit-2d-para(1)
divide a big clustering job into pieces to run cd-hit-2d or cd-hit-est-2d jobs
NAME
cd-hit-2d-para.pl - divide a big clustering job into pieces to run cd-hit-2d or cd-hit-est-2d jobs
SYNOPSIS
cd-hit-2d-para.pl [options]
DESCRIPTION
This script divides a big clustering job into pieces and submits them as jobs to remote computers over a network so they run in parallel. After all the jobs finish, the script merges the clustering results as if you had run a single cd-hit-2d or cd-hit-est-2d job.
You can also use it to divide a big job on a single computer that does not have enough RAM (with the --L option).
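Why dividing the job helps on one machine with limited RAM: if the pieces run through a pool of at most L workers, no more than L pieces are in memory at any moment. The Python sketch below is a hypothetical illustration of that idea only. The function name run_pieces and the stand-in commands are not part of cd-hit, and this is not how cd-hit-2d-para.pl itself is implemented.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_pieces(commands, n_cpus):
    """Run each command (an argv list) with at most n_cpus running at once.

    Hypothetical helper for illustration. Returns exit codes in the same
    order as `commands`.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    with ThreadPoolExecutor(max_workers=n_cpus) as pool:
        # Each worker thread blocks on one child process, so at most
        # n_cpus children (and their memory) exist at any moment.
        return list(pool.map(lambda argv: subprocess.run(argv).returncode, commands))


if __name__ == "__main__":
    # Demo with harmless stand-in commands, not real cd-hit-2d pieces.
    demo = [[sys.executable, "-c", f"print('piece {k}')"] for k in range(3)]
    print(run_pieces(demo, n_cpus=1))
```

With n_cpus=1 the pieces run strictly one after another, which mirrors the advice to use "--L 1" when RAM is the bottleneck: peak memory is roughly that of a single piece.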
Requirements:
1. When you run this script over a network, the directory where you run the script and the input files must be available on all remote hosts under an identical path.
2. If you choose "ssh" to submit jobs, you must have passwordless ssh access to every remote host; see the ssh manual for how to set up passwordless ssh.
3. I suggest using a queuing system instead of ssh; PBS and SGE are currently supported.
4. cd-hit-2d, cd-hit-est-2d, cd-hit-div, and cd-hit-div.pl must be in the same directory as this script.
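Requirements 1 and 2 can be seen in how an ssh-based dispatcher might build its remote commands. The Python sketch below is hypothetical: the host-list format (one hostname per line, blank lines and "#" comments ignored), the round-robin assignment, and the helper names read_hosts, assign_round_robin, and ssh_command are assumptions for illustration, not the actual behavior of --B or of this script. It only builds argv lists and does not run ssh.

```python
import shlex


def read_hosts(lines):
    """Parse a host list: one hostname per line (assumed format).

    Blank lines and text after '#' are ignored.
    """
    hosts = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def assign_round_robin(jobs, hosts):
    """Deal jobs out in turn: job k goes to hosts[k % len(hosts)]."""
    if not hosts:
        raise ValueError("no hosts given")
    plan = {h: [] for h in hosts}
    for k, job in enumerate(jobs):
        plan[hosts[k % len(hosts)]].append(job)
    return plan


def ssh_command(host, workdir, command):
    """Build an ssh argv that runs `command` inside `workdir` on `host`.

    BatchMode=yes makes ssh fail instead of prompting for a password, so a
    missing passwordless setup shows up as an error rather than a hang.
    The same `workdir` is used on every host, which only works if it exists
    there under an identical path (requirement 1).
    """
    remote = f"cd {shlex.quote(workdir)} && {command}"
    return ["ssh", "-o", "BatchMode=yes", host, remote]
```

For example, ssh_command("node1", "/data/run", "echo hi") yields ["ssh", "-o", "BatchMode=yes", "node1", "cd /data/run && echo hi"]; "node1" and "/data/run" are placeholder values.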
Options
    -i     input filename for 1st db in fasta format, required
    -i2    input filename for 2nd db in fasta format, required
    -o     output filename, required
    --P    program, "cd-hit-2d" or "cd-hit-est-2d", default "cd-hit-2d"
    --B    filename of list of hosts, required unless --Q or --L is supplied
    --L    number of CPUs on the local computer, default 0. When you are not
           running over a cluster, you can use this option to divide a big
           clustering job into small pieces; I suggest using "--L 1" unless
           you have enough RAM for each CPU
    --S    number of segments to split 1st db into, default 2
    --S2   number of segments to split 2nd db into, default 8
    --Q    number of jobs to submit to the queuing system, default 0; by
           default, the program uses ssh mode to submit remote jobs
    --T    type of queuing system, "PBS" and "SGE" are supported, default PBS
    --R    restart file, used after a crash of a run
    -h     print this help
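The --S and --S2 options control how many segments each database is cut into. As an illustration only, the Python sketch below reads FASTA records and cuts them into n contiguous segments of nearly equal record count. The helper names read_fasta and split_segments are hypothetical; the real splitting is done by cd-hit-div / cd-hit-div.pl, whose strategy may differ.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_segments(records, n):
    """Split records into n contiguous segments whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(records), n)
    segments, start = [], 0
    for k in range(n):
        size = base + (1 if k < extra else 0)  # the first `extra` segments get one more
        segments.append(records[start:start + size])
        start += size
    return segments
```

With the defaults, split_segments(db1_records, 2) and split_segments(db2_records, 8) would produce 2 and 8 segments respectively, where db1_records and db2_records are placeholder names for the parsed databases.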
Additional cd-hit-2d/cd-hit-est-2d options can be specified on the command line.
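One way a wrapper can accept extra cd-hit-2d options is to consume its own options and forward everything it does not recognize. The Python sketch below shows this with argparse's parse_known_args. It mirrors the option list above but is a hypothetical illustration, not the Perl script's actual parser; parse_wrapper_args and piece_command are made-up names, and "-c 0.9 -n 5" merely stands in for whatever cd-hit-2d options you pass.

```python
import argparse


def parse_wrapper_args(argv):
    """Split argv into (wrapper options, options to forward to cd-hit-2d)."""
    parser = argparse.ArgumentParser(prog="cd-hit-2d-para.pl")
    parser.add_argument("-i", required=True, help="1st db in fasta format")
    parser.add_argument("-i2", required=True, help="2nd db in fasta format")
    parser.add_argument("-o", required=True, help="output filename")
    parser.add_argument("--P", default="cd-hit-2d", choices=["cd-hit-2d", "cd-hit-est-2d"])
    parser.add_argument("--B", help="filename of list of hosts")
    parser.add_argument("--L", type=int, default=0)
    parser.add_argument("--S", type=int, default=2)
    parser.add_argument("--S2", type=int, default=8)
    parser.add_argument("--Q", type=int, default=0)
    parser.add_argument("--T", default="PBS", choices=["PBS", "SGE"])
    parser.add_argument("--R", help="restart file")
    # Unrecognized arguments are returned, in order, instead of raising an error.
    return parser.parse_known_args(argv)


def piece_command(args, extras, seg1, seg2, out):
    """Hypothetical per-piece command: the chosen program plus forwarded options."""
    return [args.P, "-i", seg1, "-i2", seg2, "-o", out] + extras
```

For instance, parse_wrapper_args(["-i", "db1.fa", "-i2", "db2.fa", "-o", "out", "-c", "0.9", "-n", "5"]) keeps the wrapper options and returns ["-c", "0.9", "-n", "5"] as the arguments to forward.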
For questions or bug reports, contact Weizhong Li at liwz@sdsc.edu