phastMotif(1)
PHASTMOTIF
NAME
phastMotif - Predicts motifs from a set of multiple alignments.
DESCRIPTION
Predicts motifs from a set of multiple alignments. Uses an EM algorithm similar to that of MEME, but a motif is defined by phylogenetic models rather than multinomial distributions. The specified multiple alignments may actually be single sequences (see -m). Various parameters control the strategy for initialization (see below). Currently, the F81 substitution model is assumed.
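For readers unfamiliar with the F81 substitution model mentioned above, the following is a minimal Python sketch of its textbook transition probabilities. It is not taken from the phastMotif source. The function name `f81_transition_matrix` is hypothetical, and scaling the rate so that branch lengths are in expected substitutions per site is an assumption about convention, not a statement of how phastMotif parameterizes the model.

```python
import math

def f81_transition_matrix(pi, t):
    """Textbook F81 transition probabilities P[i][j] over a branch of length t.

    pi : equilibrium base frequencies (must sum to 1)
    t  : branch length, assumed to be in expected substitutions per site
    """
    if abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    denom = 1.0 - sum(p * p for p in pi)
    if denom <= 0.0:
        raise ValueError("need at least two bases with nonzero frequency")
    # Rate scaling so that one unit of t is one expected substitution per site
    # (an assumed convention, labeled as such).
    beta = 1.0 / denom
    decay = math.exp(-beta * t)
    n = len(pi)
    # P_ij(t) = decay * [i == j] + (1 - decay) * pi_j
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(n)]
        for i in range(n)
    ]
```

With uniform frequencies this reduces to the Jukes-Cantor model, and as t grows each row approaches the equilibrium frequencies.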
USAGE
phastMotif [-t <treefile>] [OPTIONS] <msa_list>
OPTIONS
-t <file>
(Required unless -m or -p) Use specified tree topology for all phylogenetic models (Newick format).
-i <fmt>
Input format for alignment. May be FASTA, PHYLIP, MPM, SS, or MAF (default FASTA).
-b <file>
Read background model from specified file (.mod format). By default, the background model is estimated in a preprocessing step, by pooling all data.
-s
Estimate a separate background model for each multiple alignment. (Not yet implemented.)
-k <size>
Learn motifs of the specified size (default is 10).
-B <n>
Report best <n> motifs (default 3).
-m
MEME mode. Use multinomial rather than phylogenetic models. Causes multiple alignments to be ignored -- any gaps are discarded and all sequences are assumed independent.
-d <+lst>
Use the discriminative training method of Segal et al. (RECOMB’02), rather than EM. The specified list should contain the filenames from msa_list that are to be considered *positive* examples (containing the desired motif); all others will be considered negative examples. Can be used with or without -m.
-p
Use "profile" models rather than phylogenetic models (characters in each alignment column assumed independent). The resulting model is a hybrid of the full model and MEME’s model. Essentially, it uses the multiple alignments but not the phylogeny. NOT YET IMPLEMENTED.
-n <n>
Perform <n> random restarts and report the motif with highest likelihood. Default number is 10. Ignored with -I, -P, and -R unless -S is specified (see below).
-I <mlst>
Run the algorithm after a "soft" initialization with each of the consensus sequences in the specified list. At each position, <pc> pseudocounts (see -c) are given to the consensus base and 1 pseudocount to all other bases. Each string must have length at most equal to the size of the motif. If shorter, it is used as a "seed" for a motif, with flanking positions treated as wildcards.
-P <x,y>
Initialize with the x most prevalent y-tuples. A soft initialization is performed, as above. If y is less than the motif size, y-tuples are used as a "seed" for a motif, as above.
-R <x,y>
Initialize with a random sample of x y-tuples. A soft initialization is performed, as above. If y is less than the motif size, y-tuples are used as a "seed" for a motif, as above.
-w <n>
(for use with -I, -P, -R) Winnow initialization sequences to the top <n> based on the unmaximized likelihood.
-c <pc>
(for use with -I, -P, -R) Number of pseudocounts for consensus bases (default 5).
-S
(for use with -I, -P, -R) Instead of doing a deterministic initialization based on a consensus sequence, sample parameters from a Dirichlet distribution defined by the pseudocounts (see -c). In this case, random restarts are performed, as specified by -n.
-o <pref>
Use the specified prefix for all output files (default "phastm").
-H
Produce HTML formatted output, in addition to ordinary output. One file is produced per predicted motif, as well as a single HTML-formatted summary file.
-D
Produce a BED file with predicted motifs, for use in the UCSC browser. Currently, sequence names must be formatted like "chr10:102553847-102554897+", with the final ’+’ or ’-’ indicating strand.
-x
(For use with -H or -D) Suppress ordinary output to stdout.
-h
Print this help message.
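The "soft" initialization described under -I, -c, and -S can be illustrated with a short Python sketch for the multinomial (MEME-mode) case. This is an illustration only, not phastMotif's implementation. The function names are hypothetical, and centering a short seed and giving every wildcard position one pseudocount per base are assumptions, since the text above does not specify either.

```python
import random

BASES = "ACGT"

def soft_init_counts(consensus, motif_size, pc=5.0):
    """Build per-position pseudocounts from a consensus string.

    The consensus base gets `pc` pseudocounts and every other base gets 1,
    as described for -I and -c. Seed placement and wildcard handling are
    assumptions made for this sketch.
    """
    consensus = consensus.upper()
    if len(consensus) > motif_size:
        raise ValueError("consensus is longer than the motif")
    if any(ch not in BASES for ch in consensus):
        raise ValueError("consensus must contain only A, C, G, T")
    # Assumption: a short seed is centered, with wildcard flanks on both sides.
    left = (motif_size - len(consensus)) // 2
    right = motif_size - len(consensus) - left
    counts = [[1.0] * len(BASES) for _ in range(left)]  # wildcard: 1 per base
    for ch in consensus:
        counts.append([pc if b == ch else 1.0 for b in BASES])
    counts.extend([1.0] * len(BASES) for _ in range(right))
    return counts

def deterministic_init(counts):
    """Normalize each column of pseudocounts into base probabilities."""
    return [[c / sum(col) for c in col] for col in counts]

def dirichlet_init(counts, rng=random):
    """Sample each column from a Dirichlet defined by its pseudocounts (cf. -S)."""
    sampled = []
    for col in counts:
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for alpha in col]
        total = sum(draws)
        sampled.append([d / total for d in draws])
    return sampled
```

For example, `deterministic_init(soft_init_counts("ACG", 3))` assigns probability 5/8 to the consensus base at each position, while `dirichlet_init` gives a different starting point on each call, which is why random restarts become meaningful when sampling is used.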
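Because -D requires sequence names in a fixed form, it can help to validate names before running the program. The Python sketch below checks a name against the pattern shown in the -D entry and splits it into its parts. The function name `parse_seq_name` and the regular expression are hypothetical, written from the single example given above. No coordinate conversion is attempted, since this page does not state how the coordinates are interpreted.

```python
import re

# Pattern inferred from the example "chr10:102553847-102554897+":
# <chrom>:<start>-<end><strand>, with strand being '+' or '-'.
NAME_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[+-])$")

def parse_seq_name(name):
    """Split a sequence name into (chrom, start, end, strand).

    Raises ValueError if the name does not match the expected form.
    Coordinates are returned exactly as written in the name.
    """
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("unexpected sequence name format: %r" % name)
    start = int(m.group("start"))
    end = int(m.group("end"))
    if start > end:
        raise ValueError("start is greater than end in %r" % name)
    return m.group("chrom"), start, end, m.group("strand")
```

Running this check over every sequence name in the input alignments before using -D should catch a missing strand character or a malformed range early.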