hhblits(1)

fast homology detection method to iteratively search a HMM database


NAME

hhblits - fast homology detection method to iteratively search a HMM database

SYNOPSIS

hhblits -i query [options]

DESCRIPTION

HHblits 3.3.0: HMM-HMM-based lightning-fast iterative sequence search

HHblits is a sensitive, general-purpose, iterative sequence search tool that represents both query and database sequences by HMMs. You can search HHblits databases starting with a single query sequence, a multiple sequence alignment (MSA), or an HMM. HHblits prints out a ranked list of database HMMs/MSAs and can also generate an MSA by merging the significant database HMMs/MSAs onto the query MSA.

Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019) HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, doi:10.1186/s12859-019-3019-7

(c) The HH-suite development team

-i <file>

input/query: single sequence or multiple sequence alignment (MSA) in a3m, a2m, or FASTA format, or HMM in hhm format

<file> may be ’stdin’ or ’stdout’ throughout.

OPTIONS

-d <name>

database name (e.g. uniprot20_29Feb2012). Multiple databases may be specified with ’-d <db1> -d <db2> ...’

-n [1,8]

number of iterations (default=2)

-e [0,1]

E-value cutoff for inclusion in result alignment (def=0.001)

Input alignment format:

-M a2m

use A2M/A3M (default): upper case = Match; lower case = Insert; ’-’ = Delete; ’.’ = gaps aligned to inserts (may be omitted)

-M first

use FASTA: columns with residue in 1st sequence are match states

-M [0,100]

use FASTA: columns with fewer than X% gaps are match states

-tags/-notags

do NOT / do neutralize His-, C-myc-, FLAG-tags, and trypsin recognition sequence to background distribution (def=-notags)

Output options:

-o <file>

write results in standard format to file (default=<infile.hhr>)

-oa3m <file>

write result MSA with significant matches in a3m format

-opsi <file>

write result MSA of significant matches in PSI-BLAST format

-ohhm <file>

write HHM file for result MSA of significant matches

-oalis <name>

write MSAs in A3M format after each iteration

-blasttab <name>

write result in tabular BLAST format (compatible to -m 8 or -outfmt 6 output)

1     2      3           4         5        6      8    9      10   11   12

’query target #match/tLen #mismatch #gapOpen qstart qend tstart tend eval score’
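The column list above can be turned into a small reader for -blasttab files. The following is a minimal sketch, not part of hhblits: it assumes one hit per line with tab-separated fields (as in BLAST -outfmt 6), keeps every value as a string, and uses a made-up input line. The function name parse_blasttab_line is hypothetical.

```python
# Field names copied from the -blasttab description above.
# Assumption: fields are separated by tabs, as in BLAST -outfmt 6.
BLASTTAB_FIELDS = [
    "query", "target", "#match/tLen", "#mismatch", "#gapOpen",
    "qstart", "qend", "tstart", "tend", "eval", "score",
]


def parse_blasttab_line(line):
    """Split one result line into a dict keyed by the documented field names."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BLASTTAB_FIELDS):
        raise ValueError(
            f"expected {len(BLASTTAB_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(BLASTTAB_FIELDS, values))


# Made-up line for illustration only; this is not real hhblits output.
example = "\t".join(
    ["q1", "t1", "0.95", "3", "1", "1", "60", "5", "64", "1e-20", "88.5"]
)
record = parse_blasttab_line(example)
print(record["target"], record["eval"])
```

Values are left as strings on purpose, since the exact numeric formatting of each column is not specified here; convert them only after checking real output.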

-add_cons

generate consensus sequence as master sequence of query MSA (default=don’t)

-hide_cons

don’t show consensus sequence in alignments (default=show)

-hide_pred

don’t show predicted 2ndary structure in alignments (default=show)

-hide_dssp

don’t show DSSP 2ndary structure in alignments (default=show)

-show_ssconf

show confidences for predicted 2ndary structure in alignments

-Ofas <file>

write pairwise alignments in FASTA xor A2M (-Oa2m) xor A3M (-Oa3m) format

-seq <int>

max. number of query/template sequences displayed (default=1)

-aliw <int>

number of columns per line in alignment list (default=80)

-p [0,100]

minimum probability in summary and alignment list (default=20)

-E [0,inf[

maximum E-value in summary and alignment list (default=1E+06)

-Z <int>

maximum number of lines in summary hit list (default=500)

-z <int>

minimum number of lines in summary hit list (default=10)

-B <int>

maximum number of alignments in alignment list (default=500)

-b <int>

minimum number of alignments in alignment list (default=10)
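How -p, -E, -z and -Z might combine can be illustrated with a short sketch. This is an assumption about the intended behaviour, not the actual hhblits implementation: a ranked hit is listed if it passes both thresholds, at least z hits are listed regardless, and never more than Z. The function select_hits and the sample hits are hypothetical; the defaults are the ones documented above.

```python
def select_hits(hits, p=20.0, E=1e6, z=10, Z=500):
    """Pick the hits to display from a ranked list.

    hits: list of (probability, evalue) tuples, best hit first.
    Assumed rule: show a hit if it passes both thresholds, always show
    the first z hits, and never show more than Z hits.
    """
    shown = []
    for prob, evalue in hits:
        if len(shown) >= Z:
            break
        if len(shown) < z or (prob >= p and evalue <= E):
            shown.append((prob, evalue))
    return shown


# Made-up ranked hits: (probability in %, E-value).
ranked = [(99.0, 1e-30), (50.0, 1e-3), (10.0, 5.0), (5.0, 50.0)]
print(select_hits(ranked, z=1))  # only hits above the probability threshold
print(select_hits(ranked, z=3))  # minimum list length forces a third hit
```

The same logic would apply to the alignment list with -b and -B in place of -z and -Z, under the same assumption.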

Prefilter options:

-noprefilt

disable all filter steps

-noaddfilter

disable all filter steps (except for fast prefiltering)

-maxfilt

max number of hits allowed to pass 2nd prefilter (default=20000)

-min_prefilter_hits

min number of hits to pass prefilter (default=100)

-prepre_smax_thresh

min score threshold of ungapped prefilter (default=10)

-pre_evalue_thresh

max E-value threshold of Smith-Waterman prefilter score (default=1000.0)

-pre_bitfactor

prefilter scores are in units of 1 bit / pre_bitfactor (default=4)

-pre_gap_open

gap open penalty in prefilter Smith-Waterman alignment (default=20)

-pre_gap_extend

gap extend penalty in prefilter Smith-Waterman alignment (default=4)

-pre_score_offset

offset on sequence profile scores in prefilter S-W alignment (default=50)

Filter options applied to query MSA, database MSAs, and result MSA

-all

show all sequences in result MSA; do not filter result MSA

-interim_filter NONE|FULL

filter sequences of query MSA during merging to avoid early stop (default: FULL)

NONE: disables the intermediate filter

FULL: if an early stop occurs compare filter seqs in an all vs. all comparison

-id [0,100]

maximum pairwise sequence identity (def=90)

-diff [0,inf[

filter MSAs by selecting most diverse set of sequences, keeping at least this many seqs in each MSA block of length 50. Zero and non-numerical values turn off the filtering. (def=1000)

-cov [0,100]

minimum coverage with master sequence (%) (def=0)

-qid [0,100]

minimum sequence identity with master sequence (%) (def=0)

-qsc [0,100]

minimum score per column with master sequence (default=-20.0)

-neff [1,inf]

target diversity of multiple sequence alignment (default=off)

-mark

do not filter out sequences marked by ">@" in their name line

HMM-HMM alignment options:

-norealign

do NOT realign displayed hits with MAC algorithm (def=realign)

-realign_old_hits

realign hits from previous iterations

-mact [0,1[

posterior prob threshold for MAC realignment controlling greediness at alignment ends: 0:global >0.1:local (default=0.35)

-glob/-loc

use global/local alignment mode for searching/ranking (def=local)

-realign

realign displayed hits with max. accuracy (MAC) algorithm

-realign_max <int>

realign max. <int> hits (default=500)

-ovlp <int>

banded alignment: forbid <ovlp> largest diagonals |i-j| of DP matrix (def=0)

-alt <int>

show up to this many alternative alignments with raw score > smin (def=4)

-premerge <int>

merge <int> hits to query MSA before aligning remaining hits (def=3)

-smin <float>

minimum raw score for alternative alignments (def=20.0)

-shift [-1,1]

profile-profile score offset (def=-0.03)

-corr [0,1]

weight of term for pair correlations (def=0.10)

-sc <int>

amino acid score (tja: template HMM at column j) (def=1)

0 = log2 Sum(tja*qia/pa) (pa: aa background frequencies)

1 = log2 Sum(tja*qia/pqa) (pqa = 1/2*(pa+ta) )

2 = log2 Sum(tja*qia/ta) (ta: av. aa freqs in template)

3 = log2 Sum(tja*qia/qa) (qa: av. aa freqs in query)

5 = local amino acid composition correction
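The column scores for -sc modes 0 to 3 differ only in the frequencies that normalise the product tja*qia. A minimal sketch of the formulas as written above follows; it is not the hhblits code, the function name column_score is hypothetical, and a toy 4-letter alphabet replaces the 20 amino acids to keep the numbers readable. Mode 5 is left out because no formula is given for it.

```python
import math


def column_score(q, t, p, mode=1, ta=None, qa=None):
    """Score query column i against template column j.

    q, t : amino acid probabilities qia and tja (same order, same length)
    p    : background frequencies pa
    ta   : average amino acid frequencies in the template (modes 1 and 2)
    qa   : average amino acid frequencies in the query (mode 3)
    """
    if mode == 0:
        denom = p
    elif mode == 1:
        if ta is None:
            raise ValueError("mode 1 needs ta")
        denom = [0.5 * (pa + tav) for pa, tav in zip(p, ta)]
    elif mode == 2:
        if ta is None:
            raise ValueError("mode 2 needs ta")
        denom = ta
    elif mode == 3:
        if qa is None:
            raise ValueError("mode 3 needs qa")
        denom = qa
    else:
        raise ValueError("only modes 0-3 are sketched here")
    return math.log2(sum(tj * qi / d for tj, qi, d in zip(t, q, denom)))


# Toy example: both columns put all probability on the first letter.
background = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
print(column_score(peaked, peaked, background, mode=0))
```

Two identical, fully conserved columns score log2(1/0.25) = 2 bits here, while two columns that both equal the background score 0.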

-ssm {0,..,4}

0: no ss scoring

1,2: ss scoring after or during alignment [default=2]

3,4: ss scoring after or during alignment, predicted vs. predicted

-ssw [0,1]

weight of ss score (def=0.11)

-ssa [0,1]

ss confusion matrix = (1-ssa)*I + ssa*psipred-confusion-matrix (def=1.00)

-wg

use global sequence weighting for realignment!

Gap cost options:

-gapb [0,inf[

Transition pseudocount admixture (def=1.00)

-gapd [0,inf[

Transition pseudocount admixture for open gap (default=0.15)

-gape [0,1.5]

Transition pseudocount admixture for extend gap (def=1.00)

-gapf ]0,inf]

factor to increase/reduce gap open penalty for deletes (def=0.60)

-gapg ]0,inf]

factor to increase/reduce gap open penalty for inserts (def=0.60)

-gaph ]0,inf]

factor to increase/reduce gap extend penalty for deletes (def=0.60)

-gapi ]0,inf]

factor to increase/reduce gap extend penalty for inserts (def=0.60)

-egq [0,inf[

penalty (bits) for end gaps aligned to query residues (def=0.00)

-egt [0,inf[

penalty (bits) for end gaps aligned to template residues (def=0.00)

Pseudocount (pc) options:

Context specific hhm pseudocounts:

-pc_hhm_contxt_mode {0,..,3}

position dependence of pc admixture ’tau’ (pc mode, default=2)

0: no pseudo counts: tau = 0

1: constant: tau = a

2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)

3: CSBlast admixture: tau = a(1+b)/(Neff[i]+b)

(Neff[i]: number of effective seqs in local MSA around column i)
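The four pc modes can be written as one small function. This is a sketch of the formulas as stated above, not the hhblits source; the function name pc_admixture is hypothetical, and the example values a=0.9, b=4.0, c=1.0 are the documented defaults of the context-specific hhm pseudocounts.

```python
def pc_admixture(mode, neff, a, b, c=1.0):
    """Return the pseudocount admixture tau for one column.

    mode : pc mode 0..3 as listed above
    neff : Neff[i], number of effective sequences around column i
    a, b, c : the corresponding -pc_*_a, -pc_*_b and -pc_*_c values
    """
    if mode == 0:
        return 0.0                                    # no pseudocounts
    if mode == 1:
        return a                                      # constant
    if mode == 2:
        return a / (1.0 + ((neff - 1.0) / b) ** c)    # diversity-dependent
    if mode == 3:
        return a * (1.0 + b) / (neff + b)             # CSBlast admixture
    raise ValueError("pc mode must be 0, 1, 2 or 3")


# tau shrinks as the alignment becomes more diverse (mode 2, a=0.9, b=4.0, c=1.0).
for neff in (1.0, 5.0, 9.0):
    print(neff, pc_admixture(2, neff, a=0.9, b=4.0, c=1.0))
```

In modes 2 and 3 a single-sequence query (Neff = 1) receives the full admixture a, and tau decreases as Neff grows, so diverse alignments rely less on pseudocounts.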

-pc_hhm_contxt_a [0,1]

overall pseudocount admixture (def=0.9)

-pc_hhm_contxt_b [1,inf[

Neff threshold value for mode 2 (def=4.0)

-pc_hhm_contxt_c [0,3]

extinction exponent c for mode 2 (def=1.0)

Context independent hhm pseudocounts (used for templates; used for query if contxt file is not available):

-pc_hhm_nocontxt_mode {0,..,3}

position dependence of pc admixture ’tau’ (pc mode, default=2)

0: no pseudo counts: tau = 0

1: constant: tau = a

2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)

(Neff[i]: number of effective seqs in local MSA around column i)

-pc_hhm_nocontxt_a [0,1]

overall pseudocount admixture (def=1.0)

-pc_hhm_nocontxt_b [1,inf[

Neff threshold value for mode 2 (def=1.5)

-pc_hhm_nocontxt_c [0,3]

extinction exponent c for mode 2 (def=1.0)

Context specific prefilter pseudocounts:

-pc_prefilter_contxt_mode {0,..,3}

position dependence of pc admixture ’tau’ (pc mode, default=3)

0: no pseudo counts: tau = 0

1: constant: tau = a

2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)

3: CSBlast admixture: tau = a(1+b)/(Neff[i]+b)

(Neff[i]: number of effective seqs in local MSA around column i)

-pc_prefilter_contxt_a [0,1]

overall pseudocount admixture (def=0.8)

-pc_prefilter_contxt_b [1,inf[

Neff threshold value for mode 2 (def=2.0)

-pc_prefilter_contxt_c [0,3]

extinction exponent c for mode 2 (def=1.0)

Context independent prefilter pseudocounts (used if context file is not available):

-pc_prefilter_nocontxt_mode {0,..,3}

position dependence of pc admixture ’tau’ (pc mode, default=2)

0: no pseudo counts: tau = 0

1: constant: tau = a

2: diversity-dependent: tau = a/(1+((Neff[i]-1)/b)^c)

(Neff[i]: number of effective seqs in local MSA around column i)

-pc_prefilter_nocontxt_a [0,1]

overall pseudocount admixture (def=1.0)

-pc_prefilter_nocontxt_b [1,inf[

Neff threshold value for mode 2 (def=1.5)

-pc_prefilter_nocontxt_c [0,3]

extinction exponent c for mode 2 (def=1.0)

Context-specific pseudo-counts:

-nocontxt

use substitution-matrix instead of context-specific pseudocounts

-contxt <file>

context file for computing context-specific pseudocounts (default=)

-csw [0,inf]

weight of central position in cs pseudocount mode (def=1.6)

-csb [0,1]

weight decay parameter for positions in cs pc mode (def=0.9)

Other options:

-v <int>

verbose mode: 0: no screen output 1: only warnings 2: verbose (def=2)

-neffmax ]1,20]

skip further search iterations when diversity Neff of query MSA becomes larger than neffmax (default=20.0)

-cpu <int>

number of CPUs to use (for shared memory SMPs) (default=2)

-scores <file>

write scores for all pairwise comparisons to file

-filter_matrices

filter matrices for similarity to output at most 100 matrices

-atab <file>

write all alignments in tabular layout to file

-maxseq <int>

max number of input rows (def=65535)

-maxres <int>

max number of HMM columns (def=20001)

-maxmem [1,inf[

limit memory for realignment (in GB) (def=3.0)

EXAMPLES

hhblits -i query.fas -o query.hhr -d ./uniclust30

hhblits -i query.fas -o query.hhr -oa3m query.a3m -n 1 -d ./uniclust30
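When hhblits is driven from a script, the same command lines can be assembled programmatically. The sketch below only builds and prints the argument list and does not run hhblits; the helper build_hhblits_command is hypothetical, it uses only options documented above (-i, -d, -o, -oa3m, -n, -e, -cpu), and the file and database names are the placeholders from the examples.

```python
import shlex


def build_hhblits_command(query, database, hhr_out, a3m_out=None,
                          iterations=2, evalue=0.001, cpus=2):
    """Assemble an hhblits argument list, checking the documented ranges."""
    if not 1 <= iterations <= 8:
        raise ValueError("-n must be in [1,8]")
    if not 0 <= evalue <= 1:
        raise ValueError("-e must be in [0,1]")
    cmd = [
        "hhblits",
        "-i", query,
        "-d", database,
        "-o", hhr_out,
        "-n", str(iterations),
        "-e", str(evalue),
        "-cpu", str(cpus),
    ]
    if a3m_out is not None:
        cmd += ["-oa3m", a3m_out]
    return cmd


cmd = build_hhblits_command(
    "query.fas", "./uniclust30", "query.hhr",
    a3m_out="query.a3m", iterations=1,
)
# Print a shell-quoted version; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems with unusual file names.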

Download databases from <http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/>.