hhfilter(1)

filter an alignment by maximum sequence identity of match states and minimum coverage

Section 1 hhsuite bookworm source

NAME

hhfilter - filter an alignment by maximum sequence identity of match states and minimum coverage

SYNOPSIS

hhfilter -i infile -o outfile [options]

DESCRIPTION

HHfilter 3.3.0

Filter an alignment by maximum pairwise sequence identity, minimum coverage, minimum sequence identity, or score per column to the first (seed) sequence.

(c) The HH-suite development team

Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019) HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, doi:10.1186/s12859-019-3019-7

-i <file>

read input file in A3M/A2M or FASTA format

-o <file>

write to output file in A3M format

-a <file>

append to output file in A3M format

OPTIONS

-v <int>

verbose mode: 0: no screen output, 1: only warnings, 2: verbose

-id [0,100]

maximum pairwise sequence identity (%) (def=90)

-diff [0,inf[

filter MSA by selecting most diverse set of sequences, keeping at least this many seqs in each MSA block of length 50 (def=0)

-cov [0,100]

minimum coverage with query (%) (def=0)

-qid [0,100]

minimum sequence identity with query (%) (def=0)

-qsc [0,100]

minimum score per column with query (def=-20.0)

-neff [1,inf]

target diversity of alignment (default=off)
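The -qid and -cov thresholds above can be illustrated with a minimal Python sketch. It is not hhfilter's implementation: the function names are hypothetical, and the definitions used here (identity counted over match columns where both rows have a residue, coverage as the share of match columns holding a residue) are assumptions that may differ in detail from what hhfilter computes.

```python
def match_states(a3m_row: str) -> str:
    """Drop insert states (lower case and '.') so only match columns remain."""
    return "".join(c for c in a3m_row if not c.islower() and c != ".")


def identity_percent(a: str, b: str) -> float:
    """Percent identical residues over columns where both rows have a residue.

    Assumed definition, for illustration only.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def coverage_percent(seq: str) -> float:
    """Percent of match columns in which the row has a residue rather than '-'.

    Assumed definition, for illustration only.
    """
    if not seq:
        return 0.0
    return 100.0 * sum(c != "-" for c in seq) / len(seq)


def passes_query_filters(query: str, a3m_row: str,
                         min_qid: float = 0.0, min_cov: float = 0.0) -> bool:
    """Mimic the spirit of -qid and -cov for a single A3M row against the query."""
    seq = match_states(a3m_row)
    return (identity_percent(query, seq) >= min_qid
            and coverage_percent(seq) >= min_cov)
```

For example, `passes_query_filters("ACDEFGHIK", "ACDEyyFA--K", min_qid=80, min_cov=70)` first reduces the row to its match states `ACDEFA--K` and then checks both thresholds against the query.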

Input alignment format:

-M a2m

use A2M/A3M (default): upper case = Match; lower case = Insert; ’-’ = Delete; ’.’ = gaps aligned to inserts (may be omitted)

-M first

use FASTA: columns with residue in 1st sequence are match states

-M [0,100]

use FASTA: columns with fewer than X% gaps are match states
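The two FASTA rules for assigning match states (-M first and -M [0,100]) can be sketched as follows. This is an illustration of the rules as worded above, not hhfilter's code; the function names are hypothetical, and the input is assumed to be a list of equal-length aligned rows that use '-' for gaps.

```python
def match_columns_first(rows):
    """-M first: a column is a match state if the first sequence has a residue."""
    return [c != "-" for c in rows[0]]


def match_columns_by_gaps(rows, max_gap_percent):
    """-M X: a column is a match state if fewer than X% of rows have a gap."""
    n = len(rows)
    flags = []
    for col in zip(*rows):  # iterate over alignment columns
        gap_percent = 100.0 * col.count("-") / n
        flags.append(gap_percent < max_gap_percent)
    return flags


def to_a3m_row(row, flags):
    """Rewrite one aligned row in A3M style for the given match-column flags.

    Match columns keep residues in upper case and gaps as '-'; insert columns
    keep residues in lower case and drop gaps, which A3M allows to be omitted.
    """
    out = []
    for ch, is_match in zip(row, flags):
        if is_match:
            out.append(ch.upper())
        elif ch != "-":
            out.append(ch.lower())
    return "".join(out)
```

With the rows `["AC-DE", "A-GDE", "ACGD-", "A--DE"]`, the "first" rule treats every column except the third as a match state, so the second row is rewritten by `to_a3m_row` as `A-gDE`.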

Other options:

-maxseq <int>

max number of input rows (def=65535)

-maxres <int>

max number of HMM columns (def=20001)

Example: hhfilter -id 50 -i d1mvfd_.a2m -o d1mvfd_.fil.a2m
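When the filter is driven from a script, the same call can be assembled programmatically. The sketch below uses only flags documented on this page (-i, -o, -id, -cov); the helper names are hypothetical, and it assumes an `hhfilter` executable is available on the PATH.

```python
import shutil
import subprocess


def build_hhfilter_cmd(infile, outfile, max_id=90, min_cov=0):
    """Return the argument list for an hhfilter call (defaults mirror this page)."""
    return ["hhfilter", "-i", infile, "-o", outfile,
            "-id", str(max_id), "-cov", str(min_cov)]


def run_hhfilter(infile, outfile, **kwargs):
    """Run hhfilter and raise if it is missing or exits with an error."""
    if shutil.which("hhfilter") is None:
        raise FileNotFoundError("hhfilter not found on PATH")
    subprocess.run(build_hhfilter_cmd(infile, outfile, **kwargs), check=True)
```

Calling `run_hhfilter("d1mvfd_.a2m", "d1mvfd_.fil.a2m", max_id=50)` corresponds to the command line shown in the example, with -cov passed explicitly at its default of 0.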