psa(5)
biological sequence alignment file format
NAME
psa - biological sequence alignment file format
DESCRIPTION
psa is an output format used by the pftools package to describe alignments between biological sequences (DNA or protein) and PROSITE profiles.
psa is related to the widely used biological sequence file format fasta. However, it does more than describe a biological sequence: it is used mainly to record the alignment between a motif descriptor, such as a PROSITE profile, and a given sequence. This information is included in the header and reflected in the structure of the sequence data following the header line.
SYNTAX
Each sequence in
a psa alignment file or output must be preceded by a
fasta header line.
The general syntax of such a fasta header line is as
follows:
>seq_id [ free_text ]
The header must
start with a ’>’ character which is
directly followed by the seq_id field. This field is
interpreted by most programs as the sequence’s
identifier and/or accession number. It ends at
the first encountered whitespace character.
The pftools programs use the free_text field to report
the match score, position and description of the sequence
or motif. Please refer to the man pages of the
corresponding programs for further information about
the output formats.
The header can only extend over one line. The following
lines up to a new line starting with a
’>’ character or the end of the file
are interpreted as sequence data.
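The header and record rules above can be sketched as a small reader. This is a minimal illustration in Python, not part of pftools: the function name read_psa and the returned tuple layout are assumptions made for this example. It splits the header at the first whitespace and joins all following lines until the next ’>’ line or the end of the input.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_psa(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (seq_id, free_text, data) for each record in psa-formatted lines.

    Hypothetical helper for illustration only; not a pftools API.
    """
    seq_id: Optional[str] = None
    free_text = ""
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, free_text, "".join(chunks)
            # seq_id ends at the first whitespace; the remainder is free_text.
            # split() also skips whitespace right after '>', which the format
            # itself does not allow, so this reader is slightly lenient.
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            free_text = parts[1].strip() if len(parts) > 1 else ""
            chunks = []
        elif seq_id is not None:
            # Data may span several lines of different length; join them.
            chunks.append(line.strip())
    if seq_id is not None:
        yield seq_id, free_text, "".join(chunks)
```

Any iterable of lines can be passed, for instance an open file object, so that large outputs are processed one record at a time.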
The line following the header starts the alignment data
between a sequence and a PROSITE profile. This data can
span several lines of differing lengths.
The data is formed by upper or lower-case
characters of the corresponding sequence alphabet (DNA or
protein). The gap characters ’.’ and
’-’ are also supported.
The alignment is always at least as long as the matching
profile. Insertions or deletions detected during the
motif/sequence alignment step change the length of the
reported data, and can be identified using the following
conventions:
upper-case character
Any upper-case character of the sequence alphabet identifies a match position between the sequence and the motif descriptor.
lower-case character
A lower-case character of the sequence alphabet is used to symbolize an insertion in the sequence compared to the motif descriptor.
’-’ (dash) character
A ’-’ character in the output identifies the presence of a deletion in the sequence compared to the motif descriptor.
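The three conventions above can be applied character by character. The following Python sketch counts each class; the function name summarize_alignment and the dictionary keys are assumptions made for this example. The ’.’ character is counted like ’-’, following the behaviour described for psa2msa(1) in NOTES. The two derived totals are an inference from the conventions (matches and deletions each cover one motif position, matches and insertions each cover one sequence residue), not a documented pftools output.

```python
def summarize_alignment(data: str) -> dict:
    """Count match, insertion and deletion positions in psa alignment data.

    Hypothetical helper for illustration only; not a pftools API.
    """
    counts = {"match": 0, "insertion": 0, "deletion": 0}
    for ch in data:
        if ch in "-.":
            # Gap character: a motif position absent from the sequence.
            counts["deletion"] += 1
        elif ch.isupper():
            # Upper-case residue: match position between sequence and motif.
            counts["match"] += 1
        elif ch.islower():
            # Lower-case residue: insertion in the sequence.
            counts["insertion"] += 1
        else:
            raise ValueError(f"unexpected character in alignment data: {ch!r}")
    # Derived totals (inferred from the conventions, see the text above).
    counts["motif_positions"] = counts["match"] + counts["deletion"]
    counts["sequence_residues"] = counts["match"] + counts["insertion"]
    return counts
```

Alignment data spanning several lines should be joined into one string before it is passed to this function.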
EXAMPLES
(1)
>YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO
PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--
This is an
example of the output produced by pfsearch(1) using
the ’-x’ (i.e. psa output) option. The
first line starting with the ’>’
character is the fasta header. It also contains
information about the raw score of the alignment as well as
its position in the input sequence.
The next line holds the alignment proper. Starting at
position 6, the residues ’lns’ are inserted in the
sequence relative to the motif. The last two positions of
the motif are not present in the sequence (i.e. they are
deleted), as indicated by the two ’-’ (dash) characters
at the end of the alignment.
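The score and position fields of the example header can be pulled out with a regular expression. The Python sketch below assumes the free text always looks like the single example shown here (raw score, the word ’pos.’, start, a dash, end, then a description). Other pftools programs or options may produce a different layout, so the pattern and the names MatchInfo and parse_free_text are illustrative assumptions only.

```python
import re
from typing import NamedTuple, Optional


class MatchInfo(NamedTuple):
    """Fields read from the free text of the example header (hypothetical)."""
    score: int
    start: int
    end: int
    description: str


# Layout assumed from the example: "556 pos. 291 - 332 sp|Q10256|YD28_SCHPO"
_FREE_TEXT = re.compile(r"^\s*(-?\d+)\s+pos\.\s+(\d+)\s*-\s*(\d+)\s*(.*)$")


def parse_free_text(free_text: str) -> Optional[MatchInfo]:
    """Return the parsed fields, or None if the text has another layout."""
    m = _FREE_TEXT.match(free_text)
    if m is None:
        return None
    return MatchInfo(
        score=int(m.group(1)),
        start=int(m.group(2)),
        end=int(m.group(3)),
        description=m.group(4).strip(),
    )
```

For the example, the reported range 291 - 332 covers 42 residues, which equals the number of upper-case and lower-case characters in the alignment line.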
NOTES
(1)
The xpsa(5) format defines a more strict syntax of the header line, allowing the exchange of information between different sequence analysis tools. It uses keyword=value pairs to annotate the current match between a sequence and a motif descriptor. This syntax can be easily parsed and extended, according to the needs of bioinformatic tools.
(2)
The current implementation of the pftools package does not use the ’.’ (dot) character in the psa output. Nevertheless psa2msa(1) will read it and interpret it in the same manner as the ’-’ (dash) character.
SEE ALSO
xpsa(5), pfsearch(1), pfscan(1), pfw(1), pfmake(1), psa2msa(1)
AUTHOR
This manual page
was originally written by Volker Flegel.
The pftools package was developed by Philipp Bucher.
Any comments or suggestions should be addressed to
<pftools@sib.swiss>.