bamseqchksum(1)

bamseqchksum - produce checksums for primary data in BAM files

NAME

bamseqchksum - produce checksums for primary data in BAM files

SYNOPSIS

bamseqchksum [options]

DESCRIPTION

bamseqchksum reads a BAM file from stdin and, for each record, calculates hash digest checksums over

[1]

flags and sequence

[2]

queryname, flags and sequence

[3]

flags, sequence and qualities

[4]

flags, sequence and source data related aux tags

where the flags are the least significant byte of the BAM FLAG field, masked to keep only the bits for multiple segments (0x1), first segment (0x40) and last segment (0x80). If the reverse complemented bit (0x10) is set, the sequence is reverse complemented and the quality string is reversed before checksumming.
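The normalisation described above can be sketched in Python as follows. This is a minimal illustration, not the code used by bamseqchksum: the function name normalise_record is hypothetical, the flag values are taken from the SAM specification, and the complement table only covers A, C, G, T and N (an assumption made to keep the sketch short).

```python
# SAM/BAM flag bits, values as defined in the SAM specification
FLAG_MULTIPLE_SEGMENTS = 0x1
FLAG_REVERSE_COMPLEMENTED = 0x10
FLAG_FIRST_SEGMENT = 0x40
FLAG_LAST_SEGMENT = 0x80

# Simplified complement table (assumption: no IUPAC ambiguity codes other than N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def normalise_record(flags, sequence, qualities):
    """Return (flags, sequence, qualities) in the form that would be checksummed.

    Hypothetical helper: keeps only the three segment bits of the flags and
    restores the original read orientation for reverse complemented records.
    """
    kept = flags & (FLAG_MULTIPLE_SEGMENTS | FLAG_FIRST_SEGMENT | FLAG_LAST_SEGMENT)
    if flags & FLAG_REVERSE_COMPLEMENTED:
        sequence = sequence.translate(_COMPLEMENT)[::-1]
        qualities = qualities[::-1]
    return kept, sequence, qualities


# The same read, once unaligned (0x4d) and once aligned to the reverse strand (0x53)
unaligned = normalise_record(0x4D, "AACG", "IIHG")
reverse_aligned = normalise_record(0x53, "CGTT", "GHII")
print(unaligned == reverse_aligned)
```

Under these assumptions both calls yield the same tuple, which is why a checksum over the normalised data does not change when a read is aligned to the reverse strand.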

Depending on the chosen hash digest function, the per-record checksums are combined over all non-supplementary and non-secondary BAM alignment records by taking either their sum modulo some power of 2 or their product modulo a prime number; the primesums variants listed below instead sum modulo the modulus stated for them. Separate sums or products are reported for combinations of all and QC pass records and for each readgroup.
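Both ways of combining per-record checksums are commutative, so the result does not depend on the order of the records in the file. The following Python sketch illustrates this with zlib.crc32. The function names, the example byte strings and the handling of a zero factor are assumptions made for the illustration; they do not reproduce the byte layout or edge-case handling of bamseqchksum.

```python
import zlib


def combine_sum(digests, bits):
    """Add integer digests modulo 2**bits (the style of crc32, md5, sha1, ...)."""
    modulus = 1 << bits
    total = 0
    for digest in digests:
        total = (total + digest) % modulus
    return total


def combine_product(digests, prime):
    """Multiply integer digests modulo a prime (the style of crc32prod)."""
    product = 1
    for digest in digests:
        factor = digest % prime
        if factor == 0:
            # Assumption for this sketch: replace a zero factor by 1 so that
            # a single record cannot force the whole product to zero.
            factor = 1
        product = (product * factor) % prime
    return product


# Hypothetical stand-ins for the serialised per-record data
records = [b"\x41AACG", b"\x81TTGA", b"\x00GGCC"]
digests = [zlib.crc32(record) for record in records]
reordered = list(reversed(digests))

print(combine_sum(digests, 32) == combine_sum(reordered, 32))
print(combine_product(digests, 2**31 - 1) == combine_product(reordered, 2**31 - 1))
```

Order independence is what allows the checksums of a coordinate sorted file to be compared with those of the unsorted or name sorted file it was produced from.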

The following key=value pairs can be given:

verbose=<0>: Valid values are

1:

print progress report on standard error

0:

do not print progress report

inputformat=<bam>: input file format. All versions of bamseqchksum come with support for the BAM input format. If the program is in addition linked to the io_lib package, then the following options are valid:

bam:

BAM (see http://samtools.sourceforge.net/SAM1.pdf)

sam:

SAM (see http://samtools.sourceforge.net/SAM1.pdf)

cram:

CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

reference=: file name of the reference for CRAM input files. If this key is unset, then the CRAM file header will be scanned to obtain a reference file name.

hash=<crc32prod>: hash digest used to compute checksums. All versions of biobambam support the following functions:

crc32prod:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ31-1. This is the default and only option for biobambam versions up to 0.0.174.

crc32:

checksums are computed via crc32 and combined by summing up modulo 2ˆ32.

md5:

checksums are computed via md5 and combined by summing up modulo 2ˆ128.

crc32prime32:

identical to crc32prod (alternate implementation for testing purposes)

crc32prime64:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

md5prime64:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

crc32prime96:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

md5prime96:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

crc32prime128:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

md5prime128:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

crc32prime160:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

md5prime160:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

crc32prime192:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

md5prime192:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

crc32prime224:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

md5prime224:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

crc32prime256:

checksums are computed via crc32 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

md5prime256:

checksums are computed via md5 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

null:

no checksums are computed and all checksums in the program's output are 0. This option is for performance testing only.
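The prime moduli named in the option descriptions can be checked independently. The Python sketch below applies a Miller-Rabin test with a fixed set of bases to each of them. It is an illustration only: the dictionary keys are labels chosen for this sketch, and the test is a strong probable-prime check, not a primality proof.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin probable-prime test using a fixed set of bases."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True


# Moduli that the option descriptions call prime (labels are hypothetical)
PRIME_MODULI = {
    "crc32prod": 2**31 - 1,
    "prime64": 2**64 - 59,
    "prime96": 2**96 - 17,
    "prime128": 2**128 - 159,
    "prime160": 2**160 - 47,
    "prime192": 2**192 - 237,
    "prime224": 2**224 - 63,
    "prime256": 2**256 - 189,
    "sha512primesums": 2**521 - 1,
}

for label, modulus in PRIME_MODULI.items():
    print(label, modulus.bit_length(), is_probable_prime(modulus))
```

Each modulus lies just below the power of two in its name, so a combined checksum for a primeNN option fits into NN bits.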

If libmaus is compiled with support for the nettle library, then the following options are available:

sha1:

checksums are computed via sha1 and combined by summing up modulo 2ˆ160.

sha1prime64:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

sha1prime96:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

sha1prime128:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

sha1prime160:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

sha1prime192:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

sha1prime224:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

sha1prime256:

checksums are computed via sha1 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

sha224:

checksums are computed via sha2-224 and combined by summing up modulo 2ˆ224.

sha224prime64:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

sha224prime96:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

sha224prime128:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

sha224prime160:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

sha224prime192:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

sha224prime224:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

sha224prime256:

checksums are computed via sha2-224 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

sha256:

checksums are computed via sha2-256 and combined by summing up modulo 2ˆ256.

sha256prime64:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

sha256prime96:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

sha256prime128:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

sha256prime160:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

sha256prime192:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

sha256prime224:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

sha256prime256:

checksums are computed via sha2-256 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

sha384:

checksums are computed via sha2-384 and combined by summing up modulo 2ˆ384.

sha384prime64:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

sha384prime96:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

sha384prime128:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

sha384prime160:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

sha384prime192:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

sha384prime224:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

sha384prime256:

checksums are computed via sha2-384 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

sha512:

checksums are computed via sha2-512 and combined by summing up modulo 2ˆ512.

sha512prime64:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ64-59.

sha512prime96:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ96-17.

sha512prime128:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ128-159.

sha512prime160:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ160-47.

sha512prime192:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ192-237.

sha512prime224:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ224-63.

sha512prime256:

checksums are computed via sha2-512 and combined over multiple records by multiplication modulo the prime number 2ˆ256-189.

sha512primesums:

checksums are computed via sha2-512 and combined over multiple records by adding modulo the Mersenne prime number 2ˆ521-1.

sha512primesums512:

checksums are computed via sha2-512 and combined over multiple records by adding modulo 2ˆ512-75.

murmur3:

checksums are computed via MurmurHash3_x64_128 (see https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and combined over multiple records by summing modulo 2ˆ128.

murmur3primesums128:

checksums are computed via MurmurHash3_x64_128 (see https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp) and combined over multiple records by summing modulo 2ˆ128+51.
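As an illustration of how the key=value pairs are passed, the following Python sketch builds a command line and feeds the input file on standard input. The helper names and the file name are hypothetical, and the sketch assumes that the report is written to standard output, which this page does not state explicitly.

```python
import subprocess


def build_command(**options):
    """Turn keyword arguments into bamseqchksum key=value arguments."""
    return ["bamseqchksum"] + [f"{key}={value}" for key, value in options.items()]


def run_bamseqchksum(input_path, **options):
    """Run bamseqchksum with the file at input_path supplied on stdin.

    Assumption: the checksum report appears on standard output.
    """
    with open(input_path, "rb") as handle:
        completed = subprocess.run(
            build_command(**options),
            stdin=handle,
            capture_output=True,
            check=True,
        )
    return completed.stdout.decode()


# Only builds the argument list; nothing is executed here
print(build_command(verbose=0, inputformat="bam", hash="crc32prime64"))
```

A call such as run_bamseqchksum("example.bam", hash="crc32prime64") would then return the report as text, provided bamseqchksum is installed and example.bam (a placeholder name) exists.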

AUTHOR

Written by David Jackson (using code by German Tischler as a template). Extended to hash digests beyond crc32prod by German Tischler.

REPORTING BUGS

Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

Copyright © 2014-2014 David Jackson, © 2014-2014 Genome Research Limited. Copyright © 2009-2016 German Tischler, © 2011-2014 Genome Research Limited.

License GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.