fastq-mcf(1)

ea-utils: detect levels of adapter presence, compute likelihoods and locations of the adapters

Section 1 ea-utils bookworm source


FASTQ-MCF

NAME

fastq-mcf - ea-utils: detect levels of adapter presence, compute likelihoods and locations of the adapters

SYNOPSIS

fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]

DESCRIPTION

Version: 1.04.676

Detects levels of adapter presence, computes likelihoods and locations (start, end) of the adapters. Removes the adapter sequences from the fastq file(s).
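The clipping step can be pictured with a minimal Python sketch of generic 3' adapter trimming. It scans the read for the leftmost position where the adapter, or the prefix of it that fits before the read ends, aligns with at most a given mismatch percentage. This is an illustration under stated assumptions, not fastq-mcf's actual likelihood or matching code. The function name `clip_3p_adapter` is hypothetical, and mapping `min_match` and `max_diff_pct` onto -m and -p is also an assumption.

```python
def clip_3p_adapter(read, adapter, min_match=1, max_diff_pct=10.0):
    """Hypothetical sketch: remove an adapter (or its prefix) from the 3' end of a read.

    min_match and max_diff_pct loosely mirror -m and -p; this mapping is an
    assumption, not fastq-mcf's real scoring.
    """
    for start in range(len(read)):
        # Bases where the adapter would overlap the read if it began at `start`.
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_match:
            break  # later start positions only give shorter overlaps
        diffs = sum(1 for a, b in zip(read[start:start + overlap], adapter) if a != b)
        # Accept the leftmost hit within the mismatch budget (removes the most sequence).
        if diffs * 100 <= max_diff_pct * overlap:
            return read[:start]
    return read


print(clip_3p_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC"))  # ACGTACGT
```

Because positions are scanned left to right, a full-length internal hit wins over a short partial overlap at the very end. With `min_match=1`, even a single terminal base that happens to equal the adapter's first base would be clipped, which shows why a very small minimum match length is risky.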

Stats go to stderr, unless -o is specified, in which case they go to stdout.

Specify -0 to turn off all default settings.

If you specify multiple 'paired-end' inputs, then a -o option is required for each. For example: -o read1.clip.fq -o read2.clip.fq

OPTIONS

-h

Show this help.

-o FIL

Output file (stats are then written to stdout)

-s N.N

Log scale for adapter minimum-length-match (2.2)

	-t N		% occurrence threshold before adapter clipping (0.25)
	-m N		Minimum clip length, overrides scaled auto (1)
	-p N		Maximum adapter difference percentage (10)
	-l N		Minimum remaining sequence length (19)
	-L N		Maximum remaining sequence length (none)
	-D N		Remove duplicate reads: Read_1 has identical N bases (0)
	-k N		Skew percentage; any base below it causes cycle removal (2)
	-x N		'N' (bad read) percentage causing cycle removal (20)
	-q N		Quality threshold causing base removal (10)
	-w N		Window size for quality trimming (1)
	-H		Remove >95% homopolymer reads (no)
	-X		Remove low-complexity reads (no)
	-0		Set all default parameters to zero/do nothing
	-U|u		Force disable/enable Illumina PF filtering (auto)
	-P N		Phred scale (auto)
	-R		Don't remove N's from the fronts/ends of reads
	-n		Don't clip, just output what would be done
	-C N		Number of reads to use for subsampling (300k)
	-S		Save all discarded reads to '.skip' files
	-d		Output verbose debugging information
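The -q and -w options describe quality trimming. The following minimal Python sketch assumes Phred+33 encoding. It also assumes that trimming works from the 3' end by dropping bases while the mean quality of the trailing window is below the threshold. fastq-mcf's exact windowing rule is not documented on this page, and `trim_3p_quality` is a hypothetical name.

```python
def trim_3p_quality(seq, qual, threshold=10, window=1, offset=33):
    """Hypothetical sketch of windowed 3' quality trimming (cf. -q and -w)."""
    scores = [ord(c) - offset for c in qual]  # assumes Phred+33 unless offset is changed
    end = len(scores)
    while end > 0:
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= threshold:
            break  # trailing window is good enough; stop trimming
        end -= 1  # drop the last base and re-check
    return seq[:end], qual[:end]


print(trim_3p_quality("ACGTACGT", "IIIIII##"))            # ('ACGTAC', 'IIIIII')
print(trim_3p_quality("ACGTACGT", "IIIIII##", window=3))  # ('ACGTACGT', 'IIIIII##')
```

A wider window lets a few bad bases survive when their neighbours are good. After trimming, a read shorter than -l (default 19) would then be discarded.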

Quality adjustment options:

--cycle-adjust CYC,AMT

Adjust cycle CYC (negative = offset from end) by amount AMT.

--phred-adjust SCORE,AMT

Adjust score SCORE by amount AMT.

--phred-adjust-max SCORE

Adjust scores > SCORE to SCORE.
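The three quality-adjustment options can be modelled as edits to a read's list of Phred scores. This hedged Python sketch makes several assumptions that this page does not confirm. Positive cycle numbers are treated as 1-based. Score and cycle adjustments are both computed from the original score and summed. The --phred-adjust-max cap is applied last. `adjust_scores` is a hypothetical helper.

```python
def adjust_scores(scores, cycle_adjust=None, phred_adjust=None, phred_max=None):
    """Hypothetical sketch of --cycle-adjust, --phred-adjust and --phred-adjust-max."""
    cycle_adjust = cycle_adjust or {}  # {CYC: AMT}
    phred_adjust = phred_adjust or {}  # {SCORE: AMT}
    n = len(scores)
    # Resolve cycles to 0-based indices (assumed: positive = 1-based, negative = from end).
    cycle_delta = {}
    for cyc, amt in cycle_adjust.items():
        idx = cyc - 1 if cyc > 0 else n + cyc
        if 0 <= idx < n:
            cycle_delta[idx] = cycle_delta.get(idx, 0) + amt
    out = []
    for i, q in enumerate(scores):
        new_q = q + phred_adjust.get(q, 0) + cycle_delta.get(i, 0)
        if phred_max is not None and new_q > phred_max:
            new_q = phred_max  # cap scores above SCORE at SCORE
        out.append(new_q)
    return out


print(adjust_scores([30, 30, 40, 2], cycle_adjust={-1: 5}, phred_adjust={40: -10}, phred_max=35))
# [30, 30, 30, 7]
```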

Filtering options*:

--[mate-]qual-mean NUM

Minimum mean quality score.

--[mate-]qual-gt NUM,THR

At least NUM quality scores must be > THR.

--[mate-]max-ns NUM

Maximum number of N calls in a read (can be a %).

--[mate-]min-len NUM

Minimum remaining length (same as -l).

--homopolymer-pct PCT

Homopolymer filter percent (95).

--lowcomplex-pct PCT

Complexity filter percent (95).

If the mate- prefix is used, the filter applies only to the second non-barcode read.
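The filtering options can be sketched as a single predicate applied to a read after clipping and trimming, which is the stage the footnote marked * places quality filters at. This minimal Python version is illustrative only. `passes_filters` is a hypothetical name. A --max-ns value such as "10%" is assumed to mean a percentage of the read length, and the boundary comparisons (< versus <=) are guesses. The --mate- variants would run the same checks on the second non-barcode read.

```python
def passes_filters(seq, scores, qual_mean=None, qual_gt=None, max_ns=None, min_len=None):
    """Hypothetical sketch of --qual-mean, --qual-gt, --max-ns and --min-len."""
    if min_len is not None and len(seq) < min_len:
        return False
    if qual_mean is not None:
        if not scores or sum(scores) / len(scores) < qual_mean:
            return False
    if qual_gt is not None:
        num, thr = qual_gt  # at least `num` scores must exceed `thr`
        if sum(1 for q in scores if q > thr) < num:
            return False
    if max_ns is not None:
        n_calls = seq.upper().count("N")
        if isinstance(max_ns, str) and max_ns.endswith("%"):
            limit = len(seq) * float(max_ns[:-1]) / 100  # assumed: percent of read length
        else:
            limit = float(max_ns)
        if n_calls > limit:
            return False
    return True


print(passes_filters("ACNNN", [30] * 5, max_ns="40%"))  # False: 3 Ns > 2 allowed
```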

Adapter files are 'fasta' formatted.

Specify n/a in place of the adapter file to turn off adapter clipping and just use the filters.

Increasing the scale (-s) makes recognition lengths longer; a scale of 100 forces full-length recognition of adapters.

Adapter sequences with _5p in their label will match 'end's, and sequences with _3p in their label will match 'start's; otherwise the 'end' is auto-determined.
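The label convention can be shown with a small Python sketch. It reads a FASTA adapter file and records which read position each adapter is forced to match, following the rule stated above: _5p labels match ends, _3p labels match starts, and anything else is auto-determined. The parser is a deliberately minimal stand-in. `read_adapters` and `forced_match` are hypothetical names, not part of ea-utils, and the example labels and sequences are made up.

```python
def read_adapters(text):
    """Minimal FASTA parser: returns a list of (label, sequence) pairs."""
    adapters = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                adapters.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if label is not None:
        adapters.append((label, "".join(chunks)))
    return adapters


def forced_match(label):
    """Per this man page: _5p labels match 'end's, _3p labels match 'start's."""
    if "_5p" in label:
        return "end"
    if "_3p" in label:
        return "start"
    return "auto"  # fastq-mcf determines the end itself


example = ">adapterA\nAGATCGGAAG\n>primerB_5p\nACGTACGT\n"
print([(label, forced_match(label)) for label, _ in read_adapters(example)])
# [('adapterA', 'auto'), ('primerB_5p', 'end')]
```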

Skew occurs when one cycle is poor, 'skewed' toward a particular base. If the frequency of any nucleotide at that cycle is less than the skew percentage, the whole cycle is removed. Disable this check for methyl-seq and similar data.

Set the skew (-k) or N percentage (-x) to 0 to turn that check off. This should be done for miRNA, amplicon and other low-complexity data.
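Cycle removal by skew (-k) and N percentage (-x) can be illustrated as a per-cycle base count over a sample of reads (compare -C). This Python sketch assumes the following points, none of which this page confirms. Percentages are taken over all calls at that cycle, including N. A cycle is dropped when any of A, C, G, T falls below the skew percentage, or when N exceeds the N percentage. A value of 0 disables either check, matching the text above. `cycles_to_remove` is a hypothetical name.

```python
from collections import Counter


def cycles_to_remove(reads, skew_pct=2.0, n_pct=20.0):
    """Hypothetical sketch of -k (skew) and -x (N percentage) cycle removal.

    Returns 0-based cycle indices. A value of 0 disables the corresponding check.
    """
    if not reads:
        return []
    bad = []
    for cyc in range(max(len(r) for r in reads)):
        counts = Counter(r[cyc].upper() for r in reads if cyc < len(r))
        total = sum(counts.values())
        # Skew: some nucleotide is under-represented at this cycle.
        skewed = skew_pct > 0 and any(100 * counts[b] / total < skew_pct for b in "ACGT")
        # Too many N calls at this cycle.
        too_many_n = n_pct > 0 and 100 * counts["N"] / total > n_pct
        if skewed or too_many_n:
            bad.append(cyc)
    return bad


print(cycles_to_remove(["ACAT", "CGAA", "GTAC", "TAAG"]))  # [2]: that cycle is all A
```

On bisulfite (methyl-seq) data, unmethylated C is read as T, so C is heavily depleted at every cycle and a skew check like this would flag cycles that are not actually bad.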

Duplicate read filtering (-D) is appropriate for assembly tasks, but never when read length < expected coverage. -D 50 will use 4.5 GB of RAM on 100 million DNA reads, so be careful. It is well suited to RNA assembly.
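Duplicate filtering can be sketched as a set of keys drawn from read 1, where the first pair seen for each key is kept. This Python sketch assumes the key is the first N bases of read 1, since the option text only says "identical N bases". `drop_duplicates` is a hypothetical name, not fastq-mcf's implementation.

```python
def drop_duplicates(pairs, n):
    """Hypothetical sketch of -D N: keep the first read pair seen for each key.

    The key is assumed to be the first n bases of read 1; n <= 0 disables filtering.
    """
    if n <= 0:
        return list(pairs)
    seen = set()
    kept = []
    for read1, read2 in pairs:
        key = read1[:n]
        if key in seen:
            continue  # duplicate: an earlier pair had the same key
        seen.add(key)
        kept.append((read1, read2))
    return kept


print(drop_duplicates([("ACGTAA", "r1"), ("ACGTCC", "r2"), ("TTTTGG", "r3")], 4))
# [('ACGTAA', 'r1'), ('TTTTGG', 'r3')]
```

Every distinct key stays in `seen` for the whole run, so memory grows with the number of unique reads times N. That growth is why the RAM warning above matters for large -D values.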

*Quality filters are evaluated after clipping/trimming.

Homopolymer filtering is a subset of low-complexity filtering, but homopolymer reads are not tracked separately unless both filters are turned on.
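The homopolymer filter (-H, --homopolymer-pct) can be illustrated with a one-function Python sketch. It assumes a read is flagged when its single most common base makes up more than PCT percent of the read. fastq-mcf's exact rule may differ. The low-complexity metric behind -X is not described on this page, so it is not sketched. `is_homopolymer` is a hypothetical name.

```python
from collections import Counter


def is_homopolymer(seq, pct=95.0):
    """Hypothetical sketch of the homopolymer filter (-H, --homopolymer-pct).

    Assumes a read is flagged when its most common base makes up more than
    pct percent of the read; fastq-mcf's exact rule may differ.
    """
    if not seq:
        return False
    _, top = Counter(seq.upper()).most_common(1)[0]
    return 100 * top / len(seq) > pct


print(is_homopolymer("A" * 39 + "C"))  # True: 97.5% A
```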