run_discoSnp++.sh(1)

pipelining kissnp2 and kissreads for calling SNPs and small indels from NGS reads without the need of a reference genome

NAME

run_discoSnp++.sh - pipelining kissnp2 and kissreads for calling SNPs and small indels from NGS reads without the need of a reference genome

SYNOPSIS

run_discoSnp++.sh -r read_file_of_files [OPTIONS]

DESCRIPTION

run_discoSnp++.sh is a pipeline that chains kissnp2 and kissreads to call SNPs and small indels from NGS reads without the need for a reference genome. Version 2.3.X

MANDATORY:

-r read_file_of_files

Example: -r bank.fof with bank.fof containing the two lines

data_sample/reads_sequence1.fasta

data_sample/reads_sequence2.fasta.gz
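The file of files from the example above can be written with a couple of shell lines. This is a minimal sketch that assumes the two read files exist under data_sample/ and that run_discoSnp++.sh is on the PATH; the command is only printed here, not executed.

```shell
# Write the file of files: one read file path per line.
fof="bank.fof"
printf '%s\n' \
    data_sample/reads_sequence1.fasta \
    data_sample/reads_sequence2.fasta.gz > "$fof"

# Assemble the mandatory part of the command line.
cmd="run_discoSnp++.sh -r $fof"

# Dry run: show the command. Run it directly once the read files are in place.
echo "$cmd"
```

Printing the command first makes it easy to check the paths in bank.fof before starting a run.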

DISCOSNP++ OPTIONS:

-g: reuse a previously created graph (.h5 file) with the same prefix and the same k and c parameters.

-b value.

0: forbid variants for which any of the two paths is branching (high precision, lowers the recall in complex genomes). Default value

1: (smart branching) forbid SNPs for which the two paths are branching (e.g. the two paths can be created either with an ’A’ or a ’C’ at the same position)

2: No limitation on branching (lowers the precision, high recall)

-s value. In b2 mode (-b 2) only: maximal number of symmetrical crossroads traversed while trying to close a bubble. Default: no limit

-D value. discoSnp++ will search for deletions of size from 1 to D included. Default=100

-a value. Maximal size of ambiguity of INDELs. INDELs whose ambiguity is higher than this value are not output [default ’20’]

-P value. discoSnp++ will search up to P SNPs in a unique bubble. Default=1

-p prefix. All output files will start with this prefix. Default="discoRes"

-l: remove low complexity bubbles

-k value. Set the length of used kmers. Must fit the compiled value. Default=31

-t: extend found polymorphisms with unitigs. Forced usage when using discoSnpRad

-T: extend found polymorphisms with contigs

-c value. Set the minimal coverage per read set: Used by kissnp2 (don’t use kmers with lower coverage) and kissreads (read coherency threshold). This coverage can be automatically detected per read set or specified per read set, see the documentation. Default=auto

-C value. Set the maximal coverage for each read set: Used by kissnp2 (don’t use kmers with higher coverage). Default=2^31-1

-d value. Set the number of authorized substitutions used while mapping reads on found SNPs (kissreads). Default=1

-n: do not compute the genotypes

-u: max number of used threads

-v: verbose 0 (avoids progress output) or 1 (enables progress output). Default=1
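As an illustration of how these options combine, the sketch below assembles a command line that uses smart branching, a fixed minimal coverage, a smaller deletion size limit and a custom output prefix. The values (coverage 3, D=10, P=3) and the prefix my_run are arbitrary choices for the example, not recommendations; the command is printed rather than executed.

```shell
# Hypothetical output prefix chosen for this example.
prefix="my_run"

# -b 1: smart branching    -k 31: kmer length (the default)
# -c 3: minimal coverage   -D 10: deletions of size 1 to 10
# -P 3: up to 3 SNPs per bubble
set -- run_discoSnp++.sh -r bank.fof -b 1 -k 31 -c 3 -D 10 -P 3 -p "$prefix"

cmd="$*"
echo "$cmd"

# To actually run the pipeline, replace the echo above with: "$@"
```

According to the description of -g, a later run with the same prefix and the same -k and -c values could add -g to reuse the graph built by the first run.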

REFERENCE GENOME AND/OR VCF CREATION OPTIONS

-G: reference genome file (fasta, fastq, gzipped or not). In absence of this file the VCF created by VCF_creator won’t contain mapping related results.

-R: use the reference file also in the variant calling, not only for mapping results

-B: bwa path. e.g. /home/me/my_programs/bwa-0.7.12/ (note that bwa must be pre-compiled)

Optional unless option -G is used and bwa is not in the binary path.

-M: Maximal number of mapping errors during BWA mapping phase.

Useless unless mapping on reference genome is required (option -G). Default=4.

-h: Prints this message and exits

-e: map SNP predictions on reference genome with their extensions. Forced usage when using discoSnpRad
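The reference-related options can be combined as in the following sketch. The file name reference.fasta is a placeholder, and the bwa directory is the example path from the -B description; -B is only appended when bwa is not found in the binary path, which matches the note that it is optional otherwise. The command is printed, not executed.

```shell
# Placeholder reference file and the example bwa directory from the -B entry.
ref="reference.fasta"
bwa_dir="/home/me/my_programs/bwa-0.7.12/"

# -G: map predictions on the reference; -M 4: the default number of mapping errors.
set -- run_discoSnp++.sh -r bank.fof -G "$ref" -M 4

# Pass -B only when bwa cannot be found in the PATH.
if ! command -v bwa >/dev/null 2>&1; then
    set -- "$@" -B "$bwa_dir"
fi

cmd="$*"
echo "$cmd"
```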

For any further questions, read the readme file or contact us via the Biostar forum: https://www.biostars.org/t/discosnp/

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.