art_SOLiD(1)

Simulation of Applied Biosystems SOLiD Sequencing

NAME

art_SOLiD - Simulation of Applied Biosystems SOLiD Sequencing

DESCRIPTION

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process, using empirical error models or quality profiles summarized from large recalibrated sequencing data.

art_SOLiD simulates Applied Biosystems SOLiD sequencing reads.

USAGE

SINGLE-END (F3 READ) SIMULATION

art_SOLiD [ options ] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ> <FOLD_COVERAGE>

MATE-PAIR READS (F3-R3 PAIR) SIMULATION

art_SOLiD [ options ] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ> <FOLD_COVERAGE> <MEAN_FRAG_LEN> <STD_DEV>

PAIRED-END READS (F3-F5 PAIR) SIMULATION

art_SOLiD [ options ] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ_F3> <LEN_READ_F5> <FOLD_COVERAGE> <MEAN_FRAG_LEN> <STD_DEV>

AMPLICON SEQUENCING SIMULATION

art_SOLiD [ options ] -A s <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ> <READS_PER_AMPLICON>

art_SOLiD [ options ] -A m <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ> <READ_PAIRS_PER_AMPLICON>

art_SOLiD [ options ] -A p <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <LEN_READ_F3> <LEN_READ_F5> <READ_PAIRS_PER_AMPLICON>
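The synopses above differ only in the number and order of positional arguments. As a minimal sketch, the argument list for each mode could be assembled in Python as follows. The helper `build_art_solid_cmd` and its parameter names are hypothetical and not part of ART; the function only builds the list and does not run the program.

```python
from typing import List, Optional, Sequence


def build_art_solid_cmd(
    input_seq_file: str,
    output_prefix: str,
    read_lengths: Sequence[int],
    count: int,
    frag_mean: Optional[int] = None,
    frag_std: Optional[int] = None,
    amplicon: Optional[str] = None,
    options: Sequence[str] = (),
) -> List[str]:
    """Assemble an art_SOLiD argument list following the USAGE synopses.

    read_lengths: [LEN_READ] or [LEN_READ_F3, LEN_READ_F5]
    count:        FOLD_COVERAGE, or reads/read pairs per amplicon with -A
    """
    if (frag_mean is None) != (frag_std is None):
        raise ValueError("give both MEAN_FRAG_LEN and STD_DEV, or neither")
    has_frag = frag_mean is not None
    lengths = [str(n) for n in read_lengths]

    cmd = ["art_SOLiD", *options]
    if amplicon is not None:
        # Amplicon modes take no fragment-size parameters.
        expected = {"s": 1, "m": 1, "p": 2}
        if amplicon not in expected:
            raise ValueError("-A takes s, m, or p")
        if len(lengths) != expected[amplicon] or has_frag:
            raise ValueError("arguments do not match the amplicon synopsis")
        cmd += ["-A", amplicon]
    elif len(lengths) == 2 and not has_frag:
        raise ValueError("paired-end simulation needs MEAN_FRAG_LEN and STD_DEV")
    elif len(lengths) not in (1, 2):
        raise ValueError("expected one or two read lengths")

    cmd += [input_seq_file, output_prefix, *lengths, str(count)]
    if has_frag:
        cmd += [str(frag_mean), str(frag_std)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a system where art_SOLiD is installed, or joined with spaces to inspect the command line first.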

OPTIONS

MANDATORY PARAMETERS

INPUT_SEQ_FILE - filename of DNA/RNA reference sequences in FASTA format
OUTPUT_FILE_PREFIX - prefix or directory for all output read data files
FOLD_COVERAGE - fold of read coverage over the reference sequences
LEN_READ - length of F3/R3 reads
LEN_READ_F3 - length of F3 reads for paired-end read simulation
LEN_READ_F5 - length of F5 reads for paired-end read simulation
READS_PER_AMPLICON - number of reads per amplicon
READ_PAIRS_PER_AMPLICON - number of read pairs per amplicon
MEAN_FRAG_LEN - mean DNA/RNA fragment size for matepair/paired-end read simulation
STD_DEV - standard deviation of the DNA/RNA fragment sizes for matepair/paired-end read simulation
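FOLD_COVERAGE, LEN_READ, and the total reference length together determine roughly how many reads a run produces, while the amplicon modes scale with the number of sequences in the input file. The sketch below uses the conventional definition of fold coverage (total read bases divided by reference bases). Whether art_SOLiD rounds or counts per sequence in exactly this way is an assumption, and both function names are illustrative.

```python
import math


def reads_for_coverage(ref_len: int, read_len: int, fold_coverage: float) -> int:
    """Approximate single-end read count for a target fold coverage.

    Assumes coverage = (number of reads * read length) / reference length.
    """
    if ref_len <= 0 or read_len <= 0 or fold_coverage < 0:
        raise ValueError("lengths must be positive and coverage non-negative")
    return math.ceil(fold_coverage * ref_len / read_len)


def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def total_amplicon_reads(fasta_text: str, reads_per_amplicon: int) -> int:
    """Reads expected in amplicon mode, assuming one amplicon per FASTA record."""
    return count_fasta_records(fasta_text) * reads_per_amplicon
```

Under these assumptions, 25bp single-end reads at 10X over a 1,000,000bp reference correspond to about 400,000 reads.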

OPTIONAL PARAMETERS

-A
specify the read type for amplicon sequencing simulation (s: single-end, m: matepair, p: paired-end)
-M
use CIGAR 'M' instead of '=/X' for alignment match/mismatch
-s
generate a SAM alignment file
-r
specify the random seed for the simulation
-f
specify the scale factor adjusting error rate (e.g., -f 0 for zero-error rate simulation)
-p
specify the user's own read profile for simulation
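The -M option concerns how matches and mismatches are written in the CIGAR field of the SAM output. In the SAM format, '=' marks a sequence match and 'X' a sequence mismatch, while 'M' covers both. To illustrate the relationship between the two notations, the sketch below collapses an '=/X' CIGAR string into the 'M' form by merging adjacent runs. This is not ART's code; `collapse_to_m` is a hypothetical name.

```python
import re


def collapse_to_m(cigar: str) -> str:
    """Rewrite '=' and 'X' operations as 'M' and merge adjacent equal operations."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    merged = []  # list of [length, operation]
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)
```

For a 25bp read with a single mismatch at position 11, `collapse_to_m("10=1X14=")` gives `"25M"`: the 'M' form is shorter but no longer shows where the mismatch lies.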

EXAMPLES

1) single-end 25bp reads simulation at 10X coverage

art_SOLiD -s seq_reference.fa ./outdir/single_dat 25 10

2) single-end 75bp reads simulation at 20X coverage with a user-supplied error
profile

art_SOLiD -s -p ../SOLiD_profiles/profile_pseudo ./seq_reference.fa ./dat_userProfile 75 20

3) matepair 35bp (F3-R3) reads simulation at 20X coverage with DNA/RNA MEAN
fragment size 2000bp and STD 50

art_SOLiD -s seq_reference.fa ./outdir/matepair_dat 35 20 2000 50

4) matepair reads simulation with a fixed random seed

art_SOLiD -r 777 -s seq_reference.fa ./outdir/matepair_fs 50 10 1500 50

5) paired-end reads (75bp F3, 35bp F5) simulation with the MEAN fragment
size 250 and STD 10 at 50X coverage

art_SOLiD -s seq_reference.fa ./outdir/paired_dat 75 35 50 250 10

6) amplicon sequencing with 25bp single-end reads at 100 reads per amplicon

art_SOLiD -A s -s amp_reference.fa ./outdir/amp_single 25 100

7) amplicon sequencing with 50bp matepair reads at 80 read pairs per
amplicon

art_SOLiD -A m -s amp_reference.fa ./outdir/amp_matepair 50 80

8) amplicon sequencing with paired-end reads (35bp F3, 25bp F5 reads) at 50
pairs per amplicon

art_SOLiD -A p -s amp_reference.fa ./outdir/amp_pair 35 25 50

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and may be used by others.