beagle(1)
Genotype calling, genotype phasing and imputation of ungenotyped markers
NAME
Beagle - Genotype calling, genotype phasing and imputation of ungenotyped markers
SYNOPSIS
java -Xmx[GB]g -jar /usr/share/beagle/beagle.jar [options]
DESCRIPTION
Beagle performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection. Genotypic imputation works on phased haplotypes using a Li and Stephens haplotype frequency model. Beagle also implements the Refined IBD algorithm for detecting homozygosity-by-descent (HBD) and identity-by-descent (IBD) segments.
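As an informal illustration of the SYNOPSIS form, the following Python sketch assembles an argument list from key=value options. The helper name build_beagle_command and the file names input.vcf.gz and phased are hypothetical; the jar path is the one shown in the SYNOPSIS, and the option names are those documented under OPTIONS.

```python
def build_beagle_command(mem_gb, options, jar="/usr/share/beagle/beagle.jar"):
    """Return an argv list of the form shown in the SYNOPSIS (hypothetical helper)."""
    if "out" not in options:
        # out=prefix is the only argument documented as required.
        raise ValueError("the out=prefix argument is required")
    argv = ["java", f"-Xmx{mem_gb}g", "-jar", jar]
    for name, value in options.items():
        # Boolean options are written as true/false, as in the OPTIONS section.
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv.append(f"{name}={value}")
    return argv


cmd = build_beagle_command(4, {"gt": "input.vcf.gz", "out": "phased", "impute": False})
print(" ".join(cmd))
# java -Xmx4g -jar /usr/share/beagle/beagle.jar gt=input.vcf.gz out=phased impute=false
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the analysis.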
OPTIONS
Data input/output parameters
gt=filename
Optional
Specifies a VCF file containing a GT (genotype) format field
for each marker. If a genotype contains the phased allele
separator, "|", then Beagle will preserve the
phase of the genotype during the analysis. If you use the gt
argument, all genotypes in the output file will be phased
and non-missing.
gl=filename
Optional
Specifies a VCF file containing a GL or PL (genotype
likelihood) format field for each marker. Any data in the GT
format field will be ignored. If both GL and PL format
fields are present for a marker, the GL format will be
used.
gtgl=filename
Optional
Specifies a VCF file containing a GT, GL or PL format field
for each marker. If a genotype is non-missing, Beagle will
ignore the genotype likelihood. If both GL and PL format
fields are present for a marker, the GL field will be
used.
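The precedence rules described for gtgl can be sketched as follows. This is only an illustration, not Beagle's implementation: the function name is hypothetical, a genotype is treated as missing when any allele is ".", and the conversions assume the usual VCF conventions (GL holds log10 likelihoods, PL holds phred-scaled likelihoods).

```python
def select_gtgl_input(gt, gl=None, pl=None):
    """Pick the data used for one sample at one marker (illustrative sketch)."""
    # Assumption: a genotype with any "." allele counts as missing.
    missing = gt is None or "." in gt
    if not missing:
        # A non-missing genotype takes precedence; likelihoods are ignored.
        return ("GT", gt)
    if gl is not None:
        # GL is preferred over PL when both are present.
        return ("GL", [10 ** x for x in gl])
    if pl is not None:
        # PL = -10 * log10(likelihood), so likelihood = 10^(-PL / 10).
        return ("PL", [10 ** (-x / 10) for x in pl])
    return ("missing", None)


print(select_gtgl_input("0/1", gl=[-1.0, 0.0, -2.0])[0])
print(select_gtgl_input("./.", gl=[0.0, -1.0, -2.0], pl=[0, 10, 20])[0])
print(select_gtgl_input("./.", pl=[0, 10, 20])[0])
```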
ref=filename
Optional
Specifies a VCF file containing phased reference genotypes.
See the impute parameter.
out=prefix
Required
Specifies the output filename prefix. The prefix may be an
absolute or relative filename, but it cannot be a directory
name.
excludesamples=filename
Optional
Specifies a file containing non-reference samples (one
sample per line) to be excluded from the analysis and output
files.
excludemarkers=filename
Optional
Specifies a file containing markers (one marker per line) to
be excluded from the analysis and the output files. An
excluded marker identifier can either be an identifier from
the VCF record's ID field or a genomic coordinate in
the format: CHROM:POS.
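A minimal sketch of the matching rule for excluded markers, assuming the exclusion file has already been read as lines of text. The function names are hypothetical, and the whole ID field is compared as a single string, which may differ from how Beagle treats records with several identifiers.

```python
def load_excluded(lines):
    """Collect the non-empty entries of an excludemarkers file (one per line)."""
    return {line.strip() for line in lines if line.strip()}


def is_excluded(chrom, pos, record_id, excluded):
    """True if the record matches by ID field or by CHROM:POS coordinate."""
    if f"{chrom}:{pos}" in excluded:
        return True
    # "." is the VCF placeholder for a missing ID and never matches.
    return record_id not in (None, "", ".") and record_id in excluded


excluded = load_excluded(["rs123\n", "20:61098\n", "\n"])
print(is_excluded("20", 61098, ".", excluded))   # matched by coordinate
print(is_excluded("1", 5, "rs123", excluded))    # matched by identifier
```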
map=filename
Optional
Specifies a PLINK format genetic map on the cM scale. HapMap
GRCh36 and GRCh37 genetic maps in PLINK format are available
for download from the Beagle website. Use of a genetic map
is recommended if you are imputing ungenotyped markers. If
no genetic map is specified, Beagle will assume a constant
recombination rate of 1 cM / Mb.
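To make the fallback concrete: under a constant rate of 1 cM / Mb, a marker's genetic position is simply its base-pair position divided by one million. The helper names below are hypothetical and the sketch is not taken from Beagle's source.

```python
def default_genetic_position(pos_bp, cm_per_mb=1.0):
    """Genetic position in cM under a constant recombination rate."""
    return pos_bp / 1_000_000 * cm_per_mb


def genetic_distance(pos1_bp, pos2_bp, cm_per_mb=1.0):
    """Distance in cM between two markers on the same chromosome."""
    return abs(default_genetic_position(pos2_bp, cm_per_mb)
               - default_genetic_position(pos1_bp, cm_per_mb))


print(default_genetic_position(2_500_000))       # 2.5
print(genetic_distance(1_000_000, 3_000_000))    # 2.0
```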
chrom=chrom:start-end
Optional
Specifies a chromosome or chromosome interval using a
chromosome identifier in the VCF file and the starting and
ending positions of the interval. The entire chromosome, the
beginning of the chromosome, and the end of a chromosome can
be specified by chrom=[chrom],
chrom=[chrom:-end], and chrom=[chrom:start-]
respectively.
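The four accepted forms of the chrom argument can be illustrated with a small parser. The function name is hypothetical, and the sketch assumes that the chromosome identifier itself contains no ":" character.

```python
def parse_chrom_arg(value):
    """Return (chrom, start, end); start or end is None when left open."""
    chrom, colon, interval = value.partition(":")
    if not colon:
        return chrom, None, None          # whole chromosome
    start, dash, end = interval.partition("-")
    if not dash:
        raise ValueError(f"expected chrom:start-end, got {value!r}")
    return chrom, (int(start) if start else None), (int(end) if end else None)


for spec in ("20", "20:-5000", "20:1000-", "20:1000-5000"):
    print(spec, "->", parse_chrom_arg(spec))
```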
maxlr=number_≥_1
Default = 5000
Specifies the maximum likelihood ratio at a genotype. If M
is the maximum of the likelihoods of each possible genotype,
any likelihood that is less than (M / maxlr) is set to
0.0 to improve computational efficiency.
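The effect of maxlr can be sketched as follows, assuming the threshold is M divided by maxlr (so that no retained likelihood is more than maxlr times smaller than the largest one). The function name is hypothetical and the code is not taken from Beagle.

```python
def apply_maxlr(likelihoods, maxlr=5000.0):
    """Zero out likelihoods below max(likelihoods) / maxlr (assumed rule)."""
    if maxlr < 1:
        raise ValueError("maxlr must be at least 1")
    threshold = max(likelihoods) / maxlr
    return [x if x >= threshold else 0.0 for x in likelihoods]


print(apply_maxlr([1.0, 0.1, 1e-5]))            # [1.0, 0.1, 0.0]
print(apply_maxlr([0.5, 0.4, 0.3], maxlr=1))    # [0.5, 0.0, 0.0]
```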
General parameters
nthreads=positive_integer
Default: machine-dependent
Specifies the number of threads of execution. If no
nthreads parameter is specified, the nthreads
parameter will be set equal to the number of CPU cores on
the host machine.
lowmem=true/false
Default = false
Specifies whether a memory efficient algorithm should be
used. The memory efficient algorithm increases run-time by a
factor less than 2.0.
window=positive_integer
Default = 50000
Specifies the number of markers to include in each sliding
window. The window parameter must be at least twice
as large as the overlap parameter. The window
parameter controls the amount of memory used in the
analysis. For human data, it is recommended that the
window parameter be greater than or equal to the
typical number of markers in 5 cM.
overlap=positive_integer
Default = 3000
Specifies the number of markers of overlap between sliding
windows. For human data, it is recommended that the overlap
be set to the typical number of markers in 0.5 cM (when
ibd=false) or 2.0 cM (when ibd=true).
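As a rough guide to choosing window and overlap, the recommendations above can be turned into marker counts once a marker density is known. The sketch below assumes a uniform density and a recombination rate of 1 cM / Mb; both function names and the example density of 1000 markers per Mb are illustrative only.

```python
import math


def markers_in_cm(cm, markers_per_mb, cm_per_mb=1.0):
    """Approximate number of markers in a region of the given genetic length."""
    return math.ceil(cm / cm_per_mb * markers_per_mb)


def window_is_valid(window, overlap):
    """The window must be at least twice as large as the overlap."""
    return window >= 2 * overlap


density = 1000  # hypothetical: one marker per kb
print(markers_in_cm(5, density))      # minimum suggested window: 5000
print(markers_in_cm(0.5, density))    # suggested overlap, ibd=false: 500
print(markers_in_cm(2.0, density))    # suggested overlap, ibd=true: 2000
print(window_is_valid(50000, 3000))   # the defaults satisfy the constraint: True
```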
seed=integer
Default = -99999
Specifies the seed for the random number generator.
Phasing and imputation parameters
niterations=non-negative_integer
Default = 5
Specifies the number of phasing iterations. The phasing
iterations are preceded by 10 burn-in iterations which carry
out the Beagle version 4.0 phasing algorithm. If you want to
phase your data with the Beagle 4.0 phasing algorithm, use
niterations=0. Accuracy and compute time increase
with the number of iterations.
impute=true/false
Default = true
Specifies whether markers that are present in the reference
panel but absent in your data will be imputed. This option
has no effect if the ref and gt arguments are
not used.
gprobs=true/false
Default = false
Specifies whether a GP (genotype probability) format field
will be included in the output VCF file when imputing
ungenotyped markers. By default, a GP field is not printed
because a DS (alternate allele dose) format field is always
printed when imputing ungenotyped markers.
ne=integer
Default = 1000000
Specifies the effective population size when imputing
ungenotyped markers. The default value is suitable for a
large outbred human population. Smaller values in the
hundreds or thousands for the ne parameter are
suggested for inbred human and animal populations.
err=non-negative_number
Default = 0.0001
Specifies the allele miscall rate. The default value should
give good results for most sequence and SNP array data.
cluster=non-negative_number
Default = 0.005
Specifies the maximum cM distance between individual markers
that are combined into an aggregate marker when imputing
ungenotyped markers.
IBD parameters
ibd=true/false
Default = false
Specifies whether IBD analysis will be performed when the
gt argument is used.
ibdlod=non-negative_number
Default = 3.0
Specifies the minimum LOD score for reported IBD.
ibdscale=non-negative_number
Default: data-dependent
Specifies the scale parameter used to build the haplotype
frequency model for IBD analysis. If no ibdscale
parameter is specified, the scale parameter for the IBD
analysis will be set to max{2, sqrt[sample size]/100}, which
we have found to work well for outbred populations.
ibdtrim=non-negative_integer
Default = 40
Specifies the number of markers trimmed from the end of a
shared haplotype when testing for IBD. Note: The default
ibdtrim parameter is designed for European samples
genotyped with a 1M SNP array (~1 marker per 3 kb). For
human SNP array data, it is recommended to set the
ibdtrim parameter to the typical number of markers in
a 0.15 cM region. Pilot studies of randomly selected genomic
regions can be used to fine-tune the values of the
ibdtrim parameter.
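One possible way to estimate "the typical number of markers in a 0.15 cM region" from a pilot region is to take the median marker count over windows of that length. The sketch below assumes the markers' genetic positions (in cM) are available as a sorted, non-empty list; the function name and the choice of the median are assumptions, not part of Beagle.

```python
from bisect import bisect_left
from statistics import median


def typical_marker_count(cm_positions, span_cm=0.15):
    """Median number of markers in [p, p + span_cm) over marker positions p."""
    last = cm_positions[-1]
    counts = []
    for i, start in enumerate(cm_positions):
        if start + span_cm > last:
            break  # the window would run past the last marker
        # Index of the first marker at or beyond the end of the window.
        counts.append(bisect_left(cm_positions, start + span_cm) - i)
    if not counts:
        raise ValueError("markers span less than span_cm")
    return round(median(counts))


# Toy data: one marker every 0.25 cM, counted in 1 cM windows.
positions = [i * 0.25 for i in range(41)]
print(typical_marker_count(positions, span_cm=1.0))   # 4
```

The value returned for real pilot data could then be supplied as ibdtrim=value.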
SEE ALSO
https://faculty.washington.edu/browning/beagle/beagle.html
AUTHOR
Beagle was written by Brian L. Browning.
This manual page was written by Dylan Aïssi <bob.dybian@gmail.com>, for the Debian project (but may be used by others).