QTLtools-mbv(1)

Match genotypes in a VCF to a BAM file

NAME

QTLtools mbv - Match genotypes in a VCF to a BAM file

SYNOPSIS

QTLtools mbv --bam [sample.bam|sample.sam|sample.cram] --vcf [in.vcf|in.bcf|in.vcf.gz] --out output_file [OPTIONS]

DESCRIPTION

This mode checks whether the genotypes in the VCF are observed in the RNAseq reads in the BAM file, in order to quickly resolve sample mislabeling and to detect cross-sample contamination and PCR amplification bias. The details of the method are described in <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394/>. In brief, for each individual in the VCF, we measure the proportions of heterozygous and homozygous genotypes for which both alleles are captured by the sequencing reads in the BAM file. A 'match' has close to 100% concordance for both measures, whereas a 'mismatch' has markedly lower concordance for both. Increased cross-sample contamination decreases homozygous concordance without changing heterozygous concordance, while increased amplification bias decreases heterozygous concordance without changing homozygous concordance. We recommend using only uniquely mapping reads by setting an appropriate --filter-mapping-quality.
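As a rough illustration of how the two measures can be read together, the sketch below computes a concordance as the fraction of covered genotypes that are consistent with the reads, then applies the rules of thumb given above. The function names, the consistent/covered definition, and the 0.9 cut-off are assumptions made for this example; they are not taken from QTLtools.

```python
def concordance(n_consistent, n_covered):
    """Fraction of covered genotypes that agree with the reads (assumed definition)."""
    if n_covered <= 0:
        raise ValueError("no covered genotypes")
    return n_consistent / n_covered


def interpret(het_conc, hom_conc, cutoff=0.9):
    """Apply the rules of thumb from the description.

    cutoff is an illustrative value, not a QTLtools default.
    """
    het_ok = het_conc >= cutoff
    hom_ok = hom_conc >= cutoff
    if het_ok and hom_ok:
        return "match"
    if het_ok:
        # Homozygous concordance dropped while heterozygous concordance held
        return "possible cross-sample contamination"
    if hom_ok:
        # Heterozygous concordance dropped while homozygous concordance held
        return "possible amplification bias"
    return "mismatch"
```

In practice the cut-off would be chosen by looking at how the values are distributed across all samples rather than fixed in advance.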

OPTIONS

--vcf [in.vcf|in.bcf|in.vcf.gz]

Genotypes in VCF/BCF format. Should contain all the samples in the dataset. REQUIRED.

--bam [in.bam|in.sam|in.cram]

Sequence data in BAM/SAM/CRAM format. REQUIRED.

--out output

Output file name. REQUIRED.

--reg chr:start-end

Genomic region to be processed, e.g. chr4:12334456-16334456 or chr5.

--filter-mapping-quality integer

Minimum mapping quality for a read or read pair to be considered. Set this to only include uniquely mapped reads. DEFAULT=10

--filter-base-quality integer

Minimum phred quality for a base to be considered. DEFAULT=5

--filter-binomial-pvalue float

Binomial p-value threshold below which a heterozygous genotype is considered as exhibiting allelic imbalance. DEFAULT=0.05

--filter-minimal-coverage integer

Minimum number of reads overlapping a genotype for it to be considered. DEFAULT=10

--filter-imputation-qual float

Minimum imputation information score for a variant to be considered. DEFAULT=0.9

--filter-imputation-prob float

Minimum posterior probability for a genotype to be considered. DEFAULT=0.99

--filter-keep-duplicates

Keep reads designated as duplicate by the aligner.
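To make the --filter-binomial-pvalue threshold more concrete, the following sketch runs a standard two-sided exact binomial test on the reference and alternative read counts at a heterozygous site, with an expected allelic ratio of 0.5. This is a generic textbook test; whether QTLtools computes its p-value in exactly this way is not stated here, and the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_pvalue(ref_reads, alt_reads, p=0.5):
    """Two-sided exact binomial p-value for the observed allele counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Suitable for coverage up to a few hundred reads.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)


def shows_allelic_imbalance(ref_reads, alt_reads, threshold=0.05):
    """True when the p-value falls below the threshold (0.05 mirrors the documented default)."""
    return binomial_two_sided_pvalue(ref_reads, alt_reads) < threshold
```

With this sketch, a site covered by 18 reference and 2 alternative reads would be flagged as imbalanced, while a 6 to 4 split would not.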

OUTPUT FILE COLUMNS

--out filename

This file has no header and contains the following columns:

[Table of output columns: provided as an image in the source and not reproduced here]

EXAMPLES

Running mbv on an RNAseq sample mapped with GEM:

QTLtools mbv --bam HG00381.chr22.bam --out HG00381.chr22.mbv.txt --vcf genotypes.chr22.vcf.gz --filter-mapping-quality 150

You can then plot column 9 against column 10 to identify the genotyped sample in the VCF that best matches your sequence data.
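As an alternative to inspecting the plot by eye, the sketch below ranks the rows of the output file by the sum of columns 9 and 10 and returns the top one. It assumes that the file is whitespace-delimited, that column 1 holds the sample ID, and that higher values in columns 9 and 10 indicate a better match; none of these is confirmed by this page, and best_match is a hypothetical helper.

```python
def best_match(lines):
    """Return (sample_id, col9, col10) for the row with the highest col9 + col10.

    Assumes whitespace-delimited rows with the sample ID in column 1.
    Rows with fewer than 10 fields or non-numeric values in columns 9 and 10
    (for instance a header line) are skipped.
    """
    best = None
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue
        try:
            col9, col10 = float(fields[8]), float(fields[9])
        except ValueError:
            continue
        if best is None or col9 + col10 > best[1] + best[2]:
            best = (fields[0], col9, col10)
    return best


# Example use with the output file from the command above:
#     with open("HG00381.chr22.mbv.txt") as handle:
#         print(best_match(handle))
```

A plot remains useful alongside this, since a clear match should stand apart from all other samples rather than merely rank first.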

SEE ALSO

QTLtools(1)

QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

Please submit bug reports to <https://github.com/qtltools/qtltools>

CITATION

Fort A., Panousis N. I., Garieri M. et al. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33(12), 1895 (2017). <https://doi.org/10.1093/bioinformatics/btx074>

AUTHORS

Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)