seqkit(1)

cross-platform and ultrafast toolkit for FASTA/Q file manipulation

SEQKIT

NAME

seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

DESCRIPTION

SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Version: 2.1.0

Author: Wei Shen <shenwei356@gmail.com>

Documents: http://bioinf.shenwei.me/seqkit

Source code: https://github.com/shenwei356/seqkit

Please cite: https://doi.org/10.1371/journal.pone.0163962

SeqKit uses the pgzip (https://github.com/klauspost/pgzip) package to read and write gzip files; the gzip files it writes are slightly larger than those generated by GNU gzip.

Seqkit writes gzip files very fast, much faster than the multi-threaded pigz, therefore there’s no need to pipe the result to gzip/pigz.

Usage:

seqkit [command]

Available Commands:

amplicon

extract amplicon (or specific region around it) via primer(s)

bam

monitoring and online histograms of BAM record features

common

find common sequences of multiple files by id/name/sequence

concat

concatenate sequences with same ID from multiple files

convert

convert FASTQ quality encoding between Sanger, Solexa and Illumina

duplicate

duplicate sequences N times

faidx

create FASTA index file and extract subsequence

fish

look for short sequences in larger sequences using local alignment

fq2fa

convert FASTQ to FASTA

fx2tab

convert FASTA/Q to tabular format (and length, GC content, average quality...)

genautocomplete

generate shell autocompletion script (bash|zsh|fish|powershell)

grep

search sequences by ID/name/sequence/sequence motifs, mismatch allowed

head

print first N FASTA/Q records

head-genome

print sequences of the first genome with common prefixes in name

locate

locate subsequences/motifs, mismatch allowed

mutate

edit sequence (point mutation, insertion, deletion)

pair

match up paired-end reads from two fastq files

range

print FASTA/Q records in a range (start:end)

rename

rename duplicated IDs

replace

replace name/sequence by regular expression

restart

reset start position for circular genome

rmdup

remove duplicated sequences by ID/name/sequence

sample

sample sequences by number or proportion

sana

sanitize broken single line FASTQ files

scat

real time recursive concatenation and streaming of fastx files

seq

transform sequences (extract ID, filter by length, remove gaps...)

shuffle

shuffle sequences

sliding

extract subsequences in sliding windows

sort

sort sequences by id/name/sequence/length

split

split sequences into files by id/seq region/size/parts (mainly for FASTA)

split2

split sequences into files by size/parts (FASTA, PE/SE FASTQ)

stats

simple statistics of FASTA/Q files

subseq

get subsequences by region/gtf/bed, including flanking sequences

tab2fx

convert tabular format to FASTA/Q format

translate

translate DNA/RNA to protein sequence (supporting ambiguous bases)

version

print version information and check for update

watch

monitoring and online histograms of sequence features

Flags:

--alphabet-guess-seq-length int

length of sequence prefix of the first FASTA record based on which seqkit guesses the sequence type (0 for whole seq) (default 10000)

-h, --help

help for seqkit

--id-ncbi

FASTA header is NCBI-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...

--id-regexp string

regular expression for parsing ID (default "^(\\S+)\\s?")

--infile-list string

file listing input files (one file per line); if given, these are appended to the files from command-line arguments

-w, --line-width int

line width when outputting FASTA format (0 for no wrap) (default 60)

-o, --out-file string

out file ("-" for stdout, suffix .gz for gzipped out) (default "-")

--quiet

be quiet and do not show extra information

-t, --seq-type string

sequence type (dna|rna|protein|unlimit|auto) (for auto, the type is detected automatically from the first sequence) (default "auto")

-j, --threads int

number of CPUs (can also be set with the environment variable SEQKIT_THREADS) (default 4)
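To illustrate what the default --id-regexp pattern captures, here is a minimal sketch in Python (not SeqKit's own Go code). It assumes the doubled backslashes in the help text are string escaping, so the effective pattern is ^(\S+)\s?, and that the header is passed without its leading ">" or "@". The helper name parse_id and the fallback to the whole header are hypothetical, chosen only for this example.

```python
import re

# Effective form of the default pattern quoted in the --id-regexp help text
# (assumption: the doubled backslashes there are string escaping).
ID_PATTERN = re.compile(r"^(\S+)\s?")


def parse_id(header: str) -> str:
    """Return the ID captured from a header line given without '>' or '@'.

    Hypothetical helper for illustration; not part of SeqKit.
    """
    match = ID_PATTERN.match(header)
    # Assumption: fall back to the whole header when the pattern does not match.
    return match.group(1) if match else header


if __name__ == "__main__":
    for header in ("seq1 Homo sapiens chr1", "contig_42", "read/1\tlength=150"):
        print(repr(header), "->", parse_id(header))
```

With this pattern the ID is the run of non-whitespace characters at the start of the header, so everything after the first space or tab is treated as description.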

Use "seqkit [command] --help" for more information about a command.
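The output conventions described for --line-width and --out-file can be sketched as follows. This is a hedged Python illustration of the documented behaviour (wrap at 60 columns, 0 for no wrap, "-" for stdout, a .gz suffix for gzipped output), not SeqKit's implementation; the function names wrap and write_fasta are hypothetical, and Python's standard gzip module stands in for pgzip.

```python
import gzip
import sys


def wrap(seq: str, width: int = 60) -> str:
    """Break a sequence into lines of at most `width` characters (0 = no wrap)."""
    if width <= 0:
        return seq
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def write_fasta(records, out_file: str = "-", line_width: int = 60) -> None:
    """Write (header, sequence) pairs as FASTA.

    Hypothetical helper: "-" writes to stdout, a ".gz" suffix selects gzip
    compression, anything else is written as plain text.
    """
    if out_file == "-":
        handle, must_close = sys.stdout, False
    elif out_file.endswith(".gz"):
        handle, must_close = gzip.open(out_file, "wt"), True
    else:
        handle, must_close = open(out_file, "w"), True
    try:
        for header, seq in records:
            handle.write(f">{header}\n{wrap(seq, line_width)}\n")
    finally:
        if must_close:
            handle.close()


if __name__ == "__main__":
    # An 80-base sequence is written as one 60-base line and one 20-base line.
    write_fasta([("seq1 demo", "ACGT" * 20)], "-", line_width=60)
```

Choosing the writer from the file name keeps the calling code identical for plain and compressed output, which matches the man page's point that piping the result through gzip or pigz is unnecessary.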

AUTHOR

This manpage was written by Nilesh Patra for the Debian distribution and can be used for any other usage of the program.