cleanasn(1)

clean up irregularities in NCBI ASN.1 objects

NAME

cleanasn - clean up irregularities in NCBI ASN.1 objects

SYNOPSIS

cleanasn [-] [-A filename] [-B str] [-C str] [-D str] [-F str] [-K str] [-L filename] [-M filename] [-N str] [-O str] [-P str] [-Q str] [-R] [-S str] [-T] [-U str] [-V str] [-X str] [-Z str] [-a str] [-b] [-c] [-d str] [-f str] [-i filename] [-j filename] [-k filename] [-m str] [-n path] [-o filename] [-p path] [-q path] [-r path] [-v path] [-x ext]

DESCRIPTION

cleanasn is a utility program to clean up irregularities in NCBI ASN.1 objects.

OPTIONS

A summary of options is included below.

-

Print usage message

-A filename

Accession list file

-B str

Branch, per the flags in str:

c

Has coding regions

d

No coding regions

p

Passes validation

q

Validator errors or rejects

r

Only pop/phy/mut/eco/WGS sets

s

Exclude pop/phy/mut/eco/WGS sets

t

Only nuc-prot sets

u

Exclude nuc-prot sets

v

Only segmented sequences

w

Exclude segmented sequences

x

Only segmented proteins

y

Exclude segmented proteins

-C str

Sequence operations, per the flags in str:

c

Compress

d

Decompress

l

Recalculate segmented sequence length

v

Virtual gaps inside segmented sequence

s

Convert segmented set to delta sequence

t

Non-NucProt segmented set to delta sequence

u

Improved non-NucProt segmented set to delta sequence

g

Raw to delta by assembly gap

m

Merge assembly gap features

-D str

Clean up descriptors, per the flags in str:

t

Remove Title

c

Remove Comment

n

Remove Nuc-Prot Set title

e

Remove Pop/Phy/Mut/Eco Set title

m

Remove mRNA title

p

Remove Protein title

a

Title to name

b

AutoDef title or name

x

Prefix title with organism name

-F str

Clean up features, per the flags in str:

u

Remove User-objects

d

Remove db_xrefs

e

Remove /evidence and /inference

g

Fuse multi-interval genes

i

Fuse adjacent-interval imported features

r

Remove redundant gene xrefs

f

Fuse duplicate features

s

Package features on referenced Bioseq

k

Package coding-region or parts features

z

Delete or update EC numbers

b

Set Best coding-region reading frame

x

Retranslate coding regions

a

Adjust for missing stop codon
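As a minimal sketch of a feature cleanup, the command below asks for two of the -F operations listed above. The file names are hypothetical, and combining several letters in one string is an assumption based on the "per the flags in str" wording. The script only prints the command so that it can be reviewed first.

```shell
# Hypothetical file names; replace with your own.
in=input.asn
out=cleaned.asn

# -F flags: d = remove db_xrefs, e = remove /evidence and /inference.
# Combining letters in one string is assumed from "per the flags in str".
cmd="cleanasn -i $in -o $out -F de"

# Dry run: print the command. To execute it, run: sh -c "$cmd"
echo "$cmd"
```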

-K str

Perform a general cleanup, per the flags in str:

b

BasicSeqEntryCleanup

p

C++ BasicCleanup (via an external utility)

v

AdvancedSeqEntryCleanup

s

SeriousSeqEntryCleanup

x

ExtendedSeqEntryCleanup

g

GpipeSeqEntryCleanup

n

Normalize descriptor order

u

Remove NcbiCleanup User Objects

c

Synchronize genetic Codes

f

CDS partial from translation

e

Impose CDS partials

d

Resynchronize CDS partials

m

Resynchronize mRNA partials

t

Resynchronize Peptide partials

a

Adjust consensus splice

i

Promote to "worst" Seq-ID

r

Reassign local IDs

l

Remove locus
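A general cleanup run might look like the following sketch, which requests BasicSeqEntryCleanup on a single file. The file names are hypothetical placeholders, and the script prints the command rather than running it.

```shell
# Hypothetical file names; replace with your own.
in=input.asn
out=cleaned.asn

# -K b requests BasicSeqEntryCleanup.
# -i and -o name the single input and output files; without them,
# cleanasn reads stdin and writes stdout.
cmd="cleanasn -i $in -o $out -K b"

# Dry run: print the command. To execute it, run: sh -c "$cmd"
echo "$cmd"
```

Other letters from the -K list, such as n (normalize descriptor order), would presumably be appended to the same string, for example -K bn.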

-L filename

Log file

-M filename

Macro file

-N str

Clean up links, per the flags in str:

o

Link CDS mRNA by Overlap

p

Link CDS mRNA by Product

l

Link CDS mRNA by Label and Location

r

Reassign feature IDs

m

Merge colliding feature IDs

f

Fix missing reciprocal feature IDs

c

Clear feature IDs

-O str

Missing prot-ref name

-P str

Publication options:

a

Remove All publications

s

Remove Serial number

f

Remove Figure, numbering, and name

r

Remove Remark

u

Update PMID-only publication

j

Lookup ISO Journal title abbreviation

m

Merge identical publication features

#

Replace unpublished with PMID

-Q str

Report, per the flags in str:

c

Record count

r

ASN.1 BSEC report

s

ASN.1 SSEC report

n

NORM vs. SSEC report

e

PopPhyMutEco AutoDef report

o

Overlap report

l

Latitude-longitude country diff

d

Log SSEC differences

g

GenBank SSEC diff

f

asn2gb/asn2flat diff

h

Seg-to-delta GenBank diff

v

Validator SSEC diff

m

Modernize Gene/RNA/PCR

u

Unpublished Pub lookup

p

Published Pub lookup

j

Unindexed Journal report

t

tRNA anticodon report

w

Component offset report

x

Custom scan

-R

Remote fetching from ID (NCBI sequence databases)

-S str

Selective difference filter, per the flags in str (a capital letter skips the corresponding item):

s

SSEC

b

BSEC

A

Author

p

Publication

l

Location

r

RNA

q

Qualifier sort order

g

GenBank block

k

Package CdRegion or parts features

m

Move publication

o

Leave duplicate Bioseq publication

d

Automatic definition line

e

Pop/Phy/Mut/Eco Set definition line

-T

Taxonomy Lookup

-U str

Modernize, per the flags in str:

g

Genes

r

RNA

p

PCR Primers

-V str

Remove features by validator severity:

r

Reject

e

Error

w

Warning

i

Info

-X str

Miscellaneous options, per str:

d

Automatic definition line

s

Automatic definition line with Source qualifiers

e

Pop/Phy/Mut/Eco Set definition line

n

Instantiate NC title

m

Instantiate NM titles

x

Special XM titles

p

Instantiate Protein titles

g

GPipe instantiate titles

c

Create mRNAs for coding sequences

f

Fix reciprocal protein_id/transcript_id

v

Revert preRNA or ncRNA transcript_id

t

Parse anticodon from Sequence

b

Batch cleanup of multireader output

z

Wrap SegSet with NucProt set

w

GFF/WGS genome cleanup

-Z str

Remove indicated User-object

-a str

ASN.1 type:

a

Any (default)

e

Seq-entry

b

Bioseq

s

Bioseq-set

m

Seq-submit

t

Batch Bioseq-set

u

Batch Seq-submit

-b

Input ASN.1 is Binary

-c

Input ASN.1 is Compressed
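The input-format options can be combined to describe a file that is not plain text ASN.1. The sketch below declares a binary, compressed Seq-entry. The file names are hypothetical, and the compression format that -c expects is not stated in this page, so check your input before relying on it.

```shell
# Hypothetical file names; replace with your own.
in=entry.bin
out=cleaned.asn

# -a e : treat the input as a Seq-entry
# -b   : input ASN.1 is binary
# -c   : input ASN.1 is compressed
# -K b : BasicSeqEntryCleanup, chosen here only as an example operation
cmd="cleanasn -a e -b -c -i $in -o $out -K b"

# Dry run: print the command. To execute it, run: sh -c "$cmd"
echo "$cmd"
```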

-d str

Source database:

a

Any (default)

g

GenBank

e

EMBL

d

DDBJ

b

EMBL or DDBJ

i

INSD

r

RefSeq

n

NCBI

x

Exclude EMBL/DDBJ

y

Exclude gbcon, gbest, gbgss, gbhtg, gbpat, gbsts

-f str

Substring filter

-i filename

Single input file (defaults to stdin)

-j filename

First filename

-k filename

Last filename

-m str

Flatfile mode:

r

Release

e

Entrez

s

Sequin

d

Dump

-n path

asn2flat executable (default is /netopt/ncbi_tools/bin/asn2flat)

-o filename

Single output file (defaults to stdout)

-p path

Process all matching files in path

-q path

ffdiff executable (default is /netopt/genbank/subtool/bin/ffdiff)

-r path

Path for results

-v path

asnval executable (default is /netopt/ncbi_tools/bin/asnval)

-x ext

File selection suffix for use with -p (defaults to .ent)
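For batch work, -p, -x, and -r can be used together instead of -i and -o. The following is a sketch under assumptions: the directory names are hypothetical, and it is assumed that -r receives the cleaned output for each file matched under -p.

```shell
# Hypothetical directories; replace with your own.
indir=./records
outdir=./results

# -p : process all matching files in this path
# -x : file selection suffix (.ent is already the default, shown for clarity)
# -r : path for results (assumed to receive the per-file output)
# -K b : BasicSeqEntryCleanup on every selected file
cmd="cleanasn -p $indir -x .ent -r $outdir -K b"

# Dry run: print the command. To execute it, run: sh -c "$cmd"
echo "$cmd"
```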

AUTHOR

The National Center for Biotechnology Information.

SEE ALSO

asndisc(1), asnval(1), sequin(1).