pandoc(1)

pandoc - general markup converter

Description

\f[V]pandoc\f[R] [\f[I]options\f[R]] [\f[I]input-file\f[R]]...

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.

Pandoc can convert between numerous markup and word processing formats, including, but not limited to, various flavors of Markdown, HTML, LaTeX and Word docx. For the full lists of input and output formats, see the \f[V]--from\f[R] and \f[V]--to\f[R] options below. Pandoc can also produce PDF output: see creating a PDF, below.

Pandoc\[cq]s enhanced version of Markdown includes syntax for tables, definition lists, metadata blocks, footnotes, citations, math, and much more. See below under Pandoc\[cq]s Markdown.

Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document (an \f[I]abstract syntax tree\f[R] or AST), and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer. Users can also run custom pandoc filters to modify the intermediate AST.
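To make the reader/writer split concrete, the following sketch prints the intermediate AST for a small Markdown snippet by selecting the \f[V]native\f[R] writer. The sample text is invented for illustration, and the exact layout of the output differs between pandoc versions.

```shell
# Sample input, made up for this illustration.
sample='# Title

Some *emphasised* text.'

# The markdown reader parses the text; the native writer prints the AST
# instead of rendering it to a target format.
if command -v pandoc >/dev/null 2>&1; then
  ast=$(printf '%s\n' "$sample" | pandoc -f markdown -t native)
  printf '%s\n' "$ast"
else
  ast=''
  echo 'pandoc is not installed; nothing to show' >&2
fi
```

Replacing \f[V]-t native\f[R] with \f[V]-t json\f[R] should give the same tree in the JSON form that filters consume.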

Because pandoc\[cq]s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc\[cq]s simple document model. While conversions from pandoc\[cq]s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc\[cq]s Markdown can be expected to be lossy.

If no \f[I]input-files\f[R] are specified, input is read from \f[I]stdin\f[R]. Output goes to \f[I]stdout\f[R] by default. For output to a file, use the \f[V]-o\f[R] option:

\f[C] pandoc -o output.html input.txt \f[R]

By default, pandoc produces a document fragment. To produce a standalone document (e.g.\ a valid HTML file including \f[V]<head>\f[R] and \f[V]<body>\f[R]), use the \f[V]-s\f[R] or \f[V]--standalone\f[R] flag:

\f[C] pandoc -s -o output.html input.txt \f[R]

For more information on how standalone documents are produced, see Templates below.

If multiple input files are given, pandoc will concatenate them all (with blank lines between them) before parsing. (Use \f[V]--file-scope\f[R] to parse files individually.)
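The difference between the two modes shows up most clearly when several files reuse the same footnote label. The sketch below assumes two made-up chapter files, \f[V]ch1.md\f[R] and \f[V]ch2.md\f[R], written to a temporary directory; it is an illustration, not a recommended project layout.

```shell
# Two hypothetical chapter files that both define a footnote named [^1].
workdir=$(mktemp -d)
printf 'First chapter.[^1]\n\n[^1]: Note for chapter one.\n' > "$workdir/ch1.md"
printf 'Second chapter.[^1]\n\n[^1]: Note for chapter two.\n' > "$workdir/ch2.md"

if command -v pandoc >/dev/null 2>&1; then
  # Default: the files are concatenated before parsing, so the two [^1]
  # definitions end up in one document and clash.
  merged=$(pandoc -f markdown -t html "$workdir/ch1.md" "$workdir/ch2.md")
  # With --file-scope each file is parsed on its own before combining.
  scoped=$(pandoc --file-scope -f markdown -t html "$workdir/ch1.md" "$workdir/ch2.md")
  printf '%s\n' "$scoped"
else
  merged=''
  scoped=''
fi
```

The trade-off is that with \f[V]--file-scope\f[R] a footnote or reference link defined in one file cannot be used from another.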

The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the \f[V]-f/--from\f[R] option, the output format using the \f[V]-t/--to\f[R] option. Thus, to convert \f[V]hello.txt\f[R] from Markdown to LaTeX, you could type:

\f[C] pandoc -f markdown -t latex hello.txt \f[R]

To convert \f[V]hello.html\f[R] from HTML to Markdown:

\f[C] pandoc -f html -t markdown hello.html \f[R]

Supported input and output formats are listed below under Options (see \f[V]-f\f[R] for input formats and \f[V]-t\f[R] for output formats). You can also use \f[V]pandoc --list-input-formats\f[R] and \f[V]pandoc --list-output-formats\f[R] to print lists of supported formats.

If the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the filenames. Thus, for example,

\f[C] pandoc -o hello.tex hello.txt \f[R]

will convert \f[V]hello.txt\f[R] from Markdown to LaTeX. If no output file is specified (so that output goes to \f[I]stdout\f[R]), or if the output file\[cq]s extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from \f[I]stdin\f[R]), or if the input files\[cq] extensions are unknown, the input format will be assumed to be Markdown.
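The fallback rules above can be pictured with a small shell function. This is only a sketch: \f[V]guess_format\f[R] is a hypothetical helper, its extension table is a small illustrative subset rather than pandoc\[cq]s own, and it ignores special cases such as \f[V].pdf\f[R] output.

```shell
# Hypothetical helper: a simplified picture of guessing a format from a
# filename. The extension table is illustrative, not pandoc's actual table.
guess_format() {
  file=$1      # filename; empty when reading stdin or writing stdout
  fallback=$2  # markdown on the input side, html on the output side
  case "$file" in
    *.md|*.markdown|*.txt) echo markdown ;;
    *.tex)                 echo latex ;;
    *.html|*.htm)          echo html ;;
    *.rst)                 echo rst ;;
    *.docx)                echo docx ;;
    *)                     echo "$fallback" ;;
  esac
}

guess_format hello.txt markdown   # input side: prints markdown
guess_format hello.tex html       # output side: prints latex
guess_format '' html              # no output file: prints html
```

When the guess would be wrong, for example a \f[V].txt\f[R] file that actually contains reStructuredText, passing \f[V]-f\f[R] or \f[V]-t\f[R] explicitly avoids the ambiguity.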

Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through \f[V]iconv\f[R]:

\f[C] iconv -t utf-8 input.txt | pandoc | iconv -f utf-8 \f[R]

Note that in some output formats (such as HTML, LaTeX, ConTeXt, RTF, OPML, DocBook, and Texinfo), information about the character encoding is included in the document header, which will only be included if you use the \f[V]-s/--standalone\f[R] option.
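The \f[V]iconv\f[R] pipeline shown earlier relies on the locale to identify the local encoding. A variant that names both encodings explicitly might look like the following sketch, which assumes the local encoding is ISO-8859-1; substitute your own.

```shell
# Assumption for this sketch: the local encoding is ISO-8859-1 (Latin-1).
# Create a Latin-1 file; \351 is the single byte for "e acute" there.
latin1_file=$(mktemp)
printf 'caf\351\n' > "$latin1_file"

# Step 1: decode to UTF-8, which is what pandoc expects on input.
utf8_text=$(iconv -f ISO-8859-1 -t UTF-8 "$latin1_file")
printf '%s\n' "$utf8_text"

# Steps 2 and 3: run pandoc, then encode its UTF-8 output back to Latin-1.
if command -v pandoc >/dev/null 2>&1; then
  iconv -f ISO-8859-1 -t UTF-8 "$latin1_file" \
    | pandoc -f markdown -t html \
    | iconv -f UTF-8 -t ISO-8859-1
fi
```

Converting the output back only works if every character pandoc emits exists in the target encoding; otherwise \f[V]iconv\f[R] reports an error.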

To produce a PDF, specify an output file with a \f[V].pdf\f[R] extension:

\f[C] pandoc test.txt -o test.pdf \f[R]

By default, pandoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed (see \f[V]--pdf-engine\f[R] below). Alternatively, pandoc can use ConTeXt, roff ms, or HTML as an intermediate format. To do this, specify an output file with a \f[V].pdf\f[R] extension, as before, but add the \f[V]--pdf-engine\f[R] option or \f[V]-t context\f[R], \f[V]-t html\f[R], or \f[V]-t ms\f[R] to the command line. The tool used to generate the PDF from the intermediate format may be specified using \f[V]--pdf-engine\f[R].
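As a rough illustration of these routes, the sketch below wraps one command per intermediate format in a shell function. The function names are hypothetical, \f[V]test.txt\f[R] is the sample input used above, and each route assumes the corresponding engine is installed.

```shell
# Each function wraps one route to PDF. Nothing runs until a function is called.

# LaTeX route with a non-default engine.
pdf_via_xelatex() { pandoc test.txt --pdf-engine=xelatex -o test.pdf; }

# ConTeXt as the intermediate format.
pdf_via_context() { pandoc test.txt -t context -o test.pdf; }

# HTML as the intermediate format, rendered by wkhtmltopdf.
pdf_via_html() { pandoc test.txt -t html --pdf-engine=wkhtmltopdf -o test.pdf; }

# roff ms as the intermediate format.
pdf_via_ms() { pandoc test.txt -t ms -o test.pdf; }
```

Calling, say, \f[V]pdf_via_context\f[R] in a directory that contains \f[V]test.txt\f[R] would then write \f[V]test.pdf\f[R] through ConTeXt.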

You can control the PDF style using variables, depending on the intermediate format used: see variables for LaTeX, variables for ConTeXt, variables for \f[V]wkhtmltopdf\f[R], variables for ms. When HTML is used as an intermediate format, the output can be styled using \f[V]--css\f[R].

To debug the PDF creation, it can be useful to look at the intermediate representation: instead of \f[V]-o test.pdf\f[R], use for example \f[V]-s -o test.tex\f[R] to output the generated LaTeX. You can then test it with \f[V]pdflatex test.tex\f[R].

When using LaTeX, the following packages need to be available (they are included with all recent versions of TeX Live): \f[V]amsfonts\f[R], \f[V]amsmath\f[R], \f[V]lm\f[R], \f[V]unicode-math\f[R], \f[V]iftex\f[R], \f[V]listings\f[R] (if the \f[V]--listings\f[R] option is used), \f[V]fancyvrb\f[R], \f[V]longtable\f[R], \f[V]booktabs\f[R], \f[V]graphicx\f[R] (if the document contains images), \f[V]hyperref\f[R], \f[V]xcolor\f[R], \f[V]ulem\f[R], \f[V]geometry\f[R] (with the \f[V]geometry\f[R] variable set), \f[V]setspace\f[R] (with \f[V]linestretch\f[R]), and \f[V]babel\f[R] (with \f[V]lang\f[R]).

If \f[V]CJKmainfont\f[R] is set, \f[V]xeCJK\f[R] is needed. The use of \f[V]xelatex\f[R] or \f[V]lualatex\f[R] as the PDF engine requires \f[V]fontspec\f[R]. \f[V]lualatex\f[R] uses \f[V]selnolig\f[R]. \f[V]xelatex\f[R] uses \f[V]bidi\f[R] (with the \f[V]dir\f[R] variable set). If the \f[V]mathspec\f[R] variable is set, \f[V]xelatex\f[R] will use \f[V]mathspec\f[R] instead of \f[V]unicode-math\f[R].

The \f[V]csquotes\f[R] package will be used for typography if the \f[V]csquotes\f[R] variable or metadata field is set to a true value. The \f[V]natbib\f[R], \f[V]biblatex\f[R], \f[V]bibtex\f[R], and \f[V]biber\f[R] packages can optionally be used for citation rendering.

The following packages will be used to improve output quality if present, but pandoc does not require them to be present: \f[V]upquote\f[R] (for straight quotes in verbatim environments), \f[V]microtype\f[R] (for better spacing adjustments), \f[V]parskip\f[R] (for better inter-paragraph spaces), \f[V]xurl\f[R] (for better line breaks in URLs), \f[V]bookmark\f[R] (for better PDF bookmarks), and \f[V]footnotehyper\f[R] or \f[V]footnote\f[R] (to allow footnotes in tables).

Instead of an input file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP:

\f[C] pandoc -f html -t markdown https://www.fsf.org \f[R]

It is possible to supply a custom User-Agent string or other header when requesting a document from a URL:

\f[C] pandoc -f html -t markdown --request-header User-Agent:\[dq]Mozilla/5.0\[dq] https://www.fsf.org \f[R]

\f[V]-f\f[R] \f[I]FORMAT\f[R], \f[V]-r\f[R] \f[I]FORMAT\f[R], \f[V]--from=\f[R]\f[I]FORMAT\f[R], \f[V]--read=\f[R]\f[I]FORMAT\f[R]

Specify input format. \f[I]FORMAT\f[R] can be:

\f[V]bibtex\f[R] (BibTeX bibliography)

\f[V]biblatex\f[R] (BibLaTeX bibliography)

\f[V]commonmark\f[R] (CommonMark Markdown)

\f[V]commonmark_x\f[R] (CommonMark Markdown with extensions)

\f[V]creole\f[R] (Creole 1.0)

\f[V]csljson\f[R] (CSL JSON bibliography)

\f[V]csv\f[R] (CSV table)

\f[V]docbook\f[R] (DocBook)

\f[V]docx\f[R] (Word docx)

\f[V]dokuwiki\f[R] (DokuWiki markup)

\f[V]epub\f[R] (EPUB)

\f[V]fb2\f[R] (FictionBook2 e-book)

\f[V]gfm\f[R] (GitHub-Flavored Markdown), or the deprecated and less accurate \f[V]markdown_github\f[R]; use \f[V]markdown_github\f[R] only if you need extensions not supported in \f[V]gfm\f[R].

\f[V]haddock\f[R] (Haddock markup)

\f[V]html\f[R] (HTML)

\f[V]ipynb\f[R] (Jupyter notebook)

\f[V]jats\f[R] (JATS XML)

\f[V]jira\f[R] (Jira/Confluence wiki markup)

\f[V]json\f[R] (JSON version of native AST)

\f[V]latex\f[R] (LaTeX)

\f[V]markdown\f[R] (Pandoc\[cq]s Markdown)

\f[V]markdown_mmd\f[R] (MultiMarkdown)

\f[V]markdown_phpextra\f[R] (PHP Markdown Extra)

\f[V]markdown_strict\f[R] (original unextended Markdown)

\f[V]mediawiki\f[R] (MediaWiki markup)

\f[V]man\f[R] (roff man)

\f[V]muse\f[R] (Muse)

\f[V]native\f[R] (native Haskell)

\f[V]odt\f[R] (ODT)

\f[V]opml\f[R] (OPML)

\f[V]org\f[R] (Emacs Org mode)

\f[V]rtf\f[R] (Rich Text Format)

\f[V]rst\f[R] (reStructuredText)

\f[V]t2t\f[R] (txt2tags)

\f[V]textile\f[R] (Textile)

\f[V]tikiwiki\f[R] (TikiWiki markup)

\f[V]twiki\f[R] (TWiki markup)

\f[V]vimwiki\f[R] (Vimwiki)

the path of a custom Lua reader (see Custom readers and writers below)

Extensions can be individually enabled or disabled by appending \f[V]+EXTENSION\f[R] or \f[V]-EXTENSION\f[R] to the format name. See Extensions below for a list of extensions and their names. See also \f[V]--list-input-formats\f[R] and \f[V]--list-extensions\f[R], below.
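As a small illustration of the \f[V]+EXTENSION\f[R] syntax, the sketch below converts the same two-line paragraph with and without the \f[V]hard_line_breaks\f[R] extension. The sample text is made up for the example.

```shell
# Input with a single newline inside one paragraph.
text='first line
second line'

if command -v pandoc >/dev/null 2>&1; then
  # Plain markdown: the newline is treated as a soft break.
  soft=$(printf '%s\n' "$text" | pandoc -f markdown -t html)
  # With +hard_line_breaks, the newline becomes an explicit line break.
  hard=$(printf '%s\n' "$text" | pandoc -f markdown+hard_line_breaks -t html)
  printf '%s\n---\n%s\n' "$soft" "$hard"
else
  soft=''
  hard=''
fi
```

Disabling works the same way with a minus sign, for example \f[V]-f markdown-smart\f[R]. Several modifiers can be chained on one format name.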

\f[V]-t\f[R] \f[I]FORMAT\f[R], \f[V]-w\f[R] \f[I]FORMAT\f[R], \f[V]--to=\f[R]\f[I]FORMAT\f[R], \f[V]--write=\f[R]\f[I]FORMAT\f[R]

Specify output format. \f[I]FORMAT\f[R] can be:

\f[V]asciidoc\f[R] (AsciiDoc) or \f[V]asciidoctor\f[R] (AsciiDoctor)

\f[V]beamer\f[R] (LaTeX beamer slide show)

\f[V]bibtex\f[R] (BibTeX bibliography)

\f[V]biblatex\f[R] (BibLaTeX bibliography)

\f[V]commonmark\f[R] (CommonMark Markdown)

\f[V]commonmark_x\f[R] (CommonMark Markdown with extensions)

\f[V]context\f[R] (ConTeXt)

\f[V]csljson\f[R] (CSL JSON bibliography)

\f[V]docbook\f[R] or \f[V]docbook4\f[R] (DocBook 4)

\f[V]docbook5\f[R] (DocBook 5)

\f[V]docx\f[R] (Word docx)

\f[V]dokuwiki\f[R] (DokuWiki markup)

\f[V]epub\f[R] or \f[V]epub3\f[R] (EPUB v3 book)

\f[V]epub2\f[R] (EPUB v2)

\f[V]fb2\f[R] (FictionBook2 e-book)

\f[V]gfm\f[R] (GitHub-Flavored Markdown), or the deprecated and less accurate \f[V]markdown_github\f[R]; use \f[V]markdown_github\f[R] only if you need extensions not supported in \f[V]gfm\f[R].

\f[V]haddock\f[R] (Haddock markup)

\f[V]html\f[R] or \f[V]html5\f[R] (HTML, i.e.\ HTML5/XHTML polyglot markup)

\f[V]html4\f[R] (XHTML 1.0 Transitional)

\f[V]icml\f[R] (InDesign ICML)

\f[V]ipynb\f[R] (Jupyter notebook)