htmlproofer(1)

NAME

htmlproofer - validate rendered HTML files

SYNOPSIS

htmlproofer directory [options]

DESCRIPTION

htmlproofer is a set of tests to validate HTML output. These tests check whether image references are legitimate, whether images have alt attributes, whether internal links work, and so on. HTMLProofer can run on a file, a directory, an array of directories, or an array of links. Below is a mostly comprehensive list of the checks it can perform.

Images (<img> elements)

Whether all images have alt attributes

Whether internal image references are not broken

Whether external images are showing

Whether images are HTTPS

Links (<a>, <link> elements)

Whether internal links are working

Whether internal hash references (#linkToMe) are working

Whether external links are working

Whether links are HTTPS

Whether CORS/SRI is enabled

Scripts (<script> elements)

Whether internal script references are working

Whether external scripts are loading

Whether CORS/SRI is enabled

Favicon

Whether favicons are valid.

OpenGraph

Whether the images and URLs in the OpenGraph metadata are valid.

HTML

Whether your HTML markup is valid.

This is done via Nokogiri to ensure well-formed markup.
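As an illustration of the checks listed above, the following sketch builds a throwaway page containing an image without an alt attribute and a broken internal link, then runs htmlproofer on it. The directory and file names are made up for the example, the run is skipped when htmlproofer is not installed, and --disable-external keeps the run from needing network access.

```shell
# Create a throwaway site (all paths here are hypothetical).
site=$(mktemp -d)
cat > "$site/index.html" <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Demo</title></head>
  <body>
    <img src="logo.png">
    <a href="missing.html">Broken internal link</a>
  </body>
</html>
EOF

# Run the checks only when the tool is available. The page is broken on
# purpose, so a non-zero exit status is the expected outcome.
if command -v htmlproofer >/dev/null 2>&1; then
  htmlproofer "$site" --disable-external || echo "htmlproofer reported failures"
fi
```

Pointing htmlproofer at the real output directory of a site generator works the same way; only the directory argument changes.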

OPTIONS

Listed below are the command line options for htmlproofer:

--allow-missing-href

Don’t flag tags missing an href attribute. This is the default for HTML5.

--allow-hash-href

Ignores href="#".

--as-links

Assumes that PATH is a comma-separated array of links to check.

--alt-ignore image1,[image2,...]

A comma-separated list of Strings or RegExps containing images whose missing alt attributes are safe to ignore.

--assume-extension

Automatically add extension (e.g. .html) to file paths, to allow extensionless URLs (as supported by Jekyll 3 and GitHub Pages).

--checks-to-ignore check1,[check2,...]

An array of Strings indicating which checks not to perform.

--check-external-hash

Checks whether external hashes exist (even if the webpage exists). This slows the checker down.

--check-favicon

Enables the favicon checker.

--check-html

Enables HTML validation errors from Nokogiri.

--check-img-http

Checks that images use HTTPS.

--check-opengraph

Enables the Open Graph checker.

--check-sri

Checks that external <link> and <script> resources use SRI.

--directory-index-file filename

Sets the file to look for when a link refers to a directory. Defaults to index.html.

--disable-external

Don’t run the external link checker, which can take a lot of time.

--empty-alt-ignore

Ignores images with empty alt attributes.

--error-sort sort

Defines the sort order for error output. Can be :path, :desc, or :status. Defaults to :path.

--enforce-https

Fails if a link is not marked as HTTPS.

--extension ext

The extension of HTML files including the dot. Defaults to .html.

--external_only

Only check for problems with external references.

--file-ignore file1,[file2,...]

A comma-separated list of Strings or RegExps containing file paths that are safe to ignore.

--help

Print this usage information on the command line.

--http-status-ignore 123,[xxx, ...]

A comma-separated list of numbers representing status codes to ignore.

--internal-domains domain1,[domain2,...]

A comma-separated list of Strings containing domains that will be treated as internal URLs.

--report-invalid-tags

Reports errors from --check-html associated with unknown markup.

--report-missing-names

Reports errors from --check-html associated with missing entities.

--report-script-embeds

Reports errors from --check-html associated with <script>s.

--log-level level

Sets the logging level, as determined by Yell. One of :debug, :info, :warn, :error, or :fatal. Defaults to :info.

--only-4xx

Only reports errors for links that fall within the 4xx status code range.

--storage-dir directory

Directory in which to store the cache log. Defaults to tmp/.htmlproofer.

--timeframe time

A string representing the caching timeframe.

--typhoeus-config string

JSON-formatted string of Typhoeus config. It will override the html-proofer defaults.

--url-ignore link1,[link2,...]

A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Non-HTTP(S) URIs are always ignored.

--url-swap re:string,[re:string,...]

A comma-separated list containing key-value pairs of RegExp => String. It transforms URLs that match RegExp into String via gsub. The escape sequence \: should be used to produce a literal :.

Special usage cases

For options that require an array of input, the values can be surrounded by quotes. Don’t use any spaces. For example, to exclude an array of HTTP status codes:

htmlproofer --http-status-ignore 999,401,404 ./out

For options like --url-ignore, which require an array of regular expressions, the following syntax works:

htmlproofer --url-ignore /www.github.com/,/foo.com/ ./out
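The --typhoeus-config option takes its value as a JSON-formatted string, so the whole value has to be quoted on the command line. A minimal sketch, assuming ./out holds the rendered site; followlocation and timeout are ordinary Typhoeus request options, and the values chosen here are only illustrative:

```shell
# Single quotes keep the shell from interpreting the double quotes
# that JSON requires around keys.
typhoeus_config='{"followlocation":true,"timeout":30}'

# Skip the run when htmlproofer is not installed.
if command -v htmlproofer >/dev/null 2>&1; then
  htmlproofer --typhoeus-config "$typhoeus_config" ./out || echo "htmlproofer reported failures"
fi
```

Keeping the JSON in a variable makes it easier to check the quoting before the string reaches htmlproofer.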

The --url-swap switch is a bit special, since it takes pairs of RegExp:String values. The escape sequence \: should be used to produce a literal : in either part. htmlproofer will figure out what you mean.

htmlproofer --url-swap wow:cow,mow:doh --extension .html.erb --url-ignore www.github.com ./out
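When either side of a --url-swap pair is itself a URL, every colon that belongs to the URL needs the \: escape, leaving exactly one unescaped colon as the separator. The sketch below swaps a production host for a local preview server; both URLs are made up for the example, and ./out is assumed to hold the rendered site.

```shell
# Single quotes preserve the backslashes. Only the colon between
# "example.com" and "http" is unescaped, so it separates RegExp from String.
url_swap='https\://example.com:http\://localhost\:4000'

# Skip the run when htmlproofer is not installed.
if command -v htmlproofer >/dev/null 2>&1; then
  htmlproofer --url-swap "$url_swap" ./out || echo "htmlproofer reported failures"
fi
```

Because the left-hand side is compiled as a regular expression, characters such as . keep their regex meaning unless they are escaped as well.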

AUTHORS

The program author is Garen Torikian.

This manual page was written by Daniel Leidert <daniel.leidert@wgdd.de> for the Debian distribution (but may be used by others).