html-proofer - Finding broken links in Jekyll
html-proofer on GitHub is a “set of tests to validate your HTML output”. I’m interested in the broken link checking part of it.
I couldn’t get it working well, though I’m not a Rubyist.
I’ve written this article to help you try it out.
In your Jekyll directory:

# get all gem files up to date
sudo bundle update

# add to Gemfile
gem 'html-proofer'

sudo bundle update

# start up the site
# I use the alias jsu='bundle exec jekyll serve --livereload --unpublished > /dev/null 2>&1 &'

# run the proofer
htmlproofer --allow_hash_href --empty_alt_ignore --assume_extension --disable_external ./_site
Command Line Options
Here are the ones that I’ve found useful:
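- --allow_hash_href - ignore links whose href is just #
- --empty_alt_ignore - ignore images with empty alt attributes; I have older blog posts with empty alt tags on images that I still need to fix
- --assume_extension - allow extensionless URLs, as produced by Jekyll 3 (for example, /about is checked as about.html)
- --disable_external - check internal links only, skipping external ones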
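To make --assume_extension and --allow_hash_href more concrete, here is a minimal Ruby sketch of how an internal link might be resolved against the built ./_site folder. It is not html-proofer's own code: the helper name internal_link_ok? is hypothetical, and it assumes links are root-relative paths such as /about, with any fragment or query string ignored.

```ruby
# Hypothetical helper, NOT html-proofer's implementation: decides whether an
# internal href resolves to a file inside the built site directory.
def internal_link_ok?(site_dir, href, assume_extension: false, allow_hash_href: false)
  # A bare "#" (e.g. a menu toggle) only passes when hash hrefs are allowed.
  return allow_hash_href if href == "#"

  # Keep only the path part; fragments and query strings are not checked here.
  path = href.split(/[#?]/, 2).first
  # Same-page anchors such as "#section" leave an empty path; this sketch skips them.
  return true if path.nil? || path.empty?

  target = File.join(site_dir, path)
  # Exact file match, e.g. "/about.html".
  return true if File.file?(target)
  # Directory links such as "/blog/" resolve to their index.html.
  return true if File.directory?(target) && File.file?(File.join(target, "index.html"))

  # With assume_extension, an extensionless "/about" may be served by "about.html".
  assume_extension && File.extname(path).empty? && File.file?("#{target}.html")
end
```

Without assume_extension, a Jekyll 3 link like /about has no matching file on disk, which is the kind of "internally linking to /about, which does not exist" report that the flag is meant to avoid.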
My output looked like this, and I found it easier to read once exported to a text file.
- ./_site/2016/10/16/Why-Blog.html
  *  image /assets/Dave_180.jpg does not have an alt attribute (line 0)
  *  internally linking to /2019/04/07/Twitter-card-open-graph-site-preview, which does not exist (line 0)
     <a href="/2019/04/07/Twitter-card-open-graph-site-preview">wrote a blog post on it</a>
  *  internally linking to /about, which does not exist (line 0)
     <a class="page-link" href="/about">About</a>
  *  linking to internal hash # that does not exist (line 0)
     <a href="#" class="menu-icon">
     <!-- <a class="menu-icon"> -->
     <svg viewBox="0 0 18 15" width="18px" height="15px">
     <path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"></path>
     <path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"></path>
     <path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"></path>
     </svg>
     </a>
Outputting to a text file:
bundle exec htmlproofer --allow_hash_href --empty_alt_ignore --assume_extension ./_site &> links.log
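Once the report is in links.log, it can also be summarised per page and per kind of problem. This is a minimal Ruby sketch, assuming the html-proofer 3.x report layout shown above (a "- ./_site/page.html" line followed by "*  message" lines). The method summarize_report and the category names are hypothetical labels chosen for this sketch, not something html-proofer outputs.

```ruby
# Hypothetical categories, matched against the message text seen in the report.
ISSUE_PATTERNS = {
  "missing alt"          => /does not have an alt attribute/,
  "broken internal link" => /internally linking to .*, which does not exist/,
  "missing hash target"  => /linking to internal hash .* that does not exist/,
  "external failure"     => /External link .* failed/
}.freeze

# Groups "*  message" lines under the most recent "- ./_site/..." page line and
# counts them by category. Lines that are neither (HTML snippets, summary
# lines, tracebacks) are ignored.
def summarize_report(text)
  by_file = Hash.new { |hash, key| hash[key] = [] }
  counts = Hash.new(0)
  current_file = nil

  text.each_line do |line|
    stripped = line.strip
    if stripped.start_with?("- ")
      current_file = stripped.delete_prefix("- ")
    elsif stripped.start_with?("*") && current_file
      message = stripped.delete_prefix("*").strip
      by_file[current_file] << message
      kind = ISSUE_PATTERNS.find { |_, pattern| pattern.match?(message) }&.first || "other"
      counts[kind] += 1
    end
  end

  [by_file, counts]
end
```

Usage might look like by_file, counts = summarize_report(File.read("links.log")), then printing counts to see which kind of problem dominates before working through by_file page by page.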
On my Ubuntu 20.04.1 LTS machine, the script appears to fail with a Ruby RuntimeError:
- ./_site/about.html
  *  External link https://channel9.msdn.com/events/DDD/DDD12-Developer-Day-2017/Streaming-Large-Volumes-of-Data-into-SQL failed: got a time out (response code 0)
htmlproofer 3.16.0 | Error: HTML-Proofer found 90 failures!
Traceback (most recent call last):
        25: from /usr/local/bin/bundle:23:in `<main>'
        24: from /usr/local/bin/bundle:23:in `load'
        23: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:34:in `<top (required)>'
        22: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/friendly_errors.rb:123:in `with_friendly_errors'
        21: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/exe/bundle:46:in `block in <top (required)>'
        20: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:24:in `start'
        19: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/base.rb:476:in `start'
        18: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:30:in `dispatch'
        17: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor.rb:399:in `dispatch'
        16: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
        15: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
        14: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli.rb:476:in `exec'
        13: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:28:in `run'
        12: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `kernel_load'
        11: from /var/lib/gems/2.7.0/gems/bundler-2.1.4/lib/bundler/cli/exec.rb:63:in `load'
        10: from /usr/local/bin/htmlproofer:23:in `<top (required)>'
         9: from /usr/local/bin/htmlproofer:23:in `load'
         8: from /var/lib/gems/2.7.0/gems/html-proofer-3.16.0/bin/htmlproofer:11:in `<top (required)>'
         7: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
         6: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
         5: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
         4: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
         3: from /var/lib/gems/2.7.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
         2: from /var/lib/gems/2.7.0/gems/html-proofer-3.16.0/bin/htmlproofer:109:in `block (2 levels) in <top (required)>'
         1: from /var/lib/gems/2.7.0/gems/html-proofer-3.16.0/lib/html-proofer/runner.rb:51:in `run'
/var/lib/gems/2.7.0/gems/html-proofer-3.16.0/lib/html-proofer/runner.rb:176:in `print_failed_tests': \e[31mHTML-Proofer found 90 failures!\e[0m (RuntimeError)
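The traceback ends in print_failed_tests, and its final message carries the failure count wrapped in the colour codes \e[31m and \e[0m. Rather than reading the stack trace, the count can be pulled out of a saved log. This is a minimal Ruby sketch, assuming the "HTML-Proofer found N failures!" wording shown above; strip_ansi and failure_count are hypothetical helper names. The pattern accepts both real escape bytes and the literal \e text that appears in this log.

```ruby
# Hypothetical helpers for reading a saved html-proofer log (not part of html-proofer).

# Matches ANSI colour codes either as real ESC bytes or as the literal
# backslash-e text that shows up in the traceback above.
ANSI_COLOUR = /(?:\e|\\e)\[[0-9;]*m/

# Remove colour codes so the log reads cleanly in a text editor.
def strip_ansi(text)
  text.gsub(ANSI_COLOUR, "")
end

# Return the count from the first "HTML-Proofer found N failure(s)!" line,
# or nil if no such line is present.
def failure_count(log_text)
  match = strip_ansi(log_text).match(/HTML-Proofer found (\d+) failures?!/)
  match && Integer(match[1])
end
```

For example, failure_count(File.read("links.log")) would give the same number as the "found 90 failures" line, without the surrounding traceback.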
I’m not a Rubyist, but I am interested in the results of this tool (I’ve written many broken link checkers), so let’s see whether it works better from the Docker side.
I first tried running sudo apt install docker.io from WSL2, then tried installing Docker from the Windows side following the Docker Desktop WSL 2 backend guide, and ran:
docker run --rm -it -v $(pwd):/src klakegg/html-proofer:3.16.0 --allow-hash-href --empty-alt-ignore --assume_extension ./_site
This gave a similar RuntimeError on my site, which is strange, as the Ubuntu 18.04 LTS setup ran against the same site without one.
So far it has proved difficult to use, buggy on a few machines, and its output is hard to read.
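The tool did seem to work: it checked the site and reported the failures, and only then ended with the RuntimeError, which the traceback shows is raised from print_failed_tests after the failure count is printed.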
There are some useful features in this project, and I suspect it is my lack of Ruby experience here that is the problem.
Sometimes it is good to know when to stop and move on to the next thing!