New version of HTML::Lint validates HTML entities

I've released a beta of the new version of HTML::Lint, version 2.11_01. (At the time of this writing, the 2.11_01 release has not yet reached its search.cpan.org page.) This version adds HTML entity checking to the tag checking that HTML::Lint has done since the dawn of time. If you're already using HTML::Lint, please help test this beta version!

Entity checking can be a messy business, but it can be invaluable for finding little mistakes, especially in static HTML pages sent to you from other sources. For example, given this HTML file, filled with HTML entities and ampersands and all sorts of potential problems, HTML::Lint sniffs out the problems and reports them:

<html>
    <head>
        <title>Ace of &spades;: A tribute to Mot&oumlrhead. &#174; &metalhorns;</title>
        <script>
            function foo() {
                if ( 6 == 9 && 25 == 6 ) {
                    x = 14;
                }
            }
        </script>
    </head>
    <body bgcolor="white">
        <p>
        Thanks for visiting Ace of &#9824;
        <!-- Numeric version of &spades; -->
        <p>
        Ace of &#x2660; is your single source for
        everything related to Mot&ouml;rhead.
        <p>
        Here's an icon of my girlfriend Jenny: &#8675309;
        <!-- invalid because we cap at 65536 -->
        <p>
        And here's an icon of a deceased cow: &#xdeadbeef;
        <!-- Invalid because we cap at xFFFF -->
        <p>
        Another <i>deceased cow: &xdeadbeef;
        <!-- Not a valid hex entity, but unknown to our lookup tables -->
        <p>
        Here's an awesome link to
        <!-- here comes the ampersand in the YouTube URL! -->
        <a href="http://www.youtube.com/watch?v=8yLhA0ROGi4&feature=related">"You Better Swim"</a>
        from the SpongeBob movie.
        <!--
        Here in the safety of comments, we can put whatever &invalid; and &malformed entities we want, &
        nobody can stop us.  Except maybe Cheech & Chong.
        -->
    </body>
</html>


$ weblint motorhead.html
motorhead.html (3:9) Entity &ouml; is missing its closing semicolon
motorhead.html (3:9) Entity &oumlrhead. &#174; is unknown
motorhead.html (3:9) Entity &metalhorns; is unknown
motorhead.html (17:9) Entity &#8675309; is invalid
motorhead.html (19:9) Entity &#xdeadbeef; is invalid
motorhead.html (22:17) Entity &xdeadbeef; is unknown
motorhead.html (31:5) <i> at (22:17) is never closed

That last error, about the unclosed <i> tag, has always been part of HTML::Lint; all the others are new in this version.
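
To make the categories in that report concrete, here is a minimal pure-Perl sketch of how a single entity might be classified. It is not HTML::Lint's actual implementation: the `classify_entity` function, the tiny `%KNOWN` table, and the `0xFFFF` ceiling (taken from the comments in the sample page) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny table of known named entities.
# A real checker would consult a complete entity list.
my %KNOWN = map { $_ => 1 } qw( amp lt gt quot spades ouml );

# Assumed ceiling for numeric entities, per the comments in the sample page.
my $MAX_CODE_POINT = 0xFFFF;

# Classify one entity candidate such as "&ouml;" or "&#x2660;".
# Returns 'ok', 'invalid', 'unknown', 'unterminated', or 'malformed'.
sub classify_entity {
    my ($entity) = @_;

    # Numeric entities: hexadecimal (&#x2660;) or decimal (&#9824;).
    if ( $entity =~ /\A&#/ ) {
        my $code;
        if    ( $entity =~ /\A&#x([0-9a-f]{1,8});\z/i ) { $code = hex $1 }
        elsif ( $entity =~ /\A&#([0-9]{1,8});\z/ )      { $code = $1 + 0 }
        return 'invalid' if !defined $code || $code > $MAX_CODE_POINT;
        return 'ok';
    }

    # Named entities: look the name up, then check for the semicolon.
    if ( $entity =~ /\A&([a-z][a-z0-9]*)(;?)\z/i ) {
        my ( $name, $semicolon ) = ( $1, $2 );
        return 'unknown' if !$KNOWN{$name};
        return $semicolon ? 'ok' : 'unterminated';
    }

    return 'malformed';
}

# Try it on a few of the entities from the sample page.
for my $entity ( '&ouml;', '&ouml', '&#9824;', '&#x2660;',
                 '&#8675309;', '&#xdeadbeef;', '&xdeadbeef;', '&metalhorns;' ) {
    printf "%-14s %s\n", $entity, classify_entity($entity);
}
```

Note that `&xdeadbeef;` lands in the "unknown" bucket rather than "invalid": without the `#` it parses as a named entity, and the name simply isn't in the lookup table.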

The HTML-Lint distribution includes the HTML::Lint module, which is object-based for easy handling, and Test::HTML::Lint, so that you can add HTML validation to your test suites.

use Test::More tests => 1;
use Test::HTML::Lint;
my $html = $app->generate_home_page();
html_ok( $html, 'Home page is valid HTML' );
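
Outside a test suite, the object-based interface of HTML::Lint can be used directly. The following is a minimal sketch based on the module's documented synopsis; the sample markup and the `sample.html` label are made up for illustration, and the explicit `eof` call follows the current documentation, so treat its necessity in this beta as an assumption.

```perl
use strict;
use warnings;
use HTML::Lint;

# Made-up sample markup with an <i> tag that is never closed.
my $html = <<'HTML';
<html>
    <head><title>Test page</title></head>
    <body>
        <p>Another <i>deceased cow
    </body>
</html>
HTML

my $lint = HTML::Lint->new;
$lint->newfile( 'sample.html' );   # label used in the error messages
$lint->parse( $html );
$lint->eof;                        # finish parsing before reading errors

# errors() returns a list of error objects in list context.
my @errors = $lint->errors;
print $_->as_string, "\n" for @errors;
printf "%d problem(s) found\n", scalar @errors;
```

To lint a file on disk instead of a string, `$lint->parse_file( $filename )` takes the place of the `newfile`/`parse` pair.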

If you're not doing any validation of your HTML in your apps, I suggest you give HTML::Lint a try.
