HTML 4 For Dummies - Bonus Chapters

The Doctor Is In: Page Checkups


 

So you’ve written a bunch of HTML, and you think you’re done. Not by a long shot -- at least, not if you want to avoid being a laughing stock on the Web. Before all these great tools were available, an occasional misspelled word or broken link was acceptable, but now you have no excuse for such boo-boos. Before you post pages for public display, you must perform three important checks: HTML validation, spelling, and links. Lots of different tools, many of them available for free on the Web, can do one or more of these jobs. You can find a number of stand-alone utilities as well, not to mention the checkers already embedded in HTML editors. As we said, you no longer have a valid excuse for mistakes.

HTML validation: Bad code is bad news

A majority of browsers are forgiving of markup errors. Most don’t even require an <HTML> tag to identify an HTML page, and instead look only for an .html or .htm suffix to identify a document as readable. Just because the real world is that way doesn’t make it right. You may see a day when browsers can’t afford to be so forgiving, and that day is drawing closer as HTML becomes more complicated and precise. It’s better to get it right from the beginning and save yourself a bunch of trouble later on.

Webtechs HTML Validator

HTML validation is built into many HTML editors, and although not many standalone HTML validation applications exist, you do have your choice of a number of free online validation systems. We like a couple in particular. The first is the Webtechs HTML Validator, which was one of the original validators. This validator lets you check entire documents or just snippets of HTML.

You can choose which HTML DTD you want to match -- from strict HTML 2.0 to various implementations of 4.0, as well as several browser-specific DTDs in between. The output from this validator can be a little difficult to read, however, because it is in a dialect of English we call “DTD speak.” Here’s the output we received when we intentionally broke a page and submitted it to the validator to check against the 3.2 DTD with the “Show Input,” “Show Formatted Output,” and “Treat URL Ampersands” parameters selected:

nsgmls:<OSFD>0:13:14:E: there is no attribute "BGCOLOR"
nsgmls:<OSFD>0:19:7:E: element "CENTER" undefined
nsgmls:<OSFD>0:23:59:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:29:15:E: character "%" is not allowed in the value of attribute "WIDTH"
nsgmls:<OSFD>0:33:19:E: there is no attribute "CELLPADDING"
nsgmls:<OSFD>0:37:11:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:39:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:41:86:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:45:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:47:95:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:57:11:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:59:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:61:82:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:65:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:67:92:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:75:11:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:77:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:79:93:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:83:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:85:86:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:93:11:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:95:49:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:97:88:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:107:15:E: character "%" is not allowed in the value of attribute "WIDTH"
nsgmls:<OSFD>0:109:96:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:110:98:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:114:15:E: character "%" is not allowed in the value of attribute "WIDTH"
nsgmls:<OSFD>0:125:16:E: general entity "amp" not defined and no default entity

Pretty ugly, if we do say so ourselves. Notice that some errors repeat many times. Each repetition means the page uses the same attribute in yet another place, and the message tells us that the 3.2 DTD doesn’t allow that attribute on the element where we used it. You have to learn how to read validator output to know what to fix and what to leave alone. Generally, the validators built into editors are kinder and gentler: they provide some mechanism in the interface to help you fix errors, rather than presenting you with complicated output like this.
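If wading through a long report by eye sounds painful, a few lines of script can group the repeated messages for you. Here’s a minimal Python sketch, assuming every error line follows the pattern in our sample output (the two numbers appear to be the line and column of the error). The function name summarize and the shortened sample report are ours, made up for illustration; they aren’t part of any validator.

```python
import re
from collections import Counter

# One nsgmls error line: two position numbers, the severity "E", then the message.
# Assumption: the report uses the same layout as the sample output in this section.
LINE_RE = re.compile(r'^nsgmls:<OSFD>0:(\d+):(\d+):E: (.*)$')

def summarize(report):
    """Count how many times each distinct error message appears."""
    counts = Counter()
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts

# A shortened, hand-copied excerpt of the report, used only as demo input.
sample = '''nsgmls:<OSFD>0:13:14:E: there is no attribute "BGCOLOR"
nsgmls:<OSFD>0:19:7:E: element "CENTER" undefined
nsgmls:<OSFD>0:23:59:E: there is no attribute "BORDER"
nsgmls:<OSFD>0:37:11:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:39:47:E: there is no attribute "WIDTH"
nsgmls:<OSFD>0:41:86:E: there is no attribute "BORDER"'''

# Print the most frequent messages first.
for message, count in summarize(sample).most_common():
    print(f"{count:3d}  {message}")
```

Seeing the same message counted a dozen times makes it obvious that you have one problem to understand, not a dozen.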

To check your HTML with the Webtechs Validator, point your Web browser to www.webtechs.com/html-val-svc/.

A Kinder, Gentler HTML Validator

If the Webtechs Validator is a little intimidating, you can use the Kinder, Gentler HTML Validator instead. It is based on the Webtechs Validator, and its output is a little less overwhelming and easier to read, although it offers fewer options. You must enter the URL to be checked, make a couple of choices about what kind of information you want returned, and then wait for the results. The URL for this site is ugweb.cs.ualberta.ca/~gerald/validate/.

When we submitted the same Web page as in the previous example to this validator, we received an explanation of not only what was wrong, but why it was wrong. For instance, the page didn’t include a <!DOCTYPE> declaration, and the validator provided us with a solid discussion of why the page needed to have this often overlooked line. The error output is much easier to read as well.
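You can spot a missing document type declaration yourself before you ever visit a validator. The following is a small Python sketch that uses the standard html.parser module; the class name DoctypeFinder, the function name find_doctype, and the sample pages are our own inventions for illustration. It only reports whether a declaration is present and makes no attempt to validate the page against the DTD.

```python
from html.parser import HTMLParser

class DoctypeFinder(HTMLParser):
    """Remember the first DOCTYPE declaration seen in a page, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">".
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

def find_doctype(html_text):
    """Return the DOCTYPE declaration text, or None if the page has none."""
    finder = DoctypeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.doctype

# Two made-up pages: one with a declaration, one without.
good_page = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">\n<HTML><BODY>Hi</BODY></HTML>'
bad_page = '<HTML><BODY>Hi</BODY></HTML>'

for name, page in (("good_page", good_page), ("bad_page", bad_page)):
    found = find_doctype(page)
    print(name, "->", found if found else "no DOCTYPE declaration found")
```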

Granted, it takes a bit longer to get through the program’s output, and you have no control over the DTD that your HTML is checked against. But if you’ve never worked with a validator before, this is a good place to start before tackling Webtechs’ more detailed (but less intelligible) results.

Regardless of which validator you use, you must check each and every HTML page for accuracy. The more valid your HTML, the better the chance that your pages will look as you intend them to on a variety of browsers.

Of course you can spel: spell check your pages!

What is the biggest problem with checking HTML pages for spelling errors? The tags themselves are misspellings, according to Webster’s and most other dictionaries. Sitting and clicking the Ignore button for each and every new tag can make spell checking tedious, and after your eyes glaze over, you’re more apt to miss real misspellings. Fortunately, many editors include HTML-aware spell checkers that skip markup and check just the text. Because so many editors support this option, few stand-alone utilities are available, and we couldn’t find any dedicated online spell checkers.
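To see what “HTML-aware” means in practice, here is a rough Python sketch of the idea: pull out only the text between tags, and then look up each word. It isn’t how any particular editor works. The names TextExtractor, misspelled, and KNOWN_WORDS are hypothetical, and the five-word “dictionary” stands in for a real word list.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the words between tags, so markup is never spell checked."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        # Tag names and attributes never reach this method; only page text does.
        self.words.extend(re.findall(r"[A-Za-z']+", data))

def misspelled(html_text, dictionary):
    """Return the words in the page text that are not in the dictionary."""
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    return [word for word in extractor.words if word.lower() not in dictionary]

# Hypothetical tiny word list; a real checker would load a full dictionary.
KNOWN_WORDS = {"welcome", "to", "my", "home", "page"}

page = "<HTML><BODY BGCOLOR='white'><H1>Welcome to my home paeg</H1></BODY></HTML>"
print(misspelled(page, KNOWN_WORDS))
```

Notice that BGCOLOR and H1 never show up as suspects; only the genuine typo in the heading text does.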

Dr. HTML is an HTML checking tool that performs several different checks, including spell checks, on any HTML document or on an entire site. To investigate this utility and try its analytical skills, please visit this Web site: www2.imagiware.com/RxHTML/.

Regardless of how you do it, even if it means cutting and pasting text from a browser to a word processor, you must check your pages for spelling errors. Poor spelling is often taken as an indicator of limited intelligence and ability, and we wouldn’t want anyone to underestimate you.

Don’t lose the connection: link checking

If you think spelling errors are embarrassing, here’s something that’s even worse: broken hyperlinks. Hyperlinks make the Web what it is; if you have broken links on your site, that’s borderline blasphemous. Seriously, if your text promises a link to a great resource or page but produces the dreaded 404 Object Not Found error when that link is clicked, users will be disappointed and may not ever revisit your site. The worst broken link is one that points to a resource in your own pages. You can’t be held responsible for what others do to their sites, but you are 100 percent accountable for your own site. Don’t let broken links happen to you!

As with the other checks, many HTML editors include built-in local link checkers, and some editors even scour the Web for you to check external links. A majority of Web servers offer this feature as well. Checking external links isn’t as simple as it sounds, because the checking program must query each link over an active Internet connection. This can be processor intensive, so you should check external links only during off-peak hours, like early morning, to avoid tying up other Web servers as well. A number of scripts and utilities are available on the Web to help you test your links. In the following sections, we share some of our favorites.
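If you’re curious what a link checker does under the hood, the basic recipe is short: gather every link on a page, sort the links into internal and external, and ask the server about each one. The following Python sketch illustrates that recipe with the standard library only. It isn’t taken from any of the tools in this section, and the names LinkCollector, collect_links, split_links, and link_status, along with the example.com addresses, are our own assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

class LinkCollector(HTMLParser):
    """Gather the target of every <A HREF="..."> on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                url = urljoin(self.base_url, value)
                # Skip mailto: and other links that a Web server can't answer.
                if urlparse(url).scheme in ("http", "https"):
                    self.links.append(url)

def collect_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links

def split_links(links, base_url):
    """Separate links on our own host from links to other sites."""
    own_host = urlparse(base_url).netloc
    internal = [url for url in links if urlparse(url).netloc == own_host]
    external = [url for url in links if urlparse(url).netloc != own_host]
    return internal, external

def link_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        # HEAD asks for headers only; some servers refuse it, so treat this as a sketch.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # for example, 404 for a broken link
    except OSError:
        return None  # DNS failure, refused connection, or timeout

# Made-up page and address, used only to show the first two steps offline.
base = "http://www.example.com/index.html"
page = '<A HREF="about.html">About</A> <A HREF="http://www.example.org/">Elsewhere</A>'
internal, external = split_links(collect_links(page, base), base)
print("internal:", internal)
print("external:", external)
```

Calling link_status on each collected URL does the actual checking, and it’s the only step that needs a live connection. That step is also the one to save for off-peak hours.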

MOMSpider

MOMSpider was one of the first link checkers available to Web authors. This link checker is written in Perl and runs on virtually any UNIX machine. The nice thing about MOMSpider is that it needn’t reside on the same computer as the site it checks, so even if you don’t serve your Web site from UNIX, you can still check links from MOMSpider on a remote system.

Anyone who has some knowledge of Perl can easily configure MOMSpider to create custom output and to check both internal and external links on a site. Don’t fret; if you don’t know Perl, you can easily find a programmer who can adjust a MOMSpider in his or her sleep for a nominal fee. Many ISPs will run a MOMSpider on your site for a low monthly fee and will cheerfully handle the configuration and implementation for you.

To find out more about MOMSpider, visit the official site at www.ics.uci.edu/pub/websoft/MOMspider/.

Web Walker

Web Walker is a simpler, annotated version of MOMSpider that non-Perl users can implement themselves with just a little study. Once again it must run on a UNIX server with Perl installed, but the program itself is heavily commented to help you configure it without calling in a programmer. If you feel adventurous and want to try your hand at a little programming, give Web Walker a shot.

Point your browser at the Web Walker page for more information: info.webcrawler.com/mak/projects/robots/active/html/webwalker.html.

CheckBot

CheckBot is yet another Perl script based on the work of Roy Fielding, the programmer who created MOMSpider, and is similar to Web Walker in that it is a simpler, more annotated version of MOMSpider. CheckBot runs on any server with Perl installed and you can configure it without too much hassle if you’re willing to do a little reading.

To learn more about CheckBot, take a look at its Web page at www.xs4all.nl/~graaff/checkbot/.

You’ve probably noticed that all the link checkers we mention are scripts (Perl scripts to be specific). Let this be your first clue that link checking is not quick and easy, but an essential task all the same. We recommend you check all links on a site weekly. If you can’t manage that, check them at least monthly. If not, you’ll have dead links and eventually a dead site.


We’ve only highlighted a few of the many different validation and checking utilities available on the Web. You can find numerous others, and more will soon appear. For a complete, up-to-date list of these tools, visit the Yahoo! Validation and Checkers page at www.yahoo.com/Computers_and_Internet/Information_and_Documentation/Data_Formats/HTML/Validation_and_Checkers/.



URL: http://www.lanw.com/html4dum/h4d4e/extras/ex06/e06s02.htm

Webmaster: Natanya Pitts, LANWrights
Revised -- January 16, 1998