validation
HTML Validation
Over a week ago now PPK wrote an article for A List Apart magazine entitled JavaScript Triggers, in which he described a method of providing additional information for a scripting layer to use. Basically, this information is whacked into attributes on your standard HTML elements, and these attributes become the "triggers". There is a warning at the top of his article indicating that the subject may be controversial; as far as I can tell, A List Apart has built itself around some pretty pedantic people :-). Alongside PPK's article was another, entitled Validating a Custom DTD, from J. David Eisenberg. It almost seemed to be hinting at a saving grace on the part of ALA: we did a bad, but looky here.
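To make the idea of a trigger concrete, here is a minimal sketch of how a scripting layer might find them. It is written in Python using the standard library's `html.parser` rather than in browser JavaScript, purely so it can be run on its own. The attribute names (`required`, `maxlength`), the sample form and the `find_triggers` helper are my own illustrative assumptions, not code from PPK's article.

```python
from html.parser import HTMLParser

# Hypothetical trigger attribute names, chosen only for illustration.
TRIGGER_ATTRIBUTES = {"required", "maxlength"}


class TriggerCollector(HTMLParser):
    """Collects (tag, attribute, value) triples for trigger attributes."""

    def __init__(self):
        super().__init__()
        self.triggers = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lower-cased tag and attribute names.
        for name, value in attrs:
            if name in TRIGGER_ATTRIBUTES:
                self.triggers.append((tag, name, value))


def find_triggers(markup):
    """Return every trigger attribute found in the given markup."""
    collector = TriggerCollector()
    collector.feed(markup)
    collector.close()
    return collector.triggers


# Hypothetical form carrying two triggers for a script to act on.
SAMPLE = (
    '<form>'
    '<input name="email" required="true">'
    '<textarea name="bio" maxlength="300"></textarea>'
    '</form>'
)

if __name__ == "__main__":
    for tag, name, value in find_triggers(SAMPLE):
        print(f"{tag}: {name}={value}")
```

A script would then hang behaviour off each triple, for example refusing to submit the form while a `required` field is empty. The markup is perfectly usable this way; the question is whether it is still HTML.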
Was PPK right?
A few days after the article was published, PPK wrote an entry in his blog to wrap up the feedback he got on the subject of validation. There were a couple of points raised, some valid, some moot. In my opinion the correct solution is namespacing in the world of XML. However, this is not a real-world solution: the 10lb gorilla doesn't support it, so it isn't going to stick. The solution that seemed to hold the most weight is the use of custom DTDs: these make your document validate and it works, best of both worlds! Well, actually, not.
Conformance with the HTML specification
Most people's gold standard of validation is either the W3C's validator or one of the other similar online services. It gives you a nice, official-looking report with a clear message at the top which tells you whether you got it right or wrong. All these validators sit atop a newer version of James Clark's SP, pretty much the de facto validating SGML parser. The fact that no one in the real world uses this parser almost makes them, and the issue of validation, entirely academic, but I'll come on to this later. My point right now is that these validators only check a document against its DTD to ensure validity.
The HTML specification says:
Each markup language defined in SGML is called an SGML application. An SGML application is generally characterized by:
- An SGML declaration. The SGML declaration specifies which characters and delimiters may appear in the application.
- A document type definition (DTD). The DTD defines the syntax of markup constructs. The DTD may include additional definitions such as character entity references.
- A specification that describes the semantics to be ascribed to the markup. This specification also imposes syntax restrictions that cannot be expressed within the DTD.
- Document instances containing data (content) and markup. Each instance contains a reference to the DTD to be used to interpret it.
The first two requirements are checked by the validator; the third isn't. This third point is where strict conformance dies and your document essentially ceases to be an HTML document.
The HTML specification includes a few appendices, including notes on invalid documents which outline what happens if your document isn't valid. It recommends (to UAs, not authors) that "if a user agent encounters an attribute it does not recognize, it should ignore the entire attribute specification (i.e., the attribute and its value)". This bodes well for our use case. However, in a note to authors directly underneath, it also states that "since user agents may vary in how they handle error conditions, authors and users must not rely on specific error recovery behavior" (where the definition of "must not" is given by RFC 2119 and means "absolute prohibition", which is pretty final). So this method is a definite no-no according to the specification.
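The recommended recovery behaviour for user agents can be sketched roughly as follows. This is a toy in Python, not how any real browser is implemented: the `KNOWN_ATTRIBUTES` table is a tiny hand-picked subset I made up for illustration (it is not the HTML 4.01 DTD), and `split_attributes` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# A deliberately tiny, illustrative table of recognised attributes.
# This is an assumption for the sketch, not the real HTML 4.01 DTD.
KNOWN_ATTRIBUTES = {
    "input": {"type", "name", "value", "maxlength"},
    "textarea": {"name", "rows", "cols"},
}


class LenientAgent(HTMLParser):
    """Toy user agent: keeps recognised attributes, ignores the rest."""

    def __init__(self):
        super().__init__()
        self.kept = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        known = KNOWN_ATTRIBUTES.get(tag, set())
        for name, value in attrs:
            if name in known:
                self.kept.append((tag, name, value))
            else:
                # Drop the whole attribute specification: name and value.
                self.ignored.append((tag, name, value))


def split_attributes(markup):
    """Return (kept, ignored) attribute lists for the given markup."""
    agent = LenientAgent()
    agent.feed(markup)
    agent.close()
    return agent.kept, agent.ignored


if __name__ == "__main__":
    kept, ignored = split_attributes(
        '<textarea name="bio" maxlength="300"></textarea>'
    )
    print("kept:", kept)
    print("ignored:", ignored)
```

In this sketch the `maxlength` on the `textarea` is quietly thrown away, which is exactly the behaviour a trigger-based script would be counting on. The catch is that it is only a recommendation to UAs, so another agent is free to recover differently.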
The same section also says that "for reasons of interoperability, authors must not 'extend' HTML through the available SGML mechanisms (e.g., extending the DTD, adding a new set of entity definitions, etc.)". Again, this pretty clearly and cleanly cuts out J. David Eisenberg's method. If you do use his method to force SGML validity, you sacrifice your status as an HTML document.
Is validation moot?
After all this heartache fighting for strict conformance to the HTML specification, is it worth it? The other day I noticed Nate mention that "validation is a tool, not a label". I'm not entirely sure what that means, but I think the general meme is that if your document is parsed correctly in the major browsers, the final hurdle kind of becomes moot; it's just not worth the effort.
From my standpoint, I see that the purpose of strict validation is that all documents become universally easy to parse and UAs can simply use drop-in parsers. It's the dream. At the moment we have to put a fantastic amount of work into producing a really good parser that is up to the task of tackling the Web. This is not a good place to be. However, most people will tell you that this is only a dream; it's not going to happen. The mindshare you would have to win to get there is awesome (not like the hotdogs). In my opinion the best we can do is to ensure the content we produce is known good, and accept that the rest of the world just isn't going to get there.

2 Comments:
This is pretty much how I feel, if you're going to add your own attributes and elements to HTML you're no longer writing in HTML.
The entire point of validating a document isn't to get brownie points from standards advocates, it's to ensure that you're writing something that has meaning to people besides yourself.
Just because the browser renders it correctly (let's face it, they'll render pretty much anything if you ask nicely enough) people think that it's correct. You wouldn't "extend" ActionScript, or PHP, or JavaScript like this, what makes HTML so different?
You summed up my thoughts better than I could (and I didn't even attempt it...). In the end we have to be pragmatic, not theoretical.