A recent exchange on the micro-blogging service Twitter prompted me to write this quick note.
One of my friends complained about the work involved when searching for unmatched tags in an XHTML file.
So here is my Open Source toolchain for lazy web authors/coders.
1. The Editor
Text editors are the subject of heated religious discussion in some parts of the old Intertubes. They come in all shapes, sizes and degrees of power.
The good people at WordPress have a nice little write-up on editors, with links to the project pages. You can find it here.
I use Vim.
2. A good color scheme for your editor
Color schemes are one of the really helpful features of syntax-aware editors. For Vim I chose Shobogenzo, to which I added a few customisations of my own.
Nothing will alert you to an unmatched tag quicker than the fact that everything between the cursor and EOF drowns in bold text of a particularly vile color.
You can compare Vim color schemes at this site.
3. Grep
Vim has some neat and nifty search and replace features of its own but, true to the old *nix philosophy of not reinventing the wheel when it can be avoided, it leaves things to the core utils wherever possible.
So when it comes to counting the occurrences of a tag in an XHTML file, you can pipe the buffer to grep like so:
:w ! grep -c "<tagname\W"
Then run grep on the closing instance of your tag like so:
:w ! grep -c "</tagname>"
The difference between the two results tells you the number of unmatched tags, if any. Bear in mind that grep -c counts matching lines rather than individual matches, so the figures are only exact when no line contains the same tag twice.
This will save you a lot of scrolling up and down while counting.
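If your markup puts several tags on one line, a small shell function can do the same count match by match. What follows is only a sketch: the name tag_balance is made up for this note, and it assumes a grep that supports -o (GNU and BSD grep both do, although the option is not in POSIX).

```shell
#!/bin/sh
# tag_balance TAG FILE
# Hypothetical helper: prints how many opening and closing TAGs FILE holds,
# plus the difference between the two counts.
tag_balance() {
    tag=$1
    file=$2
    # -o prints every match on a line of its own, so wc -l counts matches
    # instead of matching lines. [[:space:]>] means "whitespace or >" and
    # keeps <p from also matching <pre.
    opened=$(grep -o "<${tag}[[:space:]>]" "$file" | wc -l)
    closed=$(grep -o "</${tag}>" "$file" | wc -l)
    # Arithmetic expansion also strips the padding some wc versions emit.
    echo "open=$((opened)) close=$((closed)) unmatched=$((opened - closed))"
}

# Example: three opening <p> tags, only two closing ones.
sample=$(mktemp)
printf '<p>one</p><p>two\n<p class="x">three</p>\n' > "$sample"
tag_balance p "$sample"    # prints: open=3 close=2 unmatched=1
rm -f "$sample"
```

A tag whose name is followed directly by a line break will slip past the opening pattern, because grep works line by line. The same pipeline should also work from inside Vim, since :w ! hands everything after the bang to the shell, for example :w !grep -o "</tagname>" | wc -l.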
Grep can also print the line number of each match when it is run on a file from the command line:
grep -n "regex" inputfile
Redirect the output to a file or pipe it to a pager for more comfortable reading. See man grep for more details.
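To track down where a tag goes missing, it can help to list opening and closing tags together with their line numbers. A minimal sketch follows; the function name list_tag_lines and the sample markup are invented for illustration.

```shell
#!/bin/sh
# list_tag_lines TAG FILE
# Hypothetical helper: prints every line of FILE that holds an opening or
# closing TAG, each prefixed with its line number.
list_tag_lines() {
    # -n prefixes each matching line with its line number.
    # -E switches to extended regex, where "/?" makes the slash optional.
    grep -n -E "</?$1[[:space:]>]" "$2"
}

# Example: the second <li> is never closed.
sample=$(mktemp)
printf '<ul>\n<li>one</li>\n<li>two\n</ul>\n' > "$sample"
list_tag_lines li "$sample"
# prints:
# 2:<li>one</li>
# 3:<li>two
rm -f "$sample"
```

For longer files, pipe the result to a pager: list_tag_lines li page.html | less.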
4. HTML Tidy
The original Tidy was built by Dave Raggett and lived at W3C. The project has since been taken over by a team and now operates out of SourceForge. This link takes you to their project page.
Tidy knows all sorts of neat tricks and can, among other things, run a quick well-formedness check on your HTML source or convert HTML to XHTML.
In a *nix environment Tidy can read from stdin, which means that you can pass it snippets of HTML and it will turn them into valid XHTML. You will have to remove some fluff from the output because Tidy always attempts to produce a full file from what it’s given. See man tidy for more info.
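Here is a rough sketch of the stdin trick as a shell function. The name xhtml_snippet is made up, and the sketch assumes a Tidy build that knows the show-body-only and show-warnings options, which should take care of most of the fluff mentioned above.

```shell
#!/bin/sh
# xhtml_snippet
# Hypothetical wrapper: reads an HTML fragment on stdin and prints Tidy's
# XHTML version of it without the surrounding document skeleton.
xhtml_snippet() {
    # -q                     skip the summary Tidy prints after a run
    # -asxhtml               convert the input to XHTML
    # --show-body-only yes   emit only what ends up inside <body>
    # --show-warnings no     hide warnings such as the missing doctype
    tidy -q -asxhtml --show-body-only yes --show-warnings no
}
```

Call it like this: printf '<p>Hello <b>world' | xhtml_snippet. Tidy should close the dangling tags for you. Note that Tidy exits with a non-zero status when it had to warn about something, so do not treat that alone as a failure in scripts.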
5. xmlstarlet
xmlstarlet is a lean and fast command-line utility for all sorts of quick and dirty manipulations on X(HT)ML files, on top of being able to run basic well-formedness checks (optionally validating against a DTD).
xmlstarlet’s man page may be a bit cryptic for first-time users. I recommend a visit to the project page at SourceForge, which has some extended examples of usage.
It can read commands from a file or from stdin. Check it out here.
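As a starting point, a well-formedness check could be wrapped like this. It is a sketch under assumptions: check_wellformed is a made-up name, and the binary is taken to be installed as xmlstarlet (some systems ship it as xml instead).

```shell
#!/bin/sh
# check_wellformed FILE...
# Hypothetical helper around the "val" subcommand of xmlstarlet.
check_wellformed() {
    # val   the validation subcommand
    # -w    check well-formedness only, no DTD or schema involved
    # -e    print the parser's error messages on stderr
    xmlstarlet val -w -e "$@"
}
```

Run it as check_wellformed page.xhtml and it reports each file as valid or invalid, with the parser's complaints telling you where things went wrong. To validate against a DTD instead, the val subcommand takes a -d option followed by the DTD file.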
This article is not meant to be exhaustive. So please don’t wail or flame if your favourite tool was not listed. Go and produce a write-up of your own.