Match your tags

A recent exchange on the micro-blogging service Twitter prompted me to write this quick note.

One of my friends complained about the work involved when searching for unmatched tags in an XHTML file.

So here is my Open Source toolchain for lazy web authors/coders.

1. The Editor

Text editors are the subject of heated religious discussion in some parts of the old Intertubes. They come in all shapes, sizes and degrees of power.

The good people at WordPress have a nice little write-up on editors with links to project pages. You can find it here.

I use Vim.

2. A good color scheme for your editor

Color schemes are one of the really helpful features of syntax-aware editors. For Vim I chose Shobogenzo, to which I added a few customisations of my own.

Shobogenzo was written by a friendly Dutchman who does a lot of web stuff, hence it has better support for HTML, CSS and JavaScript than some of the color schemes that were written with low-level languages in mind.

Nothing will alert you to an unmatched tag quicker than the fact that everything between the cursor and EOF drowns in bold text of a particularly vile color.

You can compare Vim color schemes at this site.

3. Grep

Vim has some neat and nifty search-and-replace features of its own, but true to the old *nix philosophy of not reinventing the wheel when it can be avoided, it leaves things to the core utils wherever possible.

So when it comes to finding the number of occurrences of a tag in an XHTML file, you can pipe the buffer to grep like so: :w ! grep -c "<tagname\W", then run grep on the closing instance of your tag like so: :w ! grep -c "</tagname>". The difference between the two results tells you the number of unmatched tags, if any. Keep in mind that grep -c counts matching lines rather than individual matches, so the numbers are only exact when no line holds the same tag twice.

This will save you a lot of scrolling up and down while counting.
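If your markup tends to put several tags on one line, counting matches instead of lines is more reliable. Here is a minimal sketch of that idea as a shell function. The name count_tag is my own invention, and I'm assuming a grep that supports -o (GNU and BSD grep both do).

```shell
# count_tag FILE TAG: print "<opening> <closing>" counts for TAG in FILE.
# grep -o prints every match on a line of its own, so wc -l counts
# occurrences rather than matching lines (which is all grep -c reports).
count_tag() {
    file=$1
    tag=$2
    # Opening tag: "<tag" followed by whitespace, ">" or "/",
    # so that "<p" does not also match "<pre".
    open=$(grep -o "<${tag}[[:space:]>/]" "$file" | wc -l | tr -d ' ')
    # Closing tag: "</tag>".
    close=$(grep -o "</${tag}>" "$file" | wc -l | tr -d ' ')
    echo "$open $close"
}
```

Calling count_tag page.xhtml div prints the two counts side by side. Self-closed elements such as <br /> show up as opening tags without a partner, and an opening tag whose name is the last thing on its line is missed, so treat the numbers as a hint rather than a verdict.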

Grep can also show line numbers from the input file when used on a file from the command line. grep -n "regex" inputfile will do the trick. Redirect output to a file or to a pager for more comfortable reading. See man grep for more details.
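As a rough illustration, the function below lists every line that opens or closes a given tag, together with its line number. The name list_tag is made up for this note; the only grep options involved are -n and -E.

```shell
# list_tag FILE TAG: print each line that opens or closes TAG,
# prefixed with its line number in FILE.
list_tag() {
    # "</?" matches "<" with an optional "/", so one extended regex
    # covers both the opening and the closing form of the tag.
    grep -n -E "</?${2}[[:space:]>/]" "$1"
}
```

Something like list_tag page.xhtml div | less then lets you walk through the openers and closers in document order and spot where the nesting goes off the rails.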

4. HTML Tidy

The original Tidy was built by Dave Raggett and lived at W3C. The project has since been taken over by a team and now operates out of SourceForge. This link takes you to their project page.

Tidy knows all sorts of neat tricks. Among other things, it can do a quick well-formedness check on your HTML source or convert HTML to XHTML.

In a *nix environment Tidy can read from stdin, which means that you can pass it snippets of HTML and it will turn them into valid XHTML. You will have to remove some fluff from the output because Tidy always attempts to produce a full file from what it's given.

Check man tidy for more info.
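Here is a rough sketch of both uses as small shell functions. The function names are mine, and I'm assuming a Tidy build that understands the show-body-only option, which takes care of most of the fluff mentioned above.

```shell
# check_html FILE: report errors and warnings only, without printing markup.
check_html() {
    tidy -q -e "$1"
}

# snippet_to_xhtml: read an HTML fragment on stdin, print it as XHTML.
# --show-body-only yes asks Tidy to print just the body content instead
# of wrapping the fragment in a doctype, head and body of its own.
snippet_to_xhtml() {
    tidy -q -asxhtml --show-body-only yes
}
```

Try it with printf '<p>Hello<br>world' | snippet_to_xhtml. Tidy still writes its warnings to stderr and exits with a non-zero status when it has something to complain about, so don't be alarmed if a script that calls these reports a failure for a harmless fragment.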

5. xmlstarlet

This is a lean and fast command line util for all sorts of quick and dirty manipulations on X(HT)ML files. It can also run basic well-formedness checks and, optionally, validate a file against a DTD.

xmlstarlet's man page may be a bit cryptic for first-time users. I recommend a visit to the project page at SourceForge, which has some extended examples of usage.
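To give an idea of what the checks look like, here is a small sketch built on the val subcommand. The function names are hypothetical, and I'm assuming the binary is installed as xmlstarlet (on some systems it goes by xml instead).

```shell
# check_wellformed FILE: well-formedness check only.
# -w restricts the check to well-formedness, -e prints the parser's
# error messages on stderr.
check_wellformed() {
    xmlstarlet val -w -e "$1"
}

# check_against_dtd FILE DTD: validate FILE against the given DTD file.
check_against_dtd() {
    xmlstarlet val -e -d "$2" "$1"
}
```

Both functions hand back xmlstarlet's exit status, so something like check_wellformed page.xhtml && echo "looks fine" works in a script.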

6. SpiderMonkey JavaScript Shell

The fine folks at Mozilla have produced a standalone version of their SpiderMonkey JavaScript engine that you can use to run quick syntax checks from the command line.

It can read commands from a file or from stdin. Check it out here.


This article is not meant to be exhaustive. So please don’t wail or flame if your favourite tool was not listed. Go and produce a write-up of your own.


About dozykraut

Proud member of Hillbilly's on Linux, promoting open source redneckism in remote parts of the Milky Way.