Xtract: a `grep'-like tool for XML documents


Caveats

Xtract parses XML documents strictly according to the well-formedness rules of the XML standard. We strongly recommend that you use an XML validation service (such as http://www.stg.brown.edu/service/xmlvalid/) to check your documents before running Xtract on them.
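
If an online service is inconvenient, a rough local pre-check is also possible. The sketch below is not part of Xtract; it assumes Python and its standard xml.etree.ElementTree module are available, and the helper name well_formedness_error is made up for illustration. It tests well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if text parses as well-formed XML, else the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message names the problem and its line and column.
        return str(err)
    return None

# Properly nested document: no error is reported.
print(well_formedness_error("<a><b>ok</b></a>"))

# Improperly nested end-tags: an error message is returned.
print(well_formedness_error("<a><b>bad</a></b>"))
```

A document that passes this check may still be rejected by Xtract for other reasons (for example, the unimplemented features listed below), so treat it as a first filter only.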

HTML documents may in addition use self-closing tags and omit certain end-tags, so a separate parser for HTML is also included. It is error-correcting: improperly nested tags (such as those produced by various commercial HTML generators) are fixed according to a stack discipline. If you find that the error correction breaks correct documents, please report it as a bug.
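
To give a feel for what repairing tags by a stack discipline can mean, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Xtract's actual algorithm: the token format, the VOID set, and the function name repair are all hypothetical. Open elements are kept on a stack; an end-tag closes everything opened after its matching start-tag, and an end-tag with no matching start-tag is dropped.

```python
# Elements that never take an end-tag (illustrative subset, not exhaustive).
VOID = {"br", "hr", "img", "input", "meta", "link"}

def repair(tokens):
    """Re-nest a flat token stream using a stack of open element names.

    tokens: list of ("open", name), ("close", name) or ("text", string).
    Returns a token list in which every non-void open tag is closed,
    in properly nested order.
    """
    stack, out = [], []
    for kind, value in tokens:
        if kind == "open":
            out.append((kind, value))
            if value not in VOID:
                stack.append(value)
        elif kind == "close":
            if value in stack:
                # Close everything opened after the matching element,
                # then the matching element itself.
                while True:
                    top = stack.pop()
                    out.append(("close", top))
                    if top == value:
                        break
            # An end-tag with no matching open element is dropped.
        else:
            out.append((kind, value))
    # Close whatever is still open at end of input.
    while stack:
        out.append(("close", stack.pop()))
    return out

# <b><i>x</b></i> becomes <b><i>x</i></b>
fixed = repair([("open", "b"), ("open", "i"), ("text", "x"),
                ("close", "b"), ("close", "i")])
print(fixed)
```

This simple policy never re-opens an element that it closed early, so a real error-correcting parser may well make different choices for the same input.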

There are certain aspects of the XML standard that we have not yet implemented: Unicode, the external subset, and parameter entity references in the DTD. Their absence should not cause serious problems, but do let us know if you really need them. Remember that this is BETA QUALITY software.

Xtract does not perform any validation against a document's DTD, nor does it attach a DTD to its output. If you want DTD validation, use separate tools.


The official Xtract website is at http://www.cs.york.ac.uk/fp/Xtract/