XML Design Issues / FieldML Style Guide


Element and Attribute Naming

In summary: names should be as long as possible. Element names that introduce new concepts into the tree should be as descriptive as possible, while elements whose meaning can be partly inferred from their context may use shorter names. The storage cost of element and attribute names should never be an important consideration. General-purpose compression such as gzip is very effective on repetitive plain text like XML markup, and compression techniques tailored specifically to XML documents will soon be commonplace (see the XMill effort).
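As a rough illustration of why name length matters little once a document is compressed, the following Python sketch builds two synthetic documents that differ only in the length of a repeated element name and compares their raw and gzip-compressed sizes. The toy document layout and the abbreviation `nlc` are hypothetical and are not FieldML markup.

```python
import gzip


def build_document(element_name: str, count: int = 1000) -> bytes:
    """Build a toy XML document that repeats one element many times."""
    rows = "".join(f"<{element_name}>{i}</{element_name}>" for i in range(count))
    return f"<root>{rows}</root>".encode("utf-8")


def sizes(element_name: str) -> tuple[int, int]:
    """Return (raw size, gzip-compressed size) in bytes."""
    raw = build_document(element_name)
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # A descriptive name versus a hypothetical terse abbreviation.
    for name in ("num-local-coordinates", "nlc"):
        raw, packed = sizes(name)
        print(f"{name:>22}: raw={raw:6d} bytes, gzip={packed:5d} bytes")
```

The raw documents differ by 36 bytes per row, but because gzip encodes each repeated name as a back-reference, the gap between the compressed sizes is far smaller.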

Element and attribute names defined in the FieldML specification that consist of multiple English words (or abbreviations, such as num instead of number) will be entirely lower case, with the words separated by hyphens. This differs from MathML, which places no separator between English words (e.g. <maligngroup>), and from BioML, which uses underscores to separate words. Example elements are <num-elements> and <num-local-coordinates>.
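A tool that generates or checks FieldML names could encode this convention directly. The Python sketch below is one possible approach: the regular expression is an illustrative approximation of the rule (lower-case words of letters and digits joined by single hyphens), not a pattern taken from the FieldML specification, and the function names are hypothetical.

```python
import re

# Illustrative approximation of the naming rule: lower-case words
# (letters and digits) joined by single hyphens. This pattern is an
# assumption, not part of the FieldML specification.
FIELDML_NAME = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*")


def follows_convention(name: str) -> bool:
    """Return True if name is lower case with hyphen-separated words."""
    return FIELDML_NAME.fullmatch(name) is not None


def to_fieldml_style(words: list[str]) -> str:
    """Join English words (or abbreviations) into a FieldML-style name."""
    return "-".join(word.lower() for word in words)


if __name__ == "__main__":
    # BioML-style underscores and camel case are both rejected.
    for candidate in ("num-elements", "num-local-coordinates",
                      "num_elements", "NumElements"):
        print(f"{candidate:>22}: {follows_convention(candidate)}")
    print(to_fieldml_style(["Num", "Local", "Coordinates"]))
```

A pattern like this cannot tell whether a single run of letters such as maligngroup was meant as one word or several, so it checks the separators only, not word boundaries.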

Elements vs. Attributes

The XML languages developed at Engineering Science generally follow the philosophy below, with some exceptions. By far the most contentious issue in DTD design is the elements-versus-attributes debate. On this issue, Steven Champeon posted the following comments to the XML-DEV mailing list in November 1999, and I tend to agree with them:

The method I try to use involves asking the following
questions:

1. Is a given element capable of having components
which may themselves have components? If so, use
nested elements.

2. Does a given attribute lose any meaning if taken
out of sequence? If so, use a nested element.

3. How does your set of elements/attributes map to a
database schema? could you then take that schema
and normalize it more completely? do you lose
structural information by so doing? If so, keep
your existing tagset definition. If not, move
element values into attributes.

4. Are there limitations/weaknesses in my processing
application that make it harder or more complex to
grab an attribute value than to grab an element by
name?

Data files marked up in CellML, FieldML or PhysiomeML are unlikely to be processed by anything other than complex software applications, which makes question (4) irrelevant. For instance, these documents are unlikely to be combined with CSS for display in a browser; they are more likely to be processed with XSL, which can perform complex transformations. In the design phase the database schema is not yet obvious, so question (3) will be ignored for now. This leaves only questions (1) and (2) as deciding factors.
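Questions (1) and (2) can be made concrete with a small example. In the Python sketch below, which uses the standard library's ElementTree, the element names and the ref attribute are hypothetical and do not represent FieldML syntax. An element's nodes are potential containers of further components, and their order matters, so they are written as nested elements. A simple scalar such as a name stays an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup invented for illustration only (not FieldML syntax).
# Question 1: the nodes could carry components of their own, so they are
# nested elements. Question 2: their order is meaningful, and XML gives no
# meaning to attribute order, so the sequence is kept in child elements.
# The simple, unordered "name" value remains an attribute.
DOC = """
<element name="e1">
  <node ref="3"/>
  <node ref="1"/>
  <node ref="2"/>
</element>
"""


def node_order(xml_text: str) -> list[str]:
    """Return the node references in document order."""
    root = ET.fromstring(xml_text.strip())
    return [node.get("ref") for node in root.findall("node")]


if __name__ == "__main__":
    print(node_order(DOC))
```

Attributes could not express this sequence at all. XML forbids repeating an attribute name on one element, so `<element node="3" node="1"/>` is not well-formed, and ElementTree rejects it with a `ParseError`.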

I'd like to add to Steven's comments that attributes allow some semantic information to be specified (for instance, an attribute can be declared as an ID) and allow constraints to be placed on attribute values (for instance, an NMTOKEN must not contain whitespace). These arguments will lose some of their force once XML Schema is released as a W3C recommendation, since it will allow greater control over element content than is currently possible.
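With a DTD, a validating parser enforces these attribute constraints automatically. The Python sketch below spells the same checks out by hand, as a hedged illustration of what ID and NMTOKEN typing buy. It uses an ASCII-only approximation of the XML 1.0 Name and Nmtoken productions (the real productions also admit many non-ASCII characters), and the function names are hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 productions. The full
# NameStartChar and NameChar productions also allow many non-ASCII
# characters; this simplification is an assumption made for brevity.
NAME_START = r"[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9._:-]"
NMTOKEN = re.compile(f"{NAME_CHAR}+")
NAME = re.compile(f"{NAME_START}{NAME_CHAR}*")


def is_nmtoken(value: str) -> bool:
    """An NMTOKEN is one or more name characters, so whitespace is rejected."""
    return NMTOKEN.fullmatch(value) is not None


def is_valid_id(value: str) -> bool:
    """An ID value must match the Name production (it cannot start with a digit)."""
    return NAME.fullmatch(value) is not None


def duplicate_ids(values: list[str]) -> set[str]:
    """ID values must also be unique within a document; return any repeats."""
    seen: set[str] = set()
    repeats: set[str] = set()
    for value in values:
        if value in seen:
            repeats.add(value)
        seen.add(value)
    return repeats


if __name__ == "__main__":
    print(is_nmtoken("linear-lagrange"), is_nmtoken("linear lagrange"))
    print(is_valid_id("node_1"), is_valid_id("1node"))
    print(duplicate_ids(["a", "b", "a"]))
```

An element's text content carries no such typing under a DTD, which is why these checks favour attributes until XML Schema allows similar constraints on element content.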

Numerical Indexing

Indexing into lists and vectors, which field descriptions frequently require (for example, to reference ensemble field parameter vectors), will start from 1. This is due to the fondness of the Department of Engineering Science establishment for aging legacy programming languages.
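Software written in a 0-based language must convert these indices when reading FieldML data. The Python sketch below shows one way to do this. The function name and the example parameter vector are hypothetical. The explicit range check matters because, without it, an erroneous index of 0 would become `values[-1]` and silently return the last entry.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def fieldml_index(values: Sequence[T], index: int) -> T:
    """Look up a FieldML-style 1-based index in a 0-based Python sequence.

    Raises IndexError for 0 or out-of-range indices instead of letting
    Python's negative indexing wrap around to the end of the sequence.
    """
    if not 1 <= index <= len(values):
        raise IndexError(f"1-based index {index} out of range 1..{len(values)}")
    return values[index - 1]


if __name__ == "__main__":
    parameters = [0.0, 0.5, 1.0]  # hypothetical ensemble field parameter vector
    print(fieldml_index(parameters, 1))  # first entry
    print(fieldml_index(parameters, 3))  # last entry
```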