Chemical Markup Language - Unit Dictionary Convention

3 March 2011

This version:
http://www.xml-cml.org/convention/unit-dictionary-20110303
Latest version:
http://www.xml-cml.org/convention/unit-dictionary
Authors:
See acknowledgements.
Editors:
Sam Adams, University of Cambridge
Joe Townsend, University of Cambridge

Abstract

This specification defines the requirements of the Chemical Markup Language unit-dictionary convention.


Table of Contents

1. Introduction
    1.1 Notational Conventions
    1.2 Namespaces
2. Applying the unit-dictionary convention
3. UnitList Element
    3.1 Namespace
    3.2 Title
    3.3 Description
    3.4 Units
4. Unit Elements
    4.1 ID
    4.2 Title
    4.3 Symbol
    4.4 Parent SI
    4.5 Multiplier and/or Constant to SI
    4.6 Unit Type
    4.7 Definition
    4.8 Description
5. Example Unit Dictionary

Appendices

A. References
B. Acknowledgements


1. Introduction

Units are required throughout CML and are usually indicated using the units attribute. Each unit needs to have a unique identifier and to be defined in such a way that it can be understood by both humans and machines.

Lists of units are similar to dictionaries but require more information for each "entry", such as its relationship to a standard (SI) base unit or the type of unit it is. For example, the units metre, angstrom and picometre are all of type length, whilst the unit kelvin is of type temperature. The phrases "unit list" and "unit dictionary" are used interchangeably; the choice of one term over the other is made only for readability.

Where units are already defined in the standard unit dictionaries (see http://www.xml-cml.org/unit/) these units SHOULD be used, rather than redefining the concepts in another unit dictionary.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terms "element", "attribute", "child" and "parent" in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup Language (XML) [XML].

The use of fonts is as follows:

1.2 Namespaces

This specification uses the following namespaces and prefixes to indicate those namespaces:

Prefix      Namespace URI                        Description
cml         http://www.xml-cml.org/schema        Chemical Markup Language elements
convention  http://www.xml-cml.org/convention/   Standard Chemical Markup Language convention namespace
xhtml       http://www.w3.org/1999/xhtml         XHTML

2. Applying the unit-dictionary convention

The unit-dictionary convention MUST be specified using the convention attribute on a unitList element.

3. UnitList Element

3.1 Namespace

The unitList element MUST have a namespace attribute, the value of which MUST be a valid URI defining the scope within which the entry terms are unique.

The unitList's namespace URI SHOULD resolve to a representation of the dictionary of units. The unitList's namespace URI SHOULD end with either a '/' character or a '#' character so that terms may be referenced by appending them to the URI.
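As an illustration of how a trailing '/' allows terms to be referenced, the fragment below is a sketch that assumes a dictionary with the namespace http://www.xml-cml.org/unit/example/ containing a unit whose id is angstrom. The prefix ex, the scalar element and the value 1.54 are assumptions chosen for the example and are not mandated by this convention.

```xml
<!-- Sketch only: the prefix "ex" and the scalar element are illustrative assumptions. -->
<scalar xmlns="http://www.xml-cml.org/schema"
        xmlns:ex="http://www.xml-cml.org/unit/example/"
        units="ex:angstrom">1.54</scalar>
<!-- "ex:angstrom" expands to the namespace URI followed by the unit id:
     http://www.xml-cml.org/unit/example/angstrom -->
```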

3.2 Title

The unitList element SHOULD have a title attribute intended for human-readability.

The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

The value of the title attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (codepoints 32-127) is represented using an entity reference.

3.3 Description

The unitList element SHOULD have a single description child element, the contents of which provide a human-readable description of the domain of the dictionary. The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace. The description element MUST NOT contain any child elements not in the http://www.w3.org/1999/xhtml namespace.

3.4 Units

The unitList element MUST contain one or more child unit elements, and MUST NOT contain any other child elements from the http://www.xml-cml.org/schema namespace.

<?xml version="1.0" encoding="UTF-8" ?>
<unitList xmlns="http://www.xml-cml.org/schema"
     xmlns:convention="http://www.xml-cml.org/convention/"
     xmlns:xhtml="http://www.w3.org/1999/xhtml"
     convention="convention:unit-dictionary"
     title="example unit list"
     namespace="http://www.xml-cml.org/unit/">
     <description>
         <xhtml:p>
            This is an example unit list for demonstration purposes
         </xhtml:p>
     </description>
     <unit>
     <!-- rest of document omitted -->
     </unit>
</unitList>

4. Unit Elements

4.1 ID

A unit element MUST have an id attribute, the value of which MUST be unique within the scope of the unitList.

The value of the id attribute MUST start with a letter, and MUST contain only letters, digits, dots, hyphens or underscores.

IdStartChar ::= [A-Z] | [a-z]
IdChar ::= IdStartChar | [0-9] | "." | "-" | "_"
Id ::= IdStartChar (IdChar)*
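The grammar above can also be written as a regular expression. The following is a minimal sketch of an XML Schema simple type that encodes it; the type name unitIdType is hypothetical and is not defined by this convention or by the CML schema.

```xml
<!-- Hypothetical simple type; the name "unitIdType" is an assumption. -->
<xsd:simpleType name="unitIdType"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:restriction base="xsd:string">
        <!-- One IdStartChar followed by zero or more IdChar -->
        <xsd:pattern value="[A-Za-z][A-Za-z0-9._\-]*"/>
    </xsd:restriction>
</xsd:simpleType>
```

Under this pattern the values 'kg', 'angstrom' and 'm.s-1' are valid ids, whereas '1m' (starts with a digit) and 'm/s' (contains a slash) are not.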

4.2 Title

A unit element MUST have a title attribute, the value of which will typically be the full name of the unit whilst the id is typically an abbreviation. For example, the SI unit of mass is the kilogram; the id is 'kg' and the title is 'kilogram'.

The title attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

4.3 Symbol

A unit element MUST have a symbol attribute, the value of which is the full symbol used to represent this unit. For example, the units 'kelvin', 'hertz', 'joule' and 'becquerel' have the symbols 'K', 'Hz', 'J' and 'Bq' respectively.

The symbol attribute MUST NOT be empty and MUST contain at least one non-whitespace character.

The value of the symbol attribute MAY contain any valid Unicode character; however, it is RECOMMENDED that any character outside the ASCII subset (codepoints 32-127) is represented using an entity reference.

4.4 Parent SI

A unit element MUST have a parentSI attribute, the value of which is a QName referencing the parent SI unit; for example, 'calorie' has the SI parent 'joule'.
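The calorie case might be written as in the following sketch. The prefix bindings follow those used elsewhere in this document, but the ids 'siUnits:J' and 'unitType:energy' are assumptions about the contents of those dictionaries, and the thermochemical calorie (exactly 4.184 joules) has been chosen for illustration.

```xml
<!-- Sketch only: the ids "siUnits:J" and "unitType:energy" are assumed, not confirmed. -->
<unit xmlns="http://www.xml-cml.org/schema"
      xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:siUnits="http://www.xml-cml.org/unit/si/"
      xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
      id="cal"
      title="calorie"
      symbol="cal"
      parentSI="siUnits:J"
      multiplierToSI="4.184"
      unitType="unitType:energy">
    <definition>
        <h:p>The thermochemical calorie, equal to 4.184 joules.</h:p>
    </definition>
</unit>
```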

4.5 Multiplier and/or Constant to SI

A unit element MUST have at least one of the multiplierToSI and constantToSI attributes; the value of each attribute present must be a double.

multiplierToSI specifies the factor by which a quantity in the non-SI unit should be multiplied to convert it to its representation in SI units. This is applied before constantToSI. The value is unity for an SI unit by definition.

constantToSI specifies the amount to add to a quantity in non-SI units to convert its representation to SI units. This is applied after multiplierToSI. The value is zero for SI units by definition.

A unit that is related to an SI unit only by a constant offset, e.g. Celsius, which has a constant to the SI unit of temperature (kelvin) of '273.15', need not specify a multiplier (although it MAY specify a multiplier to SI of '1').

A unit that is related to an SI unit only by a constant multiplier, e.g. gram, which has a multiplier to the SI unit kilogram of '0.001', need not specify a constant (although it MAY specify a constant to SI of '0').
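Taken together, the two attributes describe the conversion: SI value = (value x multiplierToSI) + constantToSI. The sketch below shows the Celsius and gram cases with a worked number for each in the comments. The Celsius offset is written as 273.15, the exact difference between the two scales. The ids 'siUnits:K', 'siUnits:kg', 'unitType:temperature' and 'unitType:mass', and the unitList title, are assumptions made for illustration.

```xml
<!-- Sketch only: the parentSI and unitType ids below are assumed, not confirmed. -->
<unitList xmlns="http://www.xml-cml.org/schema"
          xmlns:convention="http://www.xml-cml.org/convention/"
          xmlns:h="http://www.w3.org/1999/xhtml"
          xmlns:siUnits="http://www.xml-cml.org/unit/si/"
          xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
          convention="convention:unit-dictionary"
          title="conversion examples"
          namespace="http://www.xml-cml.org/unit/example/">
    <!-- Offset only, no multiplier specified:
         25 degrees Celsius becomes 25 + 273.15 = 298.15 K -->
    <unit id="celsius"
          title="degree Celsius"
          symbol="&#176;C"
          parentSI="siUnits:K"
          constantToSI="273.15"
          unitType="unitType:temperature">
        <definition>
            <h:p>A unit of temperature whose zero point lies at 273.15 kelvin.</h:p>
        </definition>
    </unit>
    <!-- Multiplier only, no constant specified:
         500 g becomes 500 x 0.001 = 0.5 kg -->
    <unit id="g"
          title="gram"
          symbol="g"
          parentSI="siUnits:kg"
          multiplierToSI="0.001"
          unitType="unitType:mass">
        <definition>
            <h:p>One thousandth of a kilogram.</h:p>
        </definition>
    </unit>
</unitList>
```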

4.6 Unit Type

Every unit element MUST have a unitType attribute, the value of which is a QName referencing the unit type (e.g. time, temperature, length, force) of the unit.

4.7 Definition

A unit element MUST contain a single definition child element, the content of which provides a concise human-readable definition of the unit. For example, the definition of the SI unit of time (the second) would be:
"The SI base unit of time, equal to the duration of 9192631770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium-133 atom."

The definition element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.

The child elements in the http://www.w3.org/1999/xhtml namespace MUST contain at least one non-whitespace character.

4.8 Description

A unit element MAY have a single description child element, the content of which provides further information regarding the unit, including, but not limited to: examples, human-readable semantics and hyperlinks to other useful resources.

The description element MUST contain one or more child elements in the http://www.w3.org/1999/xhtml namespace.

The child elements in the http://www.w3.org/1999/xhtml namespace MUST contain at least one non-whitespace character.

5. Example Unit Dictionary

                <?xml version="1.0" encoding="UTF-8" ?>
                <unitList
                        xmlns:convention="http://www.xml-cml.org/convention/"
                        xmlns:siUnits="http://www.xml-cml.org/unit/si/"
                        xmlns="http://www.xml-cml.org/schema"
                        xmlns:h="http://www.w3.org/1999/xhtml"
                        xmlns:unitType="http://www.xml-cml.org/unit/unitType/"
                        convention="convention:unit-dictionary"
                        title="example units dictionary"
                        namespace="http://www.xml-cml.org/unit/example/"
                        >
                    <description>
                        <h:p>
                            An example units dictionary.
                        </h:p>
                    </description>
                    <unit
                            title="second"
                            id="s"
                            symbol="s"
                            parentSI="siUnits:s"
                            multiplierToSI="1" 
                            constantToSI="0"
                            unitType="unitType:time">
                        <definition>
                            <h:p>
                                The SI base unit of time, equal to the duration of 9192631770 periods of the radiation corresponding 
                                to the transition between the two hyperfine levels of the ground state of the 
                                caesium-133 atom.
                            </h:p>
                        </definition>
                        <description>
                            <h:p>
                                The second has had many definitions throughout history; originally, it was one sixtieth
                                of one twenty-fourth of a solar day (the factor of sixty coming from Babylonian counting
                                and the factor of 24 from Ancient Egypt).
                            </h:p>
                            <h:p>
                                The present definition dates from the Thirteenth General Conference on Weights and Measures,
                                which took place in 1967.
                            </h:p>
                        </description>
                    </unit>
                    <unit
                            title="metre"
                            id="m"
                            symbol="m"
                            parentSI="siUnits:m"
                            multiplierToSI="1.0" 
                            unitType="unitType:length">
                        <definition>
                            <h:p>
                                The SI base unit of length, defined as the length of the path travelled by 
                                light in absolute vacuum during 1/299792458 of a second.
                            </h:p>
                        </definition>	
                        <description>
                            <h:p>
                                The modern metre dates from 1791, when it was defined as one ten-millionth of
                                the length of the earth's meridian along a quadrant; it became France's
                                official unit of length in 1793. Until 1960, the metre (like the kilogram)
                                was defined by a prototype - in this case, a platinum-iridium bar; in 1960,
                                the SI defined the metre as 1650763.73 wavelengths of the orange-red 
                                emission line (the 2p10 - 5d5 transition) in the EM spectrum of Krypton-86 
                                in vacuum. Since 1983, the present definition has been used.
                            </h:p>
                        </description>
                    </unit>
                    <unit
                            title="Angstrom"
                            id="angstrom"
                            symbol="&#197;"
                            parentSI="siUnits:m"
                            multiplierToSI="1E-10"
                            unitType="unitType:length">
                        <definition>
                            <h:p>
                                1E-10 metres.
                            </h:p>
                        </definition>
                        <description>
                            <h:p>
                                The angstrom is named after the Swedish physicist Anders Jonas Angstrom
                                (1814-1874), one of the founders of spectroscopy, whose chart of the
                                solar spectrum expressed the wavelengths of electromagnetic radiation
                                in multiples of one ten-millionth of a millimetre, or 1E-10 metres.
                            </h:p>
                        </description>
                    </unit>
                </unitList>
            

A. References

[RFC2119]
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium, 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version of XML is available at http://www.w3.org/TR/REC-xml.

B. Acknowledgements


Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License.