Information technology — Document Schema Definition Language (DSDL) — Part 2: Regular-grammar-based validation — RELAX NG

ISO/IEC 19757-2:2003 specifies RELAX NG, a schema language for XML. A RELAX NG schema specifies a pattern for the structure and content of an XML document. The pattern is specified by using a regular tree grammar. A RELAX NG schema is itself an XML document. ISO/IEC 19757-2:2003 specifies: when an XML document is a correct RELAX NG schema; and when an XML document is valid with respect to a correct RELAX NG schema.

Technologies de l'information — Langage de définition de schéma de documents (DSDL) — Partie 2: Validation de grammaire orientée courante — RELAX NG

General Information

Status: Withdrawn
Publication Date: 27-Nov-2003
Withdrawal Date: 27-Nov-2003
Current Stage: 9599 - Withdrawal of International Standard
Completion Date: 10-Dec-2008

Standard
ISO/IEC 19757-2:2003 - Information technology -- Document Schema Definition Language (DSDL)
English language
34 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO/IEC 19757-2
First edition
2003-12-01

Information technology — Document Schema Definition Language (DSDL) —
Part 2: Regular-grammar-based validation — RELAX NG

Technologies de l’information — Langage de définition de schéma de documents (DSDL) —
Partie 2: Validation de grammaire orientée courante — RELAX NG

Reference number
ISO/IEC 19757-2:2003(E)
© ISO/IEC 2003

©  ISO/IEC 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland


Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Notation
4.1 EBNF
4.2 Inference rules
4.2.1 Variables
4.2.2 Propositions
4.2.3 Expressions
5 Data model
6 Full syntax
7 Simplification
7.1 General
7.2 Annotations
7.3 Whitespace
7.4 datatypeLibrary attribute
7.5 type attribute of value element
7.6 href attribute
7.7 externalRef element
7.8 include element
7.9 name attribute of element and attribute elements
7.10 ns attribute
7.11 QNames
7.12 div element
7.13 Number of child elements
7.14 mixed element
7.15 optional element
7.16 zeroOrMore element
7.17 Constraints
7.18 combine attribute
7.19 grammar element
7.20 define and ref elements
7.21 notAllowed element
7.22 empty element
8 Simple syntax
9 Semantics
9.1 Inference rules
9.2 Name classes
9.3 Patterns
9.3.1 choice pattern
9.3.2 group pattern
9.3.3 empty pattern
9.3.4 text pattern
9.3.5 oneOrMore pattern
9.3.6 interleave pattern
9.3.7 element and attribute pattern
9.3.8 data and value pattern
9.3.9 Built-in datatype library
9.3.10 list pattern
9.4 Validity
10 Restrictions
10.1 General
10.2 Prohibited paths
10.2.1 General
10.2.2 attribute pattern
10.2.3 oneOrMore pattern
10.2.4 list pattern
10.2.5 except element in data pattern
10.2.6 start element
10.3 String sequences
10.4 Restrictions on attributes
10.5 Restrictions on interleave
11 Conformance
Annex A (normative) RELAX NG schema for RELAX NG
Annex B (informative) Examples
B.1 Data model
B.2 Full syntax example
B.3 Simple syntax example
B.4 Validation example
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 19757-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 34, Document description and processing languages.
ISO/IEC 19757 consists of the following parts, under the general title Information technology — Document
Schema Definition Language (DSDL):
— Part 2: Regular-grammar-based validation — RELAX NG
The following parts are under preparation.
— Part 1: Overview
— Part 4: Selection of validation candidates
Rule-based validation — Schematron, Datatypes, Path-based integrity constraints, Character repertoire
validation, Declarative document manipulation, Datatype- and namespace-aware DTDs and Interoperability
framework will form the subjects of future Parts 3, 5, 6, 7, 8, 9 and 10, respectively.

Introduction
The structure of this part of ISO/IEC 19757 is as follows. Clause 5 describes the data model, which is the
abstraction of an XML document used throughout the rest of the document. Clause 6 describes the syntax of a
RELAX NG schema. Clause 7 describes a sequence of transformations that are applied to simplify a RELAX NG
schema, and also specifies additional requirements on a RELAX NG schema. Clause 8 describes the syntax that
results from applying the transformations; this simple syntax is a subset of the full syntax. Clause 9 describes the
semantics of a correct RELAX NG schema that uses the simple syntax; the semantics specify when an element is
valid with respect to a RELAX NG schema. Clause 10 describes requirements that apply to a RELAX NG schema
after it has been transformed into simple form. Finally, Clause 11 describes conformance requirements for RELAX
NG validators.
This part of ISO/IEC 19757 is based on the RELAX NG Specification [1]. A tutorial for RELAX NG is available
separately (see the RELAX NG Tutorial [2]).
1 Scope

This part of ISO/IEC 19757 specifies RELAX NG, a schema language for XML. A RELAX NG schema specifies a
pattern for the structure and content of an XML document. The pattern is specified by using a regular tree
grammar. This part of ISO/IEC 19757 establishes requirements for RELAX NG schemas and specifies when an
XML document matches the pattern specified by a RELAX NG schema.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references,
only the edition cited applies. For undated references, the latest edition of the referenced document (including any
amendments) applies.
NOTE Each of the following documents has a unique identifier that is used to cite the document in the text. The unique
identifier consists of the part of the reference up to the first comma.
W3C XML, Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000,
available at
W3C XML-Names, Namespaces in XML, W3C Recommendation, 14 January 1999, available at

W3C XLink, XML Linking Language (XLink) Version 1.0, W3C Recommendation, 27 June 2001, available at

W3C XML-Infoset, XML Information Set, W3C Recommendation, 24 October 2001, available at

IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies,
Internet Standards Track Specification, November 1996, available at
IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, Internet Standards Track
Specification, November 1996, available at
IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, Internet Standards Track Specification,
August 1998, available at
IETF RFC 2732, Format for Literal IPv6 Addresses in URL's, Internet Standards Track Specification, December
1999, available at
IETF RFC 3023, XML Media Types, Internet Standards Track Specification, January 2001, available at

3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
resource
something with identity, potentially addressable by a URI
3.2
URI
compact string of characters that uses the syntax defined in IETF RFC 2396 to identify an abstract or physical
resource
3.3
URI reference
URI or relative URI and optional fragment identifier
3.4
relative URI
form of URI reference that can be resolved with respect to a base URI to produce another URI
3.5
base URI
URI used to resolve relative URIs
3.6
fragment identifier
additional information in a URI reference used by a user agent after the retrieval action on a URI has been
successfully performed
3.7
instance
XML document that is being validated with respect to a RELAX NG schema
3.8
space character
character with the code value #x20
3.9
whitespace character
character with the code value #x20, #x9, #xA or #xD
3.10
name
pair of a URI and a local name
3.11
namespace URI
URI that is part of a name
3.12
local name
NCName that is part of a name
3.13
NCName
string that matches the NCName production of W3C XML-Names
3.14
name class
part of a schema that can be matched against a name
3.15
pattern
part of a schema that can be matched against a set of attributes and a sequence of elements and strings
3.16
foreign attribute
attribute with a name whose namespace URI is neither the empty string nor the RELAX NG namespace URI
3.17
foreign element
an element with a name whose namespace URI is not the RELAX NG namespace URI
3.18
full syntax
syntax of a RELAX NG grammar before simplification
3.19
simple syntax
syntax of a RELAX NG grammar after simplification
3.20
simplification
transformation of a RELAX NG schema in the full syntax to a schema in the simple syntax
3.21
datatype library
mapping from local names to datatypes
NOTE A datatype library is identified by a URI.
3.22
datatype
set of strings together with an equivalence relation on that set
3.23
axiom
proposition that is provable unconditionally
3.24
inference rule
rule consisting of one or more positive or negative antecedents and exactly one consequent, which makes the
consequent provable if all the positive antecedents are provable and none of the negative antecedents is
provable
3.25
valid with respect to a schema
member of the set of XML documents described by the schema
3.26
schema
specification of a set of XML documents
3.27
grammar
start pattern together with a mapping from NCNames to patterns
3.28
correct schema
schema that satisfies all the requirements of this part of ISO/IEC 19757
3.29
validator
software module that determines whether a schema is correct and whether an instance is valid with respect to a
schema
3.30
path
list of NCNames separated by / or //
3.31
infoset
an abstraction of an XML document defined by W3C XML-Infoset
3.32
information item
constituent of an information set
3.33
data model
abstract representation of an XML document defined by this part of ISO/IEC 19757
3.34
XML document
string that is a well-formed XML document as defined in W3C XML
3.35
EBNF
Extended BNF
notation used to describe context-free grammars
3.36
weak matching
kind of matching specified in detail in 9.3.7
3.37
in-scope grammar
nearest ancestor grammar element
3.38
content-type
one of the three values empty, complex, or simple
3.39
mixed sequence
sequence that may contain both elements and strings
4 Notation
4.1 EBNF
This part of ISO/IEC 19757 uses EBNF notation to describe the full syntax and the simple syntax of RELAX NG. A
description of a grammar in EBNF consists of one or more production rules. Each production rule consists of the
name of a non-terminal, followed by ::=, followed by a list of alternatives separated by |. Within an alternative,
italic type is used to reference a non-terminal, concatenation indicates sequencing, [] indicates optionality, +
indicates repetition one or more times and * indicates repetition zero or more times; other characters in normal
type stand for themselves.
4.2 Inference rules
4.2.1 Variables
The symbol used for a variable indicates the variable's range as follows:
— n ranges over names
— nc ranges over name classes
— ln ranges over local names; a local name is a string that matches the NCName production of W3C XML-
Names, that is, a name with no colons
— u ranges over URIs
— cx ranges over contexts (as defined in Clause 5)
— a ranges over sets of attributes; a set with a single member is considered the same as that member
— m ranges over sequences of elements and strings; a sequence with a single member is considered the same
as that member; the sequences ranged over by m may contain consecutive strings and may contain strings
that are empty
NOTE There are sequences ranged over by m that cannot occur as the children of an element.
— p ranges over patterns (elements matching the pattern production)
— s ranges over strings
— ws ranges over the empty sequence and strings that consist entirely of whitespace
— params ranges over sequences of parameters
— e ranges over elements
— ct ranges over content-types
4.2.2 Propositions
The following notation is used for propositions:
— n in nc means that name n is a member of name class nc
— cx ⊦ a; m =~ p means that with respect to context cx, the attributes a and the sequence of elements and
strings m matches the pattern p
— disjoint(a_1, a_2) means that there is no name that is the name of both an attribute in a_1 and of an attribute in a_2
— m_1 interleaves m_2; m_3 means that m_1 is an interleaving of m_2 and m_3
— cx ⊦ a; m =~_weak p means that with respect to context cx, the attributes a and the sequence of elements and
strings m weakly matches the pattern p
— okAsChildren(m) means that the mixed sequence m can occur as the children of an element: it does not
contain any member that is an empty string, nor does it contain two consecutive members that are both
strings
— deref(ln) = <element> nc p </element> means that the grammar contains
<define name="ln"> <element> nc p </element> </define>
— datatypeAllows(u, ln, params, s, cx) means that in the datatype library identified by URI u, the string s
interpreted with context cx is a legal value of datatype ln with parameters params
— datatypeEqual(u, ln, s_1, cx_1, s_2, cx_2) means that in the datatype library identified by URI u, string s_1
interpreted with context cx_1 represents the same value of the datatype ln as the string s_2 interpreted in the
context of cx_2
— s_1 = s_2 means that s_1 and s_2 are identical
— valid(e) means that the element e is valid with respect to the grammar
— start() = p means that the grammar contains <start> p </start>
— groupable(ct_1, ct_2) means that the content-types ct_1 and ct_2 are groupable
— p :_c ct means that pattern p has content-type ct
— incorrectSchema() means that the schema is incorrect
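
Three of the propositions above are simple enough to sketch in code. The following Python is an illustration only and is not part of the standard. It assumes that a mixed sequence is a Python list whose members are either str values (strings) or any other comparable object (elements), and that an attribute is a (name, value) tuple. These representations and the function names are choices made for this example.

```python
from functools import lru_cache


def ok_as_children(m):
    """okAsChildren(m): no empty string and no two consecutive strings."""
    previous_was_string = False
    for member in m:
        is_string = isinstance(member, str)
        if is_string and (member == "" or previous_was_string):
            return False
        previous_was_string = is_string
    return True


def disjoint(a1, a2):
    """disjoint(a_1, a_2): no attribute name occurs in both sets."""
    names1 = {name for name, _value in a1}
    names2 = {name for name, _value in a2}
    return names1.isdisjoint(names2)


def interleaves(m1, m2, m3):
    """m_1 interleaves m_2; m_3: m1 merges m2 and m3, keeping each one's order."""
    m1, m2, m3 = tuple(m1), tuple(m2), tuple(m3)
    if len(m1) != len(m2) + len(m3):
        return False

    @lru_cache(maxsize=None)
    def match(i, j):
        # i members of m2 and j members of m3 have been consumed so far.
        k = i + j
        if k == len(m1):
            return True
        if i < len(m2) and m1[k] == m2[i] and match(i + 1, j):
            return True
        return j < len(m3) and m1[k] == m3[j] and match(i, j + 1)

    return match(0, 0)
```

The interleaving check tries both sources at every position and memoizes the result, so it stays correct when m2 and m3 share equal members.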
4.2.3 Expressions
The following notation is used for expressions in propositions:
— name( u, ln ) returns a name with URI u and local name ln
— m_1, m_2 returns the concatenation of the sequences m_1 and m_2
— a_1 + a_2 returns the union of a_1 and a_2
— ( ) returns an empty sequence
— { } returns an empty set
— "" returns an empty string
— attribute( n, s ) returns an attribute with name n and value s
— element( n, cx, a, m ) returns an element with name n, context cx, attributes a and mixed sequence m as
children
— max( ct_1, ct_2 ) returns the maximum of ct_1 and ct_2 where the content-types in increasing order are empty( ),
complex( ), simple( )
— normalizeWhiteSpace( s ) returns the string s, with leading and trailing whitespace characters removed, and
with each other maximal sequence of whitespace characters replaced by a single space character
— split( s ) returns a sequence of strings one for each whitespace delimited token of s; each string in the
returned sequence will be non-empty and will not contain any whitespace
— context( u, cx ) returns a context which is the same as cx except that the default namespace is u; if u is the
empty string, then there is no default namespace in the constructed context
— empty( ) returns the empty content-type
— complex( ) returns the complex content-type
— simple( ) returns the simple content-type
— [cx] within the start-tag of a pattern refers to the context of the pattern element
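
The string and content-type expressions above can be sketched as follows. This Python is a minimal illustration, not a normative definition. It treats only #x20, #x9, #xA and #xD as whitespace, as in 3.9, and it represents the three content-types by the strings "empty", "complex" and "simple", which is an assumption of this example, as are the function names.

```python
import re

# A token is a maximal run of characters other than #x20, #x9, #xA and #xD.
_TOKEN = re.compile(r"[^\x20\x09\x0A\x0D]+")

# Content-types in increasing order, as listed for max( ct_1, ct_2 ).
CONTENT_TYPE_ORDER = ("empty", "complex", "simple")


def split(s):
    """split( s ): one non-empty string per whitespace-delimited token of s."""
    return _TOKEN.findall(s)


def normalize_white_space(s):
    """normalizeWhiteSpace( s ): trim, then collapse whitespace runs to one space."""
    return " ".join(split(s))


def max_content_type(ct1, ct2):
    """max( ct_1, ct_2 ): the later of the two in CONTENT_TYPE_ORDER."""
    return max(ct1, ct2, key=CONTENT_TYPE_ORDER.index)
```

Python's built-in str.split() with no arguments is deliberately avoided here, because it also splits on characters such as the no-break space that are not whitespace characters in the sense of 3.9.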
5 Data model
RELAX NG deals with XML documents representing both schemas and instances through an abstract data
model. XML documents representing schemas and instances shall be well-formed in conformance with W3C XML
and shall conform to the constraints of W3C XML-Names.
An XML document is represented by an element. An element consists of
— a name
— a context
— a set of attributes
— an ordered sequence of zero or more children; each child is either an element or a non-empty string; the
sequence never contains two consecutive strings
A name consists of
— a string representing the namespace URI; the empty string has special significance, representing the
absence of any namespace
— a string representing the local name; this string matches the NCName production of W3C XML-Names
A context consists of
— a base URI
— a namespace map; this maps prefixes to namespace URIs, and also may specify a default namespace URI
(as declared by the xmlns attribute)
An attribute consists of
— a name
— a string representing the value
A string consists of a sequence of zero or more characters, where a character is as defined in W3C XML.
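
The data model described above maps naturally onto a few record types. The sketch below is one possible Python rendering and is not defined by the standard. The class and field names are assumptions, the namespace map is stored as a tuple of (prefix, namespace URI) pairs, and the URI http://example.com/doc.xml is a hypothetical base URI used only for the usage lines.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Name:
    namespace_uri: str  # "" represents the absence of any namespace
    local_name: str     # expected to match the NCName production


@dataclass(frozen=True)
class Context:
    base_uri: str
    namespace_map: Tuple[Tuple[str, str], ...] = ()  # (prefix, namespace URI)
    default_namespace: Optional[str] = None          # None: no default namespace


@dataclass(frozen=True)
class Attribute:
    name: Name
    value: str


@dataclass
class Element:
    name: Name
    context: Context
    attributes: List[Attribute] = field(default_factory=list)
    children: List[Union["Element", str]] = field(default_factory=list)

    def __post_init__(self):
        # Each child is an element or a non-empty string, and the sequence
        # never contains two consecutive strings.
        previous_was_string = False
        for child in self.children:
            is_string = isinstance(child, str)
            if is_string and (child == "" or previous_was_string):
                raise ValueError("children contain an empty or consecutive string")
            previous_was_string = is_string


# Usage: the document <doc id="d1">Hello <b>world</b></doc>
ctx = Context(base_uri="http://example.com/doc.xml")  # hypothetical base URI
root = Element(
    Name("", "doc"),
    ctx,
    [Attribute(Name("", "id"), "d1")],
    ["Hello ", Element(Name("", "b"), ctx, [], ["world"])],
)
```

Enforcing the constraint on children at construction time means that adjacent text chunks from a parser would need to be merged before an Element is built.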
The element for an XML document is constructed from t
...
