Information technology — Document Schema Definition Language (DSDL) — Part 2: Regular-grammar-based validation — RELAX NG

ISO/IEC 19757-2:2008 specifies RELAX NG, a schema language for XML. A RELAX NG schema specifies a pattern for the structure and content of an XML document. The pattern is specified by using a regular tree grammar. It establishes requirements for RELAX NG schemas and specifies when an XML document matches the pattern specified by a RELAX NG schema.

Technologies de l'information — Langage de définition de schéma de documents (DSDL) — Partie 2: Validation de grammaire orientée courante — RELAX NG

General Information

Status: Published
Publication Date: 09-Dec-2008
Current Stage: 9093 - International Standard confirmed
Completion Date: 06-Jun-2019

Standard
ISO/IEC 19757-2:2008 - Information technology -- Document Schema Definition Language (DSDL)
English language
43 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO/IEC 19757-2
Second edition
2008-12-15

Information technology — Document Schema Definition Language (DSDL) —
Part 2: Regular-grammar-based validation — RELAX NG
Technologies de l'information — Langage de définition de schéma de documents (DSDL) —
Partie 2: Validation de grammaire orientée courante — RELAX NG



Reference number
ISO/IEC 19757-2:2008(E)
© ISO/IEC 2008


COPYRIGHT PROTECTED DOCUMENT


©  ISO/IEC 2008
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Notation
4.1 EBNF
4.2 Inference rules
4.2.1 Variables
4.2.2 Propositions
4.2.3 Expressions
5 Data model
6 Full syntax
7 Simplification
7.1 General
7.2 Annotations
7.3 Whitespace
7.4 datatypeLibrary attribute
7.5 type attribute of value element
7.6 href attribute
7.7 externalRef element
7.8 include element
7.9 name attribute of element and attribute elements
7.10 ns attribute
7.11 QNames
7.12 div element
7.13 Number of child elements
7.14 mixed element
7.15 optional element
7.16 zeroOrMore element
7.17 Constraints
7.18 combine attribute
7.19 grammar element
7.20 define and ref elements
7.21 notAllowed element
7.22 empty element
8 Simple syntax
9 Semantics
9.1 Inference rules
9.2 Name classes
9.3 Patterns
9.3.1 choice pattern
9.3.2 group pattern
9.3.3 empty pattern
9.3.4 text pattern
9.3.5 oneOrMore pattern
9.3.6 interleave pattern
9.3.7 element and attribute pattern
9.3.8 data and value pattern
9.3.9 Built-in datatype library
9.3.10 list pattern
9.4 Validity
10 Restrictions
10.1 General
10.2 Prohibited paths
10.2.1 General
10.2.2 attribute pattern
10.2.3 oneOrMore pattern
10.2.4 list pattern
10.2.5 except element in data pattern
10.2.6 start element
10.3 String sequences
10.4 Restrictions on attributes
10.5 Restrictions on interleave
11 Conformance
Annex A (normative) RELAX NG schema for RELAX NG
Annex B (informative) Examples
B.1 Data model
B.2 Full syntax example
B.3 Simple syntax example
B.4 Validation example
Annex C (normative) RELAX NG Compact syntax
C.1 Introduction
C.2 Syntax
C.3 Lexical structure
C.4 Declarations
C.5 Annotations
C.5.1 Support for annotations
C.5.2 Initial annotations
C.5.3 Documentation shorthand
C.5.4 Following annotations
C.5.5 Grammar annotations
C.6 Conformance
C.6.1 Types of conformance
C.6.2 Validator
C.6.3 Structure preserving translator
C.6.4 Non-structure preserving translator
C.7 Media type registration template for the RELAX NG Compact Syntax
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form
the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in
the development of International Standards through technical committees established by the respective organization
to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual
interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International Standards
adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International
Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights.
ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 19757-2 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee
SC 34, Document description and processing languages.
This second edition cancels and replaces the first edition (ISO/IEC 19757-2:2003), of which it constitutes a minor revision.
It also incorporates the Amendment ISO/IEC 19757-2:2003/Amd.1:2006.
ISO/IEC 19757 consists of the following parts, under the general title Information technology — Document Schema
Definition Language (DSDL):
— Part 2: Regular-grammar-based validation — RELAX NG
— Part 3: Rule-based validation — Schematron
— Part 4: Namespace-based validation dispatching language — NVDL
— Part 8: Document semantics renaming language — DSRL
— Part 9: Namespace and datatype declaration in Document Type Definitions (DTDs)
The following parts are under preparation:
— Part 1: Overview
— Part 5: Extensible Datatypes
— Part 7: Character Repertoire Description Language (CREPDL)

Introduction
The structure of this part of ISO/IEC 19757 is as follows. Clause 5 describes the data model, which is the abstraction
of an XML document used throughout the rest of the document. Clause 6 describes the syntax of a RELAX NG
schema. Clause 7 describes a sequence of transformations that are applied to simplify a RELAX NG schema, and
also specifies additional requirements on a RELAX NG schema. Clause 8 describes the syntax that results from
applying the transformations; this simple syntax is a subset of the full syntax. Clause 9 describes the semantics of a
correct RELAX NG schema that uses the simple syntax; the semantics specify when an element is valid with respect
to a RELAX NG schema. Clause 10 describes requirements that apply to a RELAX NG schema after it has been
transformed into simple form. Finally, Clause 11 describes conformance requirements for RELAX NG validators.
This part of ISO/IEC 19757 is based on the RELAX NG Specification [1], and the compact syntax shown in Annex C
is based on the RELAX NG Compact Syntax [3]. A tutorial for RELAX NG is available separately (see the RELAX NG
Tutorial [2]).

INTERNATIONAL STANDARD ISO/IEC 19757-2:2008(E)
Information technology — Document Schema Definition
Language (DSDL) —
Part 2:
Regular-grammar-based validation — RELAX NG
1 Scope
This part of ISO/IEC 19757 specifies RELAX NG, a schema language for XML. A RELAX NG schema specifies a
pattern for the structure and content of an XML document. The pattern is specified by using a regular tree grammar.
This part of ISO/IEC 19757 establishes requirements for RELAX NG schemas and specifies when an XML document
matches the pattern specified by a RELAX NG schema.
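As a rough illustration of what such a pattern looks like, the sketch below embeds a small schema written in the RELAX NG XML syntax and lists the pattern elements it uses. The addressBook vocabulary and the helper name pattern_outline are hypothetical and are not taken from this part of ISO/IEC 19757; the only fixed value is the RELAX NG namespace URI. The code parses the schema as XML and walks it. It does not perform validation.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"  # RELAX NG namespace URI

# Hypothetical schema: an addressBook element containing zero or more card
# elements, each with a name child and an email child holding text.
SCHEMA = """
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""


def pattern_outline(schema_text):
    """Return (depth, pattern name, name attribute) for each RELAX NG element."""
    root = ET.fromstring(schema_text)
    outline = []

    def walk(node, depth):
        # ElementTree spells qualified names as "{namespace}local".
        if node.tag.startswith("{" + RNG_NS + "}"):
            local = node.tag.split("}", 1)[1]
            outline.append((depth, local, node.get("name")))
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return outline


for depth, local, name in pattern_outline(SCHEMA):
    print("  " * depth + local + (f" name={name}" if name else ""))
```

The nesting of element, zeroOrMore and text in the outline mirrors the tree shape of the documents the pattern describes, which is the sense in which the pattern is given by a tree grammar.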
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references,
only the edition cited applies. For undated references, the latest edition of the referenced document (including any
amendments) applies.
NOTE Each of the following documents has a unique identifier that is used to cite the document in the text. The unique
identifier consists of the part of the reference up to the first comma.
W3C XML, Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000,
available at http://www.w3.org/TR/2000/REC-xml-20001006
W3C XML-Names, Namespaces in XML, W3C Recommendation, 14 January 1999, available at
http://www.w3.org/TR/1999/REC-xml-names-19990114/
W3C XLink, XML Linking Language (XLink) Version 1.0, W3C Recommendation, 27 June 2001, available at
http://www.w3.org/TR/2001/REC-xlink-20010627/
W3C XML-Infoset, XML Information Set, W3C Recommendation, 24 October 2001, available at
http://www.w3.org/TR/2001/REC-xml-infoset-20011024/
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, Internet
Standards Track Specification, November 1996, available at http://www.ietf.org/rfc/rfc2045.txt
IETF RFC 2046, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, Internet Standards Track
Specification, November 1996, available at http://www.ietf.org/rfc/rfc2046.txt
IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, Internet Standards Track Specification, August
1998, available at http://www.ietf.org/rfc/rfc2396.txt
IETF RFC 2732, Format for Literal IPv6 Addresses in URL's, Internet Standards Track Specification, December 1999,
available at http://www.ietf.org/rfc/rfc2732.txt
IETF RFC 3023, XML Media Types, Internet Standards Track Specification, January 2001, available at
http://www.ietf.org/rfc/rfc3023.txt
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
resource
something with identity, potentially addressable by a URI
3.2
URI
compact string of characters that uses the syntax defined in IETF RFC 2396 to identify an abstract or physical resource
3.3
URI reference
URI or relative URI and optional fragment identifier
3.4
relative URI
form of URI reference that can be resolved with respect to a base URI to produce another URI
3.5
base URI
URI used to resolve relative URIs
3.6
fragment identifier
additional information in a URI reference used by a user agent after the retrieval action on a URI has been successfully
performed
3.7
instance
XML document that is being validated with respect to a RELAX NG schema
3.8
space character
character with the code value #x20
3.9
whitespace character
character with the code value #x20, #x9, #xA or #xD
3.10
name
pair of a URI and a local name
3.11
namespace URI
URI that is part of a name
3.12
local name
NCName that is part of a name
3.13
NCName
string that matches the NCName production of W3C XML-Names
3.14
name class
part of a schema that can be matched against a name
3.15
pattern
part of a schema that can be matched against a set of attributes and a sequence of elements and strings
3.16
foreign attribute
attribute with a name whose namespace URI is neither the empty string nor the RELAX NG namespace URI
3.17
foreign element
element with a name whose namespace URI is not the RELAX NG namespace URI
3.18
full syntax
syntax of a RELAX NG grammar before simplification
3.19
simple syntax
syntax of a RELAX NG grammar after simplification
3.20
simplification
transformation of a RELAX NG schema in the full syntax to a schema in the simple syntax
3.21
datatype library
mapping from local names to datatypes
NOTE A datatype library is identified by a URI.
3.22
datatype
set of strings together with an equivalence relation on that set
3.23
axiom
proposition that is provable unconditionally
3.24
inference rule
rule consisting of one or more positive or negative antecedents and exactly one consequent, which makes the
consequent provable if all the positive antecedents are provable and none of the negative antecedents is provable
3.25
valid with respect to a schema
member of the set of XML documents described by the schema
3.26
schema
specification of a set of XML documents
3.27
grammar
start pattern together with a mapping from NCNames to patterns
3.28
correct schema
schema that satisfies all the requirements of this part of ISO/IEC 19757
3.29
validator
software module that determines whether a schema is correct and whether an instance is valid with respect to a schema
3.30
path
list of NCNames separated by / or //
3.31
infoset
an abstraction of an XML document defined by W3C XML-Infoset
3.32
information item
constituent of an information set
3.33
data model
abstract representation of an XML document defined by this part of ISO/IEC 19757
3.34
XML document
string that is a well-formed XML document as defined in W3C XML
3.35
EBNF
Extended BNF
notation used to describe context-free grammars
3.36
weak matching
kind of matching specified in detail in 9.3.7
3.37
in-scope grammar
nearest ancestor grammar element
3.38
content-type
one of the three values empty, complex, or simple
3.39
mixed sequence
sequence that may contain both elements and strings
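A few of the definitions above translate directly into small data structures. The following is a minimal sketch, assuming Python as the host language; the names Name, is_whitespace_only and TokenLikeDatatype are hypothetical and do not appear in this part of ISO/IEC 19757. It covers the whitespace character (3.9), the name as a pair (3.10) and the datatype as a set of strings with an equivalence relation (3.22).

```python
from typing import NamedTuple

# 3.9: the whitespace characters are #x20, #x9, #xA and #xD.
WHITESPACE = {"\x20", "\x09", "\x0a", "\x0d"}


def is_whitespace_only(s: str) -> bool:
    """True for the empty string and for strings made only of whitespace characters."""
    return all(ch in WHITESPACE for ch in s)


class Name(NamedTuple):
    """3.10: a name is a pair of a (namespace) URI and a local name."""
    namespace_uri: str
    local_name: str


class TokenLikeDatatype:
    """Hypothetical datatype in the sense of 3.22.

    The set of strings is "all strings"; the equivalence relation treats two
    strings as equal when they contain the same whitespace-separated tokens.
    """

    @staticmethod
    def _tokens(s: str):
        for ch in WHITESPACE:
            s = s.replace(ch, " ")
        return [t for t in s.split(" ") if t]

    def allows(self, s: str) -> bool:
        return True  # every string belongs to this datatype's set

    def equal(self, s1: str, s2: str) -> bool:
        return self._tokens(s1) == self._tokens(s2)
```

Two names are then equal exactly when both components are equal, for example Name("", "card") == Name("", "card"), while the same local name in a different namespace is a different name.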
4 Notation
4.1 EBNF
This part of ISO/IEC 19757 uses EBNF notation to describe the full syntax and the simple syntax of RELAX NG. A
description of a grammar in EBNF consists of one or more production rules. Each production rule consists of the
name of a non-terminal, followed by ::=, followed by a list of alternatives separated by |. Within an alternative, italic
type is used to reference a non-terminal, concatenation indicates sequencing, [] indicates optionality, + indicates
repetition one or more times and * indicates repetition zero or more times; other characters in normal type stand for
themselves.
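For a grammar without recursion, the operators described above correspond closely to regular-expression operators. The sketch below is an illustration only: the productions number, sign and digit are hypothetical and are not productions of RELAX NG, and Python's re module stands in for the EBNF notation.

```python
import re

# Hypothetical productions, written in the notation described above:
#   number ::= [sign] digit+
#   sign   ::= -
#   digit  ::= 0 | 1
DIGIT = r"(?:0|1)"   # alternatives separated by | stay alternatives
SIGN = r"(?:-)"      # a character in normal type stands for itself

# [] (optionality) becomes ?, + (one or more) stays +,
# and concatenation indicates sequencing in both notations.
NUMBER = re.compile(rf"(?:{SIGN})?{DIGIT}+")


def is_number(s: str) -> bool:
    """True if the whole string s can be derived from the number production."""
    return NUMBER.fullmatch(s) is not None
```

The correspondence breaks down as soon as a production refers to itself, which is why the full and simple syntaxes are given as grammars and not as regular expressions.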
4.2 Inference rules
4.2.1 Variables
The symbol used for a variable indicates the variable's range as follows:
— n ranges over names
— nc ranges over name classes
— ln ranges over local names; a local name is a string that matches the NCName production of W3C XML-Names,
that is, a name with no colons
— u ranges over URIs
— cx ranges over contexts (as defined in Clause 5)
— a ranges over sets of attributes; a set with a single member is considered the same as that member
— m ranges over sequences of elements and strings; a sequence with a single member is considered the same as
that member; the sequences ranged over by m may contain consecutive strings and may contain strings that are
empty
NOTE There are sequences ranged over by m that cannot occur as the children of an element.
— p ranges over patterns (elements matching the pattern production)
— s ranges over strings
— ws ranges over the empty sequence and strings that consist entirely of whitespace
— params ranges over sequences of parameters
— e ranges over elements
— ct ranges over content-types
4.2.2 Propositions
The following notation is used for propositions:
— n in nc means that name n is a member of name class nc
— cx ⊦ a; m =~ p means that with respect to context cx, the attributes a and the sequence of elements and strings
m matches the pattern p
— disjoint(a_1, a_2) means that there is no name that is the name of both an attribute in a_1 and of an attribute in a_2
— m_1 interleaves m_2; m_3 means that m_1 is an interleaving of m_2 and m_3
— cx ⊦ a; m =~_weak p means that with respect to context cx, the attributes a and the sequence of elements and
strings m weakly matches the pattern p
— okAsChildren(m) means that the mixed sequence m can occur as the children of an element: it does not contain
any member that is an empty string, nor does it contain two consecutive members that are both strings
— deref(ln) = <element> nc p </element> means that the grammar contains
<define name="ln"> <element> nc p </element> </define>
— datatypeAllows(u, ln, params, s, cx) means that in the datatype library identified by URI u, the string s interpreted
with context cx is a legal value of datatype ln with parameters params
— datatypeEqual(u, ln, s_1, cx_1, s_2, cx_2) means that in the datatype library identified by URI u, string s_1 interpreted
with context cx_1 represents the same value of the datatype ln as the string s_2 interpreted in the context of cx_2
— s_1 = s_2 means that s_1 and s_2 are identical
— valid(e) means that the element e is valid with respect to the grammar
— start() = p means that the grammar contains <start> p </start>
— groupable(ct_1, ct_2) means that the content-types ct
...
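Three of the propositions listed in 4.2.2 are simple enough to state as executable predicates. The following is a minimal sketch under stated assumptions: attribute sets are modelled as dictionaries keyed by attribute name, mixed sequences as Python lists whose members are either strings or instances of a hypothetical Element class, and the function names are illustrative. The normative definitions are the inference rules of Clause 9, not this code.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an element; only its name is modelled."""
    name: str


def disjoint(a1: dict, a2: dict) -> bool:
    """No attribute name occurs in both attribute sets."""
    return not (a1.keys() & a2.keys())


def interleaves(m1: list, m2: list, m3: list) -> bool:
    """m1 is an interleaving of m2 and m3: m1 merges them, keeping each one's order."""
    if len(m1) != len(m2) + len(m3):
        return False

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        k = i + j  # position in m1 reached after taking i items of m2 and j of m3
        if k == len(m1):
            return True
        take_m2 = i < len(m2) and m1[k] == m2[i] and go(i + 1, j)
        take_m3 = j < len(m3) and m1[k] == m3[j] and go(i, j + 1)
        return take_m2 or take_m3

    return go(0, 0)


def ok_as_children(m: list) -> bool:
    """No empty-string member and no two consecutive members that are both strings."""
    for index, item in enumerate(m):
        if isinstance(item, str):
            if item == "":
                return False
            if index > 0 and isinstance(m[index - 1], str):
                return False
    return True
```

For example, with e = Element("e") and f = Element("f"), interleaves([e, "t", f], [e, f], ["t"]) holds, while ok_as_children(["a", "b"]) does not because the two strings are adjacent.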
