Information technology — Database languages — SQL Technical Reports — Part 1: XQuery Regular Expression Support in SQL

ISO/IEC TR 19075-1:2011 describes the regular expression support in SQL adopted from the regular expression syntax of XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition), which is derived from Perl. It discusses five operators using this regular expression syntax: LIKE_REGEX predicate, to determine the existence of a match to a regular expression. OCCURRENCES_REGEX numeric function, to determine the number of matches to a regular expression. POSITION_REGEX function, to determine the position of a match. SUBSTRING_REGEX function, to extract a substring matching a regular expression. TRANSLATE_REGEX function, to perform replacements using a regular expression.

Technologies de l'information — Langages de base de données — Rapport techniques SQL — Partie 1: Support d'expressions régulières de XQuery en SQL

General Information

Status
Withdrawn
Publication Date
05-Jul-2011
Withdrawal Date
05-Jul-2011
Current Stage
9599 - Withdrawal of International Standard
Start Date
31-Aug-2021
Completion Date
31-Aug-2021
Ref Project

RELATIONS

Buy Standard

Technical report
ISO/IEC TR 19075-1:2011 - Information technology -- Database languages -- SQL Technical Reports
English language
23 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (sample)

TECHNICAL ISO/IEC
REPORT TR
19075-1
First edition
2011-07-15
Information technology — Database
languages — SQL Technical Reports —
Part 1:
XQuery Regular Expression Support in
SQL
Technologies de l'information — Langages de base de données —
Rapport techniques SQL —
Partie 1: Support d'expressions régulières de XQuery en SQL
Reference number
ISO/IEC TR 19075-1:2011(E)
ISO/IEC 2011
---------------------- Page: 1 ----------------------
ISO/IEC TR 19075-1:2011(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2011

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,

electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or

ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2011 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/IEC TR 19075-1:2011(E)
Contents Page

Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

1 Scope. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 XQuery regular expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Matching a specific character. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.2 Metacharacters and escape sequences. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3 Dot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.4 Anchors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.5 Line terminators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6

2.6 Bracket expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7

2.6.1 Listing characters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7

2.6.2 Matching a range. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.6.3 Negation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.6.4 Character class subtraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.7 Alternation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.8 Quantifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.9 Locating a match. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.10 Capture and back-reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.11 Precedence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.12 Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3 Operators using regular expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1 LIKE_REGEX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.2 OCCURRENCES_REGEX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.3 POSITION_REGEX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.4 SUBSTRING_REGEX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.5 TRANSLATE_REGEX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

©ISO/IEC 2011 – All rights reserved Contents iii
---------------------- Page: 3 ----------------------
ISO/IEC TR 19075-1:2011(E)
(Blank page)

iv XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 4 ----------------------
ISO/IEC TR 19075-1:2011(E)
Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission)

form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC

participate in the development of International Standards through technical committees established by the

respective organization to deal with particular fields of technical activity. ISO and IEC technical committees

collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental,

in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC

have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards

adopted by the joint technical committee are circulated to national bodies for voting. Publication as an Interna-

tional Standard requires approval by at least 75 % of the national bodies casting a vote.

In exceptional circumstances, when the joint technical committee has collected data of a different kind from

that which is normally published as an International Standard (“state of the art”, for example), it may decide

to publish a Technical Report. A Technical Report is entirely informative in nature and shall be subject to

review every five years in the same manner as an International Standard.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent

rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

ISO/IEC TR 19075-1 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,

Subcommittee SC 32, Data management and interchange.

ISO/IEC TR 19075 consists of the following parts, under the general title Information technology — Database

languages — SQL Technical Reports:
— Part 1: XQuery Regular Expression Support in SQL
©ISO/IEC 2011 – All rights reserved Foreword v
---------------------- Page: 5 ----------------------
ISO/IEC TR 19075-1:2011(E)
Introduction
The organization of this part of ISO/IEC TR 19075 is as follows:
1) Clause 1, “Scope”, specifies the scope of this part of ISO/IEC TR 19075.

2) Clause 2, “XQuery regular expressions”, explains how XQuery regular expressions are formed.

3) Clause 3, “Operators using regular expressions”, explains how the SQL operators use regular expressions.

vi XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 6 ----------------------
TECHNICAL REPORT ISO/IEC TR 19075-1:2011(E)
Information technology — Database languages — SQL Technical Reports —
Part 1:
XQuery Regular Expression Support in SQL
1 Scope

This Technical Report describes the regular expression support in SQL adopted from the regular expression

syntax of [XQuery F&O], which is derived from Perl. This Technical Report discusses five operators using

this regular expression syntax:

— LIKE_REGEX predicate, to determine the existence of a match to a regular expression.

— OCCURRENCES_REGEX numeric function, to determine the number of matches to a regular expression.

— POSITION_REGEX function, to determine the position of a match.

— SUBSTRING_REGEX function, to extract a substring matching a regular expression.

— TRANSLATE_REGEX function, to perform replacements using a regular expression.
©ISO/IEC 2011 – All rights reserved Scope 1
---------------------- Page: 7 ----------------------
ISO/IEC TR 19075-1:2011(E)
(Blank page)

2 XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 8 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.1 Matching a specific character
2 XQuery regular expressions

XQuery regular expression syntax is specified in [XQuery F&O], section 7.6.1, “Regular expression syntax”.

This paper references the XQuery specification, with two small modifications (required since character strings

in an RDBMS are not necessarily normalized according to XML conventions). The following subsections

provide an overview of this syntax.

The XQuery regular expression syntax is itself a modification of another regular expression syntax found in

[XML Schema: Datatypes].

This section presents an overview of the capabilities of XQuery regular expression syntax. In the process, this

section will illustrate some of the SQL operators. The SQL operators themselves are presented in the next

section.

The following discussion does not cover every aspect of XQuery regular expressions; for this, [XQuery F&O]

is the reference (though hardly a tutorial; a variety of popular works contain detailed treatments of regular

expressions).
2.1 Matching a specific character

Perhaps the most elementary pattern matching requirement is the ability to match a single character or string.

For most characters, this is done by simply writing the character in the regular expression. For example, suppose

you want to know if a string S contains the letters “xyz”. This could be done with the following predicate:

S LIKE_REGEX 'xyz'

Note that the SQL LIKE predicate would require an exact match for “xyz”. However, the convention with

regular expressions is that S need only contain a substring that is “xyz”. For example, all of the following

values of S would yield True for the predicate above:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz

Note that in the last example, there are actually three occurrences of the regular expression “xyz” within the

tested value. The user may wish to know the number of occurrences of a match. This can be done with

OCCURRENCES_REGEX. For example:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3

The user might also wish to know the position of a specific match. This can be done using POSITION_REGEX.

For example, to learn the starting character position of the second occurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9

It is also possible to ask for the character position of the first character after the match. For example:

©ISO/IEC 2011 – All rights reserved XQuery regular expressions 3
---------------------- Page: 9 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.1 Matching a specific character
POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12

If AFTER is used and the last character of the subject string is consumed, then the result is the length of the

string plus 1 (one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
2.2 Metacharacters and escape sequences

As mentioned, most characters can be matched by simply writing the character in the regular expression.

However, certain characters are reserved as metacharacters. The complete list of metacharacters is:

. \ ? * + { } ( ) | [ ] ^ $

The use of each of these metacharacters will be explained later. If you want to match a metacharacter, then you

need to use an escape sequence, consisting of a backslash (“/”) followed by the metacharacter. For example,

to test whether a string contains a dollar sign, you could write
S LIKE_REGEX '\$'

In particular, the escape sequence representing a backslash is two consecutive backslashes. There are various

other defined escape sequences, matching either a single character, or any of a group of characters. The single

character escape sequences are:
\n newline (U+000A)
\r return (U+000D)
\t tab (U+0009)
\- minus sign ('-')

The so-called category escapes are exemplified by “\p{L}” or “\p{Lu}”. A category escape begins with

“\p{” followed by one uppercase letter, optionally a lowercase letter, and then the closing brace. In these

example, “\p{L}” matches any letter (as defined by Unicode) and “\p{Lu}” matches any uppercase letter.

Some interesting category escapes are listed below:
\p{L} Any letter.
\p{Lu} Any uppercase letter.
\p{Ll} Any lowercase letter.
\p{Nd} Any decimal digit.
\p{P} Any punctuation mark.
\p{Z} Any separator (space, line, paragraph, etc.).

The complete list of category escapes is found in [XML Schema: Datatypes], section F.1.1, “Character class

escapes”.

There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A comple-

mentary category escape matches any character that would not be matched by the corresponding category

4 XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 10 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.2 Metacharacters and escape sequences

escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the com-

plementary character escape is written with an uppercase “P”.

The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive range

of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also com-

plementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is not

an ASCII character.
Finally, there are the following multi-character escape sequences:

\s As defined by [XML Schema: Datatypes], this escape matches space (U+0020), tab (U+0009),

newline (U+000A), or return (U+000D). Since character strings in an RDBMS have not undergone

XML line termination normalization, we broaden it to include any character or two-character

sequence that is recognized by [Unicode18] as a line terminator. Subclause 2.5, “Line terminators”,

discusses this issue further.
\S Any single character not matched by \s.

\i Underscore (“_”), colon (“:”) or letter (this is a lot more than just the Latin letters; see [XML 1.0]

appendix B, rule [84]).
\I Any single character not matched by \i.

\c Any single character matched by NameChar, as defined in [XML 1.0] section 2.3, rule [4].

\C Any single character not matched by \c.
\d Any single digit
\D Any single character not matched by \d.

\w Any single Unicode character except those classified as “punctuation”, “separator”, or “other”.

\W The complement of \w.
2.3 Dot

Dot (period, “.”) is a metacharacter that is used to match any single character (the same behavior as “_” in

LIKE predicates), or any single character that is not a line terminator. The default is to match anything except

a line terminator. The alternative, called dot-all mode, is specified using a flag that contains a lowercase “s”.

For example
S LIKE_REGEX 'a.b'
matches the following:
'xa0by'
but not the following:
'xa
by'

because the character between the “a” and the “b” is a line terminator. However, using dot-all mode like this:

©ISO/IEC 2011 – All rights reserved XQuery regular expressions 5
---------------------- Page: 11 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.3 Dot
S LIKE_REGEX 'a.b' FLAG 's'
would match both examples.
2.4 Anchors

We have seen that regular expressions look for a match anywhere within a string, without needing to match

the entire string. But what if you want to require a match of the entire string? For this, you can use anchors.

The anchors are the metacharacters “^” for the start of a string (or line), and “$” for the end of a string (or line).

For example:
S LIKE_REGEX '^xyz$'
can only match a string that is precisely 'xyz'.

Anchors may be used separately to require a “begins with” or “end with” match. For example

S LIKE_REGEX '^xyz'
matches any string that begins with “xyz”, and
S LIKE_REGEX 'xyz$'
matches any string that ends with “xyz”.

Instead of matching the begin or end of the string, the anchors may be used to anchor a match to the begin or

end of a line, by performing the match in multi-line mode. Multi-line mode is specified using a flag containing

a lowercase “m”. For example:
S LIKE_REGEX '^xyz' FLAG 'm'

performs an anchored search in multi-line mode, matching any string containing a line that begins with “xyz”.

The example above would match the following string:
'line one
xyz
line three'
2.5 Line terminators

The metacharacters “.”, “^”, and “$” and the multi-character escape sequences “\s” and “\S” are defined in

terms of a “line terminator”. What counts as a line terminator? [XQuery F&O] only recognizes a line feed

(U+000A) as a line terminator. This definition works well for XQuery, because XML normalizes the line ter-

minators on various platforms to a line feed.

A closer look shows that XML has two definitions of line handling, in section 2.11, “End-of-line handling”,

of [XML 1.0] and [XML 1.1]. So which should we use for SQL?

A first stop in answering this is to look at [SQL/XML WD] Subclause 6.17, “”, which requires

XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211, “XML 1.1

6 XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 12 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.5 Line terminators

support”. So, we might specify that the character string is normalized according to either XML 1.0 or XML

1.1 as an implementation-defined choice, or perhaps via a conformance feature.

However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes its

input, which means that such two-character sequences are converted to a single character. This changes the

relative position of every subsequent character, which would cause unexpected results for POSITION_REGEX.

Our solution is to look to [Unicode18], a Unicode standard containing guidelines for regular expression proces-

sors. This provides a referenceable definition of line terminator that does not require normalizing the subject

character string.
2.6 Bracket expressions

So far, we have seen how to match a specific character, or any character from certain predefined sets of characters.

Using bracket expressions, you can specify your own group of characters. (XML Schema and XQuery call

these character class expressions, but the term bracket expression is in common use.)

A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket expressions

have a different list of special characters, namely
^ [ ] \

For clarity, we will call these special characters, in contrast to the metacharacters listed earlier.

2.6.1 Listing characters

If a bracket expression does not contain any of the special characters, then the bracket expression matches any

single character that is listed between the brackets. For example,
S LIKE_REGEX '[abc]'
matches any of the following:
'say'
'boy'
'lack'

All backslash escape sequences are available for use within a bracket expression. For example, to match either

a caret or a backslash, you can use
S LIKE_REGEX '[\^\\]'
To match all letters or digits, one might use
S LIKE_REGEX '[\p{L}\p{Nd}]'

where “\p{L}” is the escape matching any letter and “\p{Nd}” is the escape matching any digit.

©ISO/IEC 2011 – All rights reserved XQuery regular expressions 7
---------------------- Page: 13 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.6 Bracket expressions
2.6.2 Matching a range
A minus sign “-” is used to specify a character range. For example:
S LIKE_REGEX '[sa-my]'

matches the lowercase letters “s”, all the letters between “a” and “m” inclusive, and “y”. Ranges are defined

in terms of the UCS code point ordering. When there are multiple ranges, the bracket expression matches the

union of the ranges. For example:
S LIKE_REGEX '[a-me-z]'
matches all lowercase letters.

Using a special character in a range is sometimes permitted, but tricky. Rather than present the rules here, our

advice is to use a backslash escape if the start or end point of a range must be a special character.

2.6.3 Negation

A caret “^” is a special character when it is the first character of a bracket expression, where it indicates that

the set of characters is anything not listed by the following bracket expression. For example:

S LIKE_REGEX '[^aj-m]'
is True if S contains any character that is not “a”, “j”, “k”, “l”, or “m”.
2.6.4 Character class subtraction

A bracket expression may conclude with a minus sign “-” followed by a nested bracket expression. This is

called a character class subtraction, and indicates that any character matched by the nested bracket expression

is to be removed from the set of characters that might be a match. For example:
S LIKE_REGEX '[a-z-[m-p]]'

matches anything between “a” and “z”, except for the letters between “m” and “p”, inclusive. This example

is equivalent to:
S LIKE_REGEX '[a-lq-z]'

Seemingly you can nest character class subtractions indefinitely. This concludes the presentation of bracket

expressions.
2.7 Alternation

You can specify a choice of regular expressions using a vertical bar “|”. For example:

8 XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 14 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.7 Alternation
S LIKE_REGEX 'a|b'
is True if S contains either an “a” or a “b”.
Alternation has lower precedence than concatenation. Thus
S LIKE_REGEX 'ab|xyz'

is True if S contains either “ab” or “xyz”. To override this precedence, you can use parentheses, such as this

example:
S LIKE_REGEX 'a(b|xy)z'
The preceding example is True if S contains either “abz” or “axyz”.
2.8 Quantifiers

Quantifiers are metacharacters that specify a match for some number of repetitions of a regular expression.

There are two sets of quantifiers, the greedy and the reluctant. The greedy quantifiers are:

{n} Exactly n repetitions.
{n,} n or more repetitions.
{n,m} Between n and m repetitions, inclusive.
? 0 (zero) or 1 (one) repetition; equivalent to {0,1}.
* 0 (zero) or more repetitions; equivalent to {0,}.
+ 1 (one) or more repetitions; equivalent to {1,}.

The reluctant quantifiers are formed by suffixing a question mark to a greedy quantifier. Thus, “*?” is the

reluctant form of “*”, and “??” is the reluctant form of “?”. The greedy quantifiers try to match as much as

possible, whereas the reluctant quantifiers try to match as little as possible (while still allowing the overall

regular expression to match). There is no difference in behavior between the greedy and reluctant quantifiers

for LIKE_REGEX. We will look at this difference for the other operators shortly.
Examples:
S LIKE_REGEX 'a{3}'
is equivalent to
S LIKE_REGEX 'aaa'

and matches any string containing at least three consecutive instances of “a”. Note that if S contains more than

three consecutive instances of “a”, it still matches; to test whether S contains a substring of three consecutive

instances of “a” and no more is a lot harder, since you have to also require something other than an “a” at both

ends of the substring.
S LIKE_REGEX 'ab+c'
is equivalent to
©ISO/IEC 2011 – All rights reserved XQuery regular expressions 9
---------------------- Page: 15 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.8 Quantifiers
S LIKE_REGEX 'ab{1,}c'

and matches any string that contains a substring consisting of an “a”, one or more “b”s, and then a “c”.

2.9 Locating a match

LIKE_REGEX only cares whether a match exists; the other operators care about where a match is located in

the string. Consider the regular expression “a+” and the string “"a1aa2aaa3”. There are ten possible matches

for “a+”, indicated by the underlining on the following lines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- position 4, length 1
'a1aa2aaa3' -- position 6, length 1
'a1aa2aaa3' -- position 6, length 2
'a1aa2aaa3' -- position 6, length 3
'a1aa2aaa3' -- position 7, length 1
'a1aa2aaa3' -- position 8, length 2
'a1aa2aaa3' -- position 9, length 1

Notice that some of the matches are substrings of other matches. The rules of XQuery regular expressions are

designed to ignore certain matches, so that the recognized matches are mutually disjoint. Obviously there are

many ways to do this, so the rules provide priorities in determining the recognized matches. There are three

priorities:

1) The top priority is to find a match as early in the string as possible. This is commonly called the leftmost

rule.

2) The second priority is to find the first alternative of an alternation, if possible. We are unaware of a common

name for this rule.

3) The last priority is to find the longest possible match for greedy quantifiers, and the shortest match for

reluctant quantifiers. In the case of greedy quantifiers, this is commonly called the longest rule; we are

unaware of a common name for the rule regarding reluctant quantifiers.

[Historical note: POSIX only has a leftmost longest rule. There were no reluctant quantifiers, and the priority

for matching alternations was the longest match rather than the first alternative.]

These rules will be illustrated by examples:
Subject regular match(es) priority
string expression underlined
baaaaaa ba|a* baaaaaa leftmost (even though baaaaaa would be longer);
baaaaaa second match must start after the first match
ab a|ab ab first alternative (rather than matching ab)

10 XQuery Regular Expression Support in SQL/Foundation ©ISO/IEC 2011 – All rights reserved

---------------------- Page: 16 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.9 Locating a match
Subject regular match(es) priority
string expression underlined
abcabbabc ab* abcabbabc leftmost
abcabbabc longest (greedy quantifier consumes two "b"s)
abcabbabc longest
abcabbabc ab*? abcabbabc shortest (no need to match "b")
abcabbabc shortest
abcabbabc shortest
2.10 Capture and back-reference

A parenthesized sub-expression is a portion of a regular expression that is enclosed in parentheses. Parenthesized

sub-expressions are numbered in order of their left parenthesis. For example, in the regular expression

((a)|(b))
there are three sub-expressions:
1) ((a)|(b))
2) (a)
3) (b)

A sub-expression can be referenced later in a regular expression using a back-reference, taking the form of a

backslash followed by one or more digits. Thus the three sub-expressions in the example can be referenced as

“\1”, “\2”, and “\3”. For example, consider the regular expression:
\p{Z}(\p{L}*)\p{Z}*\1\p{Z}

The first and only parenthesized sub-expression (“\p{L}*”) matches any sequence of letters that is bounded

by some kind of space character (“\p{Z}”) before and after the sequence of letters. The back-reference (“\1”)

matches whatever sequence of letters was captured by the first sub-expression. This regular expression might

be used to search for occurrences of a repeated word (perhaps caused by a cut-and-paste error). Here is an

example of a subject string, with underlining to indicate the match for the entire regular expression:

Hello Dolly you're looking looking swell

When a back-reference references a parenthesized group with a quantifier, then the back-reference matches the

last iteration of the quantified sub-expression. For example, consider the regular expression:

'(ab*)*c*\1'
and the subject string:
'abbbabbabcabbbbb'
The matches to “(ab*)” are shown by underlining below:
'abbbabbabcabbbbb'
©ISO/IEC 2011 – All rights reserved XQuery regular expressions 11
---------------------- Page: 17 ----------------------
ISO/IEC TR 19075-1:2011(E)
2.10 Capture and back-reference
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'

These three iterations of “(ab*)” are matched by “(ab*)*” and then the “c” is matched. Next, we need to

match “\1”. The last match for the first parenthesized sub-expression is “ab”, so the overall match is indicate

by underlining below:
'abbbabbabcabbbbb'

In the event that a sub-expression is unmatched, a back-reference to it matches the zero-length string. For

example, consider the regular
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.