Information technology — Guidance for the use of database language SQL — Part 1: XQuery regular expressions

Technologies de l'information — Recommandations pour l'utilisation du langage de base de données SQL — Partie 1: Expressions régulières de XQuery en SQL

General Information

Status
Published
Publication Date
30-Aug-2021
Current Stage
6060 - International Standard published
Start Date
31-Aug-2021
Completion Date
31-Aug-2021
Ref Project

RELATIONS

Buy Standard

Standard
ISO/IEC 19075-1:2021 - Information technology -- Guidance for the use of database language SQL
English language
25 pages
sale 15% off
Preview
sale 15% off
Preview
Draft
ISO/IEC PRF 19075-1:Version 10-jul-2021 - Information technology -- Guidance for the use of database language SQL
English language
25 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (sample)

INTERNATIONAL ISO/IEC
STANDARD 19075-1
First edition
2021-08
Information technology — Guidance
for the use of database language
SQL —
Part 1:
XQuery regular expressions
Technologies de l'information — Recommandations pour l'utilisation
du langage de base de données SQL —
Partie 1: Expressions régulières de XQuery en SQL
Reference number
ISO/IEC 19075-1:2021(E)
ISO/IEC 2021
---------------------- Page: 1 ----------------------
ISO/IEC 19075-1:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting

on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address

below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2021 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/IEC 19075-1:2021(E)
Contents Page

Foreword......................................................................................v

Introduction..................................................................................vii

1 Scope....................................................................................1

2 Normativereferences....................................................................2

3 Termsanddefinitions....................................................................3

4 XQueryregularexpressions..............................................................4

4.1 ContextofXQueryregularexpressions......................................................4

4.2 IntroductiontoXQueryregularexpressions.................................................4

4.3 Matchingaspecificcharacter..............................................................4

4.4 Metacharactersandescapesequences......................................................5

4.5 Dot...................................................................................6

4.6 Anchors...............................................................................7

4.7 Lineterminators........................................................................7

4.8 Bracket expressions.....................................................................8

4.8.1 Introductiontobracketexpressions........................................................8

4.8.2 Listing characters.......................................................................8

4.8.3 Matchingarange........................................................................9

4.8.4 Negation...............................................................................9

4.8.5 Characterclasssubtraction...............................................................9

4.9 Alternation.............................................................................9

4.10 Quantifiers............................................................................10

4.11 Locatingamatch.......................................................................11

4.12 Captureandback-reference..............................................................12

4.13 Precedence...........................................................................13

4.14 Modes................................................................................13

5 Operatorsusingregularexpressions....................................................15

5.1 Introductiontooperatorsusingregularexpressions.........................................15

5.2 LIKE_REGEX..........................................................................15

5.3 OCCURRENCES_REGEX..................................................................16

5.4 POSITION_REGEX......................................................................17

5.5 SUBSTRING_REGEX....................................................................19

5.6 TRANSLATE_REGEX....................................................................20

Bibliography................................................................................23

Index........................................................................................24

© ISO/IEC 2021 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO/IEC 19075-1:2021(E)
Tables
Table Page

1 Match priorities...........................................................................11

iv © ISO/IEC 2021 – All rights reserved
---------------------- Page: 4 ----------------------
ISO/IEC 19075-1:2021(E)
Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical

Commission) form the specialized system for worldwide standardization. National bodies that are

members of ISO or IEC participate in the development of International Standards through technical

committees established by the respective organization to deal with particular fields of technical activity.

ISO and IEC technical committees collaborate in fields of mutual interest. Other international

organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the

work.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of document should be noted. This document was drafted in accordance with the editorial

rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details

of any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent

declarations received (see patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the World

Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see

www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.

This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,

Subcommittee SC 32, Data management and interchange.

This first edition of ISO/IEC 19075-1 cancels and replaces ISO/IEC TR 19075-1:2011.

This document is intended to be used in conjunction with the following editions of the parts of the

ISO/IEC 9075 series:
— ISO/IEC 9075-1, sixth edition or later;
— ISO/IEC 9075-2, sixth edition or later;
— ISO/IEC 9075-3, sixth edition or later;
— ISO/IEC 9075-4, seventh edition or later;
— ISO/IEC 9075-9, fifth edition or later;
— ISO/IEC 9075-10, fifth edition or later;
— ISO/IEC 9075-11, fifth edition or later;
— ISO/IEC 9075-13, fifth edition or later;
© ISO/IEC 2021 – All rights reserved
---------------------- Page: 5 ----------------------
ISO/IEC 19075-1:2021(E)
— ISO/IEC 9075-14, sixth edition or later;
— ISO/IEC 9075-15, second edition or later;
— ISO/IEC 9075-16, first edition or later.

A list of all parts in the ISO/IEC 19075 series can be found on the ISO and IEC websites.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-

committees.
© ISO 2021 – All rights reserved
---------------------- Page: 6 ----------------------
ISO/IEC 19075-1:2021(E)
Introduction
Theorganizationofthisdocumentisasfollows:
1) Clause1,“Scope”,specifiesthescopeofthisdocument.

2) Clause2,“Normativereferences”,identifiesadditionalstandardsthat,throughreferenceinthis

document,constituteprovisionsofthisdocument.

3) Clause3,“Termsanddefinitions”,definesthetermsanddefinitionsusedinthisdocument.

4) Clause4,“XQueryregularexpressions”,explainshowXQueryregularexpressionsareformed.

5) Clause5,“Operatorsusingregularexpressions”,explainshowtheSQLoperatorsuseregular

expressions.
© ISO/IEC 2021 – All rights reserved vii
---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO/IEC 19075-1:2021(E)
Informationtechnology—GuidancefortheuseofdatabaselanguageSQL—
Part1:
XQueryregularexpressions
1 Scope

ThisdocumentdescribestheregularexpressionsupportinSQL(ISO/IEC9075-2)adoptedfromthe

regularexpressionsyntaxofXQueryandXPathFunctionsandOperators3.1,whichisderivedfromPerl.

Thisdocumentdiscussesfiveoperatorsusingthisregularexpressionsyntax:
— LIKE_REGEXpredicate,todeterminetheexistenceofamatchtoaregularexpression.

— OCCURRENCES_REGEXnumericfunction,todeterminethenumberofmatchestoaregularexpression.

— POSITION_REGEXfunction,todeterminethepositionofamatch.
— SUBSTRING_REGEXfunction,toextractasubstringmatchingaregularexpression.
— TRANSLATE_REGEXfunction,toperformreplacementsusingaregularexpression.
© ISO/IEC 2021 – All rights reserved 1
---------------------- Page: 8 ----------------------
ISO/IEC 19075-1:2021(E)
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content consti-

tutes requirements of this document. For dated references, only the edition cited applies. For undated

references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC9075-1,Informationtechnology—Databaselanguages—SQL—Part1:Framework
(SQL/Framework)
ISO/IEC9075-2,Informationtechnology—Databaselanguages—SQL—Part2:Foundation
(SQL/Foundation)
ISO/IEC9075-14,Informationtechnology—Databaselanguages—SQL—Part14:XML-Related
Specifications(SQL/XML)

Bray,Timetal..ExtensibleMarkupLanguage(XML)Version1.0,W3CRecommendation[online].Fifth

Edition.Cambridge,Massachusetts,USA:W3C,26November2008.Availableat
http://www.w3.org/TR/xml

Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.1,W3CRecommendation[online].Second

Edition.Cambridge,Massachusetts,USA:W3C,16August2006.Availableat
http://www.w3.org/TR/xml11

Biron,PaulV.;Malhotra,Ashok.XMLSchemaPart2:Datatypes,W3CRecommendation[online].Second

Edition.Cambridge,Massachusetts,USA:W3C,28October2004.Availableat
http://www.w3.org/TR/xmlschema-2/

Malhotra,Ashoketal..XQueryandXPathFunctionsandOperators3,1,W3CRecommendation[online].

Cambridge,Massachusetts,USA:W3C,21March2017.Availableathttp://www.w3.org/TR/-
xpath-functions/

TheUnicodeConsortium.UnicodeRegularExpressions[online].21.MountainView,California,USA:

TheUnicodeConsortium,2020-06-17.Availableathttp://www.unicode.org/reports/tr18/-
tr18-21.html
2 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 9 ----------------------
ISO/IEC 19075-1:2021(E)
3 Termsanddefinitions
Forthepurposesofthisdocument,thetermsanddefinitionsgiveninISO/IEC9075-1apply.

ISOandIECmaintainterminologicaldatabasesforuseinstandardizationatthefollowingaddresses:

— IECElectropedia:availableathttp://www.electropedia.org/
— ISOOnlinebrowsingplatform:availableathttp://www.iso.org/obp
© ISO/IEC 2021 – All rights reserved 3
---------------------- Page: 10 ----------------------
ISO/IEC 19075-1:2021(E)
4 XQueryregularexpressions
4.1 ContextofXQueryregularexpressions

TherequirementsforthematerialdiscussedinthisdocumentshallbeasspecifiedinISO/IEC9075-2,

ISO/IEC9075-14,XML1.0,XML1.1,XMLSchemaPart2:Datatypes,XQueryandXPathFunctionsand
Operators3.1,andUnicodeTechnicalStandard#18.
4.2 IntroductiontoXQueryregularexpressions

ThisdocumentexplainsthemannerinwhichXQueryregularexpressionsareusedbydatabaselanguage

SQLinISO/IEC9075-2andinISO/IEC9075-14.BothISO/IEC9075-2andISO/IEC9075-14specify
requirementsforthematerialdiscussedinthisdocument.

XQueryregularexpressionsyntaxisspecifiedinXQueryandXPathFunctionsandOperators3.1,section

5.6.1,“Regularexpressionsyntax”.ThispaperreferencestheXQueryspecification,withtwosmallmodi-

fications(requiredsincecharacterstringsinanRDBMSarenotnecessarilynormalizedaccordingtoXML

conventions).Thefollowingsubsectionsprovideanoverviewofthissyntax.

TheXQueryregularexpressionsyntaxisitselfamodificationofanotherregularexpressionsyntaxfound

inXMLSchemaPart2:Datatypes.

ThissectionpresentsanoverviewofthecapabilitiesofXQueryregularexpressionsyntax.Intheprocess,

thissectionwillillustratesomeoftheSQLoperators.TheSQLoperatorsthemselvesarepresentedin

Clause5,“Operatorsusingregularexpressions”.

ThefollowingdiscussiondoesnotcovereveryaspectofXQueryregularexpressions;forthis,XQuery

andXPathFunctionsandOperators3.1isthereference(thoughhardlyatutorial;avarietyofpopular

workscontaindetailedtreatmentsofregularexpressions).
4.3 Matchingaspecificcharacter

Perhapsthemostelementarypatternmatchingrequirementistheabilitytomatchasinglecharacteror

string.Formostcharacters,thisisdonebysimplywritingthecharacterintheregularexpression.For

example,supposeanapplicationneedstoknowifastringScontainstheletters“xyz”.Thiscouldbedone

withthefollowingpredicate:
S LIKE_REGEX 'xyz'

NotethattheSQLLIKEpredicatewouldrequireanexactmatchfor“xyz”.However,theconventionwith

regularexpressionsisthatSneedonlycontainasubstringthatis“xyz”.Forexample,allofthefollowing

valuesofSwouldyieldTruefortheimmediatelyprecedingpredicate:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz
4 © ISO/IEC 2021– All rights reserved
---------------------- Page: 11 ----------------------
ISO/IEC 19075-1:2021(E)
4.3 Matching a specific character

Notethatinthelastexample,thereareactuallythreeoccurrencesoftheregularexpression“xyz”within

thetestedvalue.Theusermaywishtoknowthenumberofoccurrencesofamatch.Thiscanbedone
withOCCURRENCES_REGEX.Forexample:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3

Theusermightalsowishtoknowthepositionofaspecificmatch.ThiscanbedoneusingPOSITION_REGEX.

Forexample,tolearnthestartingcharacterpositionofthesecondoccurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9

Itisalsopossibletoaskforthecharacterpositionofthefirstcharacterafterthematch.Forexample:

POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12

IfAFTERisusedandthelastcharacterofthesubjectstringisconsumed,thentheresultisthelengthof

thestringplus1(one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
4.4 Metacharactersandescapesequences

Asmentioned,mostcharacterscanbematchedbysimplywritingthecharacterintheregularexpression.

However,certaincharactersarereservedasmetacharacters.Thecompletelistofmetacharactersis:

. \ ? * + { } ( ) | [ ] ^ $

TheuseofeachofthesemetacharactersisexplainedinsubsequentSubclauses.Tomatchametacharacter,

thenanescapesequence,consistingofabackslash(“\”)followedbythemetacharacterisused.For

example,thisexpressiontestswhetherastringcontainsadollarsign:
S LIKE_REGEX '\$'

Inparticular,theescapesequencerepresentingabackslashistwoconsecutivebackslashes.Thereare

variousotherdefinedescapesequences,matchingeitherasinglecharacter,oranyofagroupofcharacters.

Thesinglecharacterescapesequencesare:
\n newline(U+000A)
\r return(U+000D)
\t tab(U+0009)
\- minussign('-')

Theso-calledcategoryescapesareexemplifiedby“\p{L}”or“\p{Lu}”.Acategoryescapebeginswith

“\p{”followedbyoneuppercaseletter,optionallyalowercaseletter,andthentheclosingbrace.Inthese

example,“\p{L}”matchesanyletter(asdefinedbyUnicode)and“\p{Lu}”matchesanyuppercaseletter.

Someinterestingcategoryescapesare:
\p{L} Anyletter.
\p{Lu} Anyuppercaseletter.
\p{Ll} Anylowercaseletter.
\p{Nd} Anydecimaldigit.
\p{P} Anypunctuationmark.
© ISO/IEC 2021 – All rights reserved 5
---------------------- Page: 12 ----------------------
ISO/IEC 19075-1:2021(E)
4.4 Metacharacters and escape sequences
\p{Z} Any separator (space, line, paragraph, etc.).

The complete list of category escapes is found in XML Schema Part 2: Datatypes, section F.1.1, “Character

class escapes”.

There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A comple-

mentary category escape matches any character that would not be matched by the corresponding category

escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the

complementary character escape is written with an uppercase “P”.

The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive

range of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also

complementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is

not an ASCII character.
Finally, there are the following multi-character escape sequences:
\s AsdefinedbyXMLSchemaPart2:Datatypes,thisescapematchesspace(U+0020),tab
(U+0009),newline(U+000A),orreturn(U+000D).SincecharacterstringsinanRDBMShave
notundergoneXMLlineterminationnormalization,theescapeisbroadenedtoincludeany
characterortwo-charactersequencethatisrecognizedbyUnicodeTechnicalStandard#18
asalineterminator.Subclause4.7,“Lineterminators”,discussesthisissuefurther.
\S Anysinglecharacternotmatchedby\s.

\i Underscore(“_”),colon(“:”)orletter(thisisalotmorethanjusttheLatinletters;seeXML

1.0appendixB,rule[84]).
\I Anysinglecharacternotmatchedby\i.
\c AnysinglecharactermatchedbyNameChar,asdefinedinXML1.0section2.3,rule[4a].
\C Anysinglecharacternotmatchedby\c.
\d Anysingledigit
\D Anysinglecharacternotmatchedby\d.

\w AnysingleUnicodecharacterexceptthoseclassifiedas“punctuation”,“separator”,or“other”.

\W Thecomplementof\w.
4.5 Dot

Dot(period,“.”)isametacharacterthatisusedtomatchanysinglecharacter(thesamebehavioras“_”

inLIKEpredicates),oranysinglecharacterthatisnotalineterminator.Thedefaultistomatchanything

exceptalineterminator.Thealternative,calleddot-allmode,isspecifiedusingaflagthatcontainsa

lowercase“s”.
Forexample
S LIKE_REGEX 'a.b'
matchesthefollowing:
'xa0by'
butnotthefollowing:
6 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 13 ----------------------
ISO/IEC 19075-1:2021(E)
4.5 Dot
'xa
by'

becausethecharacterbetweenthe“a”andthe“b”isalineterminator.However,usingdot-allmodelike

this:
S LIKE_REGEX 'a.b' FLAG 's'
wouldmatchbothexamples.
4.6 Anchors

AsdiscussedinthisClause,regularexpressionslookforamatchanywherewithinastring,without

needingtomatchtheentirestring.Buthowcananapplicationrequireamatchoftheentirestring?For

this,theapplicationusesanchors.Theanchorsarethemetacharacters“^”forthestartofastring(or

line),and“$”fortheendofastring(orline).Forexample:
S LIKE_REGEX '^xyz$'
canonlymatchastringthatisprecisely'xyz'.
Anchorsmaybeusedseparatelytorequirea“beginswith”or“endwith”match.Forexample
S LIKE_REGEX '^xyz'
matchesanystringthatbeginswith“xyz”,and
S LIKE_REGEX 'xyz$'
matchesanystringthatendswith“xyz”.
Insteadofmatchingthebeginorendofthestring,theanchorsmaybeusedtoanchoramatchtothe

beginorendofaline,byperformingthematchinmulti-linemode.Multi-linemodeisspecifiedusinga

flagcontainingalowercase“m”.Forexample:
S LIKE_REGEX '^xyz' FLAG 'm'

performsananchoredsearchinmulti-linemode,matchinganystringcontainingalinethatbeginswith

“xyz”.Theimmediatelyprecedingexamplewouldmatchthefollowingstring:
'line one
xyz
line three'
4.7 Lineterminators

Themetacharacters“.”,“^”,and“$”andthemulti-characterescapesequences“\s”and“\S”aredefined

intermsofa“lineterminator”.Whatcountsasalineterminator?XQueryandXPathFunctionsand

Operators3.1onlyrecognizesalinefeed(U+000A)asalineterminator.Thisdefinitionworkswellfor

XQuery,becauseXMLnormalizesthelineterminatorsonvariousplatformstoalinefeed.

AcloserlookshowsthatXMLhastwodefinitionsoflinehandling,insection2.11,“End-of-linehandling”,

ofXML1.0andXML1.1.SowhichshouldbeusedforSQL?
© ISO/IEC 2021 – All rights reserved 7
---------------------- Page: 14 ----------------------
ISO/IEC 19075-1:2021(E)
4.7 Line terminators

A first stop in answering this is to look at the definition of in ISO/IEC 9075-14, which

requires XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211,

“XML 1.1 support”. So, character string can be normalized according to either XML 1.0 or XML 1.1 as an

implementation-defined choice, or perhaps via a conformance feature.

However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes

its input, which means that such two-character sequences are converted to a single character. This changes

the relative position of every subsequent character, which would cause unexpected results for POSI-

TION_REGEX.

Our solution is to look to Unicode Technical Standard #18, a Unicode standard containing guidelines for

regular expression processors. This provides a referenceable definition of line terminator that does not

require normalizing the subject character string.
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions

So far, this Clause has shown how to match a specific character or any character from certain predefined

sets of characters. Using bracket expressions, applications can specify user-defined group of characters.

(XML Schema and XQuery call these character class expressions, but the term bracket expression is in

common use.)

A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket

expressions have a different list of special characters, namely
^ [ ] \

Forclarity,thesearecalledspecialcharacters,incontrasttothemetacharacterslistedearlier.

4.8.2 Listingcharacters

Ifabracketexpressiondoesnotcontainanyofthespecialcharacters,thenthebracketexpressionmatches

anysinglecharacterthatislistedbetweenthebrackets.Forexample,
S LIKE_REGEX '[abc]'
matchesanyofthefollowing:
'say'
'boy'
'lack'

Allbackslashescapesequencesareavailableforusewithinabracketexpression.Forexample,thefollowing

expressionmatcheseitheracaretorabackslash:
S LIKE_REGEX '[\^\\]'
Thefollowingexpressionmatchesalllettersordigits:
S LIKE_REGEX '[\p{L}\p{Nd}]'
where“\p{L}”istheescapematchinganyletterand“\p{Nd}”istheescapematchinganydigit.
8 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 15 ----------------------
ISO/IEC 19075-1:2021(E)
4.8 Bracket expressions
4.8.3 Matchingarange
Aminussign“-”isusedtospecifyacharacterrange.Forexample:
S LIKE_REGEX '[sa-my]'

matchesthelowercaseletters“s”,allthelettersbetween“a”and“m”inclusive,and“y”.Rangesaredefined

intermsoftheUCScodepointordering.Whentherearemultipleranges,thebracketexpressionmatches

theunionoftheranges.Forexample:
S LIKE_REGEX '[a-me-z]'
matchesalllowercaseletters.

Usingaspecialcharacterinarangeissometimespermitted,buttricky.Ratherthanpresenttherules

here,oneshoulduseabackslashescapeifthestartorendpointofarangeisbeaspecialcharacter.

4.8.4 Negation

Acaret“^”isaspecialcharacterwhenitisthefirstcharacterofabracketexpression,whereitindicates

thatthesetofcharactersisanythingnotlistedbythefollowingbracketexpression.Forexample:

S LIKE_REGEX '[^aj-m]'
isTrueifScontainsanycharacterthatisnot“a”,“j”,“k”,“l”,or“m”.
4.8.5 Characterclasssubtraction

Abracketexpressionmayconcludewithaminussign“-”followedbyanestedbracketexpression.This

iscalledacharacterclasssubtraction,andindicatesthatanycharactermatchedbythenestedbracket

expressionistoberemovedfromthesetofcharactersthatmightbeamatch.Forexample:
S LIKE_REGEX '[a-z-[m-p]]'

matchesanythingbetween“a”and“z”,exceptforthelettersbetween“m”and“p”,inclusive.Thisexample

isequivalentto:
S LIKE_REGEX '[a-lq-z]'

Characterclasssubtractionscanbenestedindefinitely(althoughimplementation-definedlimitations

mayapply).
4.9 Alternation

Achoiceofregularexpressionsisspecifiedthroughtheuseofaverticalbar:“|”.Forexample:

S LIKE_REGEX 'a|b'
isTrueifScontainseitheran“a”ora“b”.
Alternationhaslowerprecedencethanconcatenation.Thus
© ISO/IEC 2021 – All rights reserved 9
---------------------- Page: 16 ----------------------
ISO/IEC 19075-1:2021(E)
4.9 Alternation
S LIKE_REGEX 'ab|xyz'

isTrueifScontainseither“ab”or“xyz”.Thisprecedencecanbeoverriddenthroughtheuseofparentheses,

suchasthisexample:
S LIKE_REGEX 'a(b|xy)z'
TheprecedingexampleisTrueifScontainseither“abz”or“axyz”.
4.10 Quantifiers

Quantifiersaremetacharactersthatspecifyamatchforsomenumberofrepetitionsofaregularexpression.

Therearetwosetsofquantifiers,thegreedyandthereluctant.Thegreedyquantifiersare:
{n} Exactlynrepetitions.
{n,} normorerepetitions.
{n,m} Betweennandmrepetitions,inclusive.
? 0(zero)or1(one)repetition;equivalentto{0,1}.
* 0(zero)ormorerepetitions;equivalentto{0,}.
+ 1(one)ormorerepetitions;equivalentto{1,}.

Thereluctantquantifiersareformedbysuffixingaquestionmarktoagreedyquantifier.Thus,“*?”isthe

reluctantformof“*”,and“??”isthereluctantformof“?”.Thegreedyquantifierstrytomatchasmuch

aspossible,whereasthereluctantquantifierstrytomatchaslittleaspossible(whilestillallowingthe

overallregularexpressiontomatch).Thereisnodifferenceinbehaviorbetweenthegreedyandreluctant

quantifiersforLIKE_REGEX.ThisdifferencefortheotheroperatorsisdiscussedinsubsequentSubclauses.

Example:
S LIKE_REGEX 'a{3}'
isequivalentto
S LIKE_REGEX 'aaa'

andmatchesanystringcontainingatleastthreeconsecutiveinstancesof“a”.NotethatifScontainsmore

thanthreeconsecutiveinstancesof“a”,itstillmatches;totestwhetherScontainsasubstringofthree

consecutiveinstancesof“a”andnomoreisalotharder,sincetheexpressionalsohastorequiresomething

otherthanan“a”atbothendsofthesubstring.
Anotherexample:
S LIKE_REGEX 'ab+c'
isequivalentto
S LIKE_REGEX 'ab{1,}c'

andmatchesanystringthatcontainsasubstringconsistingofan“a”,oneormore“b”s,andthena“c”.

10 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 17 ----------------------
ISO/IEC 19075-1:2021(E)
4.11 Locating a match
4.11 Locatingamatch

LIKE_REGEXonlycareswhetheramatchexists;theotheroperatorscareaboutwhereamatchislocated

inthestring.Considertheregularexpression“a+”andthestring“a1aa2aaa3”.Therearetenpossible

matchesfor“a+”,indicatedbytheunderliningonthefollowinglines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- position 4, length 1
'a1aa2aaa3' -- position 6, length 1
'a1aa2aaa3' -- position 6, length 2
'a1aa2aaa3' -- position 6, length 3
'a1aa2aaa3' -- position 7, length 1
'a1aa2aaa3' -- position 7, length 2
'a1aa2aaa3' -- position 8, length 1

Noticethatsomeofthematchesaresubstringsofothermatches.TherulesofXQueryregularexpressions

aredesignedtoignorecertainmatches,sothattherecognizedmatchesaremutuallydisjoint.Obviously

therearemanywaystodothis,sotherulesprovideprioritiesindeterminingtherecognizedmatches.

Therearethreepriorities:

1) Thetoppriorityistofindamatchasearlyinthestringaspossible.Thisiscommonlycalledtheleftmost

rule.

2) Thesecondpriorityistofindthefirstalternativeofanalternation,ifpossible.Theredoesnotappear

tobeacommonnameforthisrule.

3) Thelastpriorityistofindthelongestpossiblematchforgreedyquantifiers,andtheshortestmatch

forreluctantquantifiers.Inthecaseofgreedyquantifiers,thisiscommonlycalledthelongestrule;

theredoesnotappeartobeacommonnamefortheruleregardingreluctantquantifiers.

[Historicalnote:POSIXonlyhasaleftmostlongestrule.Therewerenoreluctantquantifiers,andthe

priorityformatchingalternationswasthelongestmatchratherthanthefirstalternative.]
TheserulesareillustratedbyexamplesinTable1,“Matchpriorities”.
Table1—Matchpriorities
Subject regular match(es) priority
string expression underlined
baaaaaa ba|a* baaaaaa firstalternative;
baaaaaa secondmatchalwaysstartsafterthefirst
match(eventhoughaaaaaawouldbelonger)
ab a|ab ab firstalternative(ratherthanmatchingab)
abcabbabc ab* abcabbabc leftmost
abcabbabc longest(greedyquantifierconsumestwo'b's)
abcabbabc longest
abcabbabc ab*? abcabbabc shortest(noneedtomatch'b')
abcabbabc shortest
abcabbabc shortest
© ISO/IEC 2021 – All rights reserved 11
---------------------- Page: 18 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture and back-reference
4.12 Capture and back-reference

A parenthesized sub-expression is a portion of a regular expression that is enclosed in parentheses.

Parenthesized sub-expressions are numbered in order of their left parenthesis. For example, in the reg-

ular expression
((a)|(b))
therearethreesub-expressions:
1) ((a)|(b))
2) (a)
3) (b)

Asub-expressioncanbereferencedlaterinaregularexpressionusingaback-reference,takingtheform

ofabackslashfollowedbyoneormoredigits.Thusthethreesub-expressionsintheexamplecanbe

referencedas“\1”,“\2”,and“\3”.Forexample,considertheregularexpression:
\p{Z}(\p{L}*)\p{Z}*\1\p{Z}

Thefirstandonlyparenthesizedsub-expression(“\p{L}*”)matchesanysequenceoflettersthatis

boundedbysomekindofspacecharacter(“\p{Z}”)beforeandafterthesequenceofletters.Theback-

reference(“\1”)matcheswhateversequenceofletterswascapturedbythefirstsub-expression.This

regularexpressionmightbeusedtosearchforoccurrencesofarepeatedword(perhapscausedbyacut-

and-pasteerror).Hereisanexampleofasubjectstring,withunderliningtoindicatethematchforthe

entireregularexpression:
Hello Dolly you’re looking looking swell

Whenaback-referencereferencesaparenthesizedgroupwithaquantifier,thentheback-reference

matchesthelastiterationofthequantifiedsub-expression.Forexample,considertheregularexpression:

'(ab*)*c*\1'
andthesubjectstring:
'abbbabbabcabbbbb'
Thematchesto“(ab*)”areshownbyunderlining:
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'

Thesethreeiterationsof“(ab*)”arematchedby“(ab*)*”andthenthe“c”ismatched.Next,amatchfor

“\1”isneeded.Thelastmatchforthefirstparenthesizedsub-expressionis“ab”,sotheoverallmatchis

indicatebyunderlining:
'abbbabbabcabbbbb'

Intheeventthatasub-expressionisunmatched,aback-referencetoitmatchesthezero-lengthstring.

Forexample,considertheregularexpression:
'((a*)|(b*))c??\3'
andthesubjectstring
...

INTERNATIONAL ISO/IEC
STANDARD 19075-1
First edition
Information technology — Guidance
for the use of database language
SQL —
Part 1:
XQuery regular expressions
PROOF/ÉPREUVE
Reference number
ISO/IEC 19075-1:2021(E)
ISO/IEC 2021
---------------------- Page: 1 ----------------------
ISO/IEC 19075-1:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting

on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address

below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2021 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/IEC 19075-1:2021(E)
Contents Page

Foreword......................................................................................v

Introduction..................................................................................vii

1 Scope....................................................................................1

2 Normativereferences....................................................................2

3 Termsanddefinitions....................................................................3

4 XQueryregularexpressions..............................................................4

4.1 ContextofXQueryregularexpressions......................................................4

4.2 IntroductiontoXQueryregularexpressions.................................................4

4.3 Matchingaspecificcharacter..............................................................4

4.4 Metacharactersandescapesequences......................................................5

4.5 Dot...................................................................................6

4.6 Anchors...............................................................................7

4.7 Lineterminators........................................................................7

4.8 Bracket expressions.....................................................................8

4.8.1 Introductiontobracketexpressions........................................................8

4.8.2 Listing characters.......................................................................8

4.8.3 Matchingarange........................................................................9

4.8.4 Negation...............................................................................9

4.8.5 Characterclasssubtraction...............................................................9

4.9 Alternation.............................................................................9

4.10 Quantifiers............................................................................10

4.11 Locatingamatch.......................................................................11

4.12 Captureandback-reference..............................................................12

4.13 Precedence...........................................................................13

4.14 Modes................................................................................13

5 Operatorsusingregularexpressions....................................................15

5.1 Introductiontooperatorsusingregularexpressions.........................................15

5.2 LIKE_REGEX..........................................................................15

5.3 OCCURRENCES_REGEX..................................................................16

5.4 POSITION_REGEX......................................................................17

5.5 SUBSTRING_REGEX....................................................................19

5.6 TRANSLATE_REGEX....................................................................20

Bibliography................................................................................23

Index........................................................................................24

© ISO/IEC 2021 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO/IEC 19075-1:2021(E)
Tables
Table Page

1 Match priorities...........................................................................11

iv © ISO/IEC 2021 – All rights reserved
---------------------- Page: 4 ----------------------
ISO/IEC 19075-1:2021(E)
Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical

Commission) form the specialized system for worldwide standardization. National bodies that are

members of ISO or IEC participate in the development of International Standards through technical

committees established by the respective organization to deal with particular fields of technical activity.

ISO and IEC technical committees collaborate in fields of mutual interest. Other international

organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the

work.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of document should be noted. This document was drafted in accordance with the editorial

rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details

of any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent

declarations received (see patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the World

Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see

www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.

This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,

Subcommittee SC 32, Data management and interchange.

This first edition of ISO/IEC 19075-1 cancels and replaces ISO/IEC TR 19075-1:2011.

This document is intended to be used in conjunction with the following editions of the parts of the

ISO/IEC 9075 series:
— ISO/IEC 9075-1, sixth edition or later;
— ISO/IEC 9075-2, sixth edition or later;
— ISO/IEC 9075-3, sixth edition or later;
— ISO/IEC 9075-4, seventh edition or later;
— ISO/IEC 9075-9, fifth edition or later;
— ISO/IEC 9075-10, fifth edition or later;
— ISO/IEC 9075-11, fifth edition or later;
— ISO/IEC 9075-13, fifth edition or later;
© ISO/IEC 2021 – All rights reserved
---------------------- Page: 5 ----------------------
ISO/IEC 19075-1:2021(E)
— ISO/IEC 9075-14, sixth edition or later;
— ISO/IEC 9075-15, second edition or later;
— ISO/IEC 9075-16, first edition or later.

A list of all parts in the ISO/IEC 19075 series can be found on the ISO and IEC websites.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-

committees.
© ISO 2021 – All rights reserved
---------------------- Page: 6 ----------------------
ISO/IEC 19075-1:2021(E)
Introduction
Theorganizationofthisdocumentisasfollows:
1) Clause1,“Scope”,specifiesthescopeofthisdocument.

2) Clause2,“Normativereferences”,identifiesadditionalstandardsthat,throughreferenceinthis

document,constituteprovisionsofthisdocument.

3) Clause3,“Termsanddefinitions”,definesthetermsanddefinitionsusedinthisdocument.

4) Clause4,“XQueryregularexpressions”,explainshowXQueryregularexpressionsareformed.

5) Clause5,“Operatorsusingregularexpressions”,explainshowtheSQLoperatorsuseregular

expressions.
© ISO/IEC 2021 – All rights reserved vii
---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO/IEC 19075-1:2021(E)
Informationtechnology—GuidancefortheuseofdatabaselanguageSQL—
Part1:
XQueryregularexpressions
1 Scope

ThisdocumentdescribestheregularexpressionsupportinSQL(ISO/IEC9075-2)adoptedfromthe

regularexpressionsyntaxofXQueryandXPathFunctionsandOperators3.1,whichisderivedfromPerl.

Thisdocumentdiscussesfiveoperatorsusingthisregularexpressionsyntax:
— LIKE_REGEXpredicate,todeterminetheexistenceofamatchtoaregularexpression.

— OCCURRENCES_REGEXnumericfunction,todeterminethenumberofmatchestoaregularexpression.

— POSITION_REGEXfunction,todeterminethepositionofamatch.
— SUBSTRING_REGEXfunction,toextractasubstringmatchingaregularexpression.
— TRANSLATE_REGEXfunction,toperformreplacementsusingaregularexpression.
© ISO/IEC 2021 – All rights reserved 1
---------------------- Page: 8 ----------------------
ISO/IEC 19075-1:2021(E)
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content consti-

tutes requirements of this document. For dated references, only the edition cited applies. For undated

references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC9075-1,Informationtechnology—Databaselanguages—SQL—Part1:Framework
(SQL/Framework)
ISO/IEC9075-2,Informationtechnology—Databaselanguages—SQL—Part2:Foundation
(SQL/Foundation)
ISO/IEC9075-14,Informationtechnology—Databaselanguages—SQL—Part14:XML-Related
Specifications(SQL/XML)

Bray,Timetal..ExtensibleMarkupLanguage(XML)Version1.0,W3CRecommendation[online].Fifth

Edition.Cambridge,Massachusetts,USA:W3C,26November2008.Availableat
http://www.w3.org/TR/xml

Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.1,W3CRecommendation[online].Second

Edition.Cambridge,Massachusetts,USA:W3C,16August2006.Availableat
http://www.w3.org/TR/xml11

Biron,PaulV.;Malhotra,Ashok.XMLSchemaPart2:Datatypes,W3CRecommendation[online].Second

Edition.Cambridge,Massachusetts,USA:W3C,28October2004.Availableat
http://www.w3.org/TR/xmlschema-2/

Malhotra,Ashoketal..XQueryandXPathFunctionsandOperators3,1,W3CRecommendation[online].

Cambridge,Massachusetts,USA:W3C,21March2017.Availableathttp://www.w3.org/TR/-
xpath-functions/

TheUnicodeConsortium.UnicodeRegularExpressions[online].21.MountainView,California,USA:

TheUnicodeConsortium,2020-06-17.Availableathttp://www.unicode.org/reports/tr18/-
tr18-21.html
2 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 9 ----------------------
ISO/IEC 19075-1:2021(E)
3 Termsanddefinitions
Forthepurposesofthisdocument,thetermsanddefinitionsgiveninISO/IEC9075-1apply.

ISOandIECmaintainterminologicaldatabasesforuseinstandardizationatthefollowingaddresses:

— IECElectropedia:availableathttp://www.electropedia.org/
— ISOOnlinebrowsingplatform:availableathttp://www.iso.org/obp
© ISO/IEC 2021 – All rights reserved 3
---------------------- Page: 10 ----------------------
ISO/IEC 19075-1:2021(E)
4 XQueryregularexpressions
4.1 ContextofXQueryregularexpressions

TherequirementsforthematerialdiscussedinthisdocumentshallbeasspecifiedinISO/IEC9075-2,

ISO/IEC9075-14,XML1.0,XML1.1,XMLSchemaPart2:Datatypes,XQueryandXPathFunctionsand
Operators3.1,andUnicodeTechnicalStandard#18.
4.2 IntroductiontoXQueryregularexpressions

ThisdocumentexplainsthemannerinwhichXQueryregularexpressionsareusedbydatabaselanguage

SQLinISO/IEC9075-2andinISO/IEC9075-14.BothISO/IEC9075-2andISO/IEC9075-14specify
requirementsforthematerialdiscussedinthisdocument.

XQueryregularexpressionsyntaxisspecifiedinXQueryandXPathFunctionsandOperators3.1,section

5.6.1,“Regularexpressionsyntax”.ThispaperreferencestheXQueryspecification,withtwosmallmodi-

fications(requiredsincecharacterstringsinanRDBMSarenotnecessarilynormalizedaccordingtoXML

conventions).Thefollowingsubsectionsprovideanoverviewofthissyntax.

TheXQueryregularexpressionsyntaxisitselfamodificationofanotherregularexpressionsyntaxfound

inXMLSchemaPart2:Datatypes.

ThissectionpresentsanoverviewofthecapabilitiesofXQueryregularexpressionsyntax.Intheprocess,

thissectionwillillustratesomeoftheSQLoperators.TheSQLoperatorsthemselvesarepresentedin

Clause5,“Operatorsusingregularexpressions”.

ThefollowingdiscussiondoesnotcovereveryaspectofXQueryregularexpressions;forthis,XQuery

andXPathFunctionsandOperators3.1isthereference(thoughhardlyatutorial;avarietyofpopular

workscontaindetailedtreatmentsofregularexpressions).
4.3 Matchingaspecificcharacter

Perhapsthemostelementarypatternmatchingrequirementistheabilitytomatchasinglecharacteror

string.Formostcharacters,thisisdonebysimplywritingthecharacterintheregularexpression.For

example,supposeanapplicationneedstoknowifastringScontainstheletters“xyz”.Thiscouldbedone

withthefollowingpredicate:
S LIKE_REGEX 'xyz'

NotethattheSQLLIKEpredicatewouldrequireanexactmatchfor“xyz”.However,theconventionwith

regularexpressionsisthatSneedonlycontainasubstringthatis“xyz”.Forexample,allofthefollowing

valuesofSwouldyieldTruefortheimmediatelyprecedingpredicate:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz
4 © ISO/IEC 2021– All rights reserved
---------------------- Page: 11 ----------------------
ISO/IEC 19075-1:2021(E)
4.3 Matching a specific character

Notethatinthelastexample,thereareactuallythreeoccurrencesoftheregularexpression“xyz”within

thetestedvalue.Theusermaywishtoknowthenumberofoccurrencesofamatch.Thiscanbedone
withOCCURRENCES_REGEX.Forexample:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3

Theusermightalsowishtoknowthepositionofaspecificmatch.ThiscanbedoneusingPOSITION_REGEX.

Forexample,tolearnthestartingcharacterpositionofthesecondoccurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9

Itisalsopossibletoaskforthecharacterpositionofthefirstcharacterafterthematch.Forexample:

POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12

IfAFTERisusedandthelastcharacterofthesubjectstringisconsumed,thentheresultisthelengthof

thestringplus1(one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
4.4 Metacharactersandescapesequences

Asmentioned,mostcharacterscanbematchedbysimplywritingthecharacterintheregularexpression.

However,certaincharactersarereservedasmetacharacters.Thecompletelistofmetacharactersis:

. \ ? * + { } ( ) | [ ] ^ $

TheuseofeachofthesemetacharactersisexplainedinsubsequentSubclauses.Tomatchametacharacter,

thenanescapesequence,consistingofabackslash(“\”)followedbythemetacharacterisused.For

example,thisexpressiontestswhetherastringcontainsadollarsign:
S LIKE_REGEX '\$'

Inparticular,theescapesequencerepresentingabackslashistwoconsecutivebackslashes.Thereare

variousotherdefinedescapesequences,matchingeitherasinglecharacter,oranyofagroupofcharacters.

Thesinglecharacterescapesequencesare:
\n newline(U+000A)
\r return(U+000D)
\t tab(U+0009)
\- minussign('-')

Theso-calledcategoryescapesareexemplifiedby“\p{L}”or“\p{Lu}”.Acategoryescapebeginswith

“\p{”followedbyoneuppercaseletter,optionallyalowercaseletter,andthentheclosingbrace.Inthese

example,“\p{L}”matchesanyletter(asdefinedbyUnicode)and“\p{Lu}”matchesanyuppercaseletter.

Someinterestingcategoryescapesare:
\p{L} Anyletter.
\p{Lu} Anyuppercaseletter.
\p{Ll} Anylowercaseletter.
\p{Nd} Anydecimaldigit.
\p{P} Anypunctuationmark.
© ISO/IEC 2021 – All rights reserved 5
---------------------- Page: 12 ----------------------
ISO/IEC 19075-1:2021(E)
4.4 Metacharacters and escape sequences
\p{Z} Any separator (space, line, paragraph, etc.).

The complete list of category escapes is found in XML Schema Part 2: Datatypes, section F.1.1, “Character

class escapes”.

There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A comple-

mentary category escape matches any character that would not be matched by the corresponding category

escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the

complementary character escape is written with an uppercase “P”.

The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive

range of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also

complementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is

not an ASCII character.
Finally, there are the following multi-character escape sequences:
\s AsdefinedbyXMLSchemaPart2:Datatypes,thisescapematchesspace(U+0020),tab
(U+0009),newline(U+000A),orreturn(U+000D).SincecharacterstringsinanRDBMShave
notundergoneXMLlineterminationnormalization,theescapeisbroadenedtoincludeany
characterortwo-charactersequencethatisrecognizedbyUnicodeTechnicalStandard#18
asalineterminator.Subclause4.7,“Lineterminators”,discussesthisissuefurther.
\S Anysinglecharacternotmatchedby\s.

\i Underscore(“_”),colon(“:”)orletter(thisisalotmorethanjusttheLatinletters;seeXML

1.0appendixB,rule[84]).
\I Anysinglecharacternotmatchedby\i.
\c AnysinglecharactermatchedbyNameChar,asdefinedinXML1.0section2.3,rule[4a].
\C Anysinglecharacternotmatchedby\c.
\d Anysingledigit
\D Anysinglecharacternotmatchedby\d.

\w AnysingleUnicodecharacterexceptthoseclassifiedas“punctuation”,“separator”,or“other”.

\W Thecomplementof\w.
4.5 Dot

Dot(period,“.”)isametacharacterthatisusedtomatchanysinglecharacter(thesamebehavioras“_”

inLIKEpredicates),oranysinglecharacterthatisnotalineterminator.Thedefaultistomatchanything

exceptalineterminator.Thealternative,calleddot-allmode,isspecifiedusingaflagthatcontainsa

lowercase“s”.
Forexample
S LIKE_REGEX 'a.b'
matchesthefollowing:
'xa0by'
butnotthefollowing:
6 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 13 ----------------------
ISO/IEC 19075-1:2021(E)
4.5 Dot
'xa
by'

becausethecharacterbetweenthe“a”andthe“b”isalineterminator.However,usingdot-allmodelike

this:
S LIKE_REGEX 'a.b' FLAG 's'
wouldmatchbothexamples.
4.6 Anchors

AsdiscussedinthisClause,regularexpressionslookforamatchanywherewithinastring,without

needingtomatchtheentirestring.Buthowcananapplicationrequireamatchoftheentirestring?For

this,theapplicationusesanchors.Theanchorsarethemetacharacters“^”forthestartofastring(or

line),and“$”fortheendofastring(orline).Forexample:
S LIKE_REGEX '^xyz$'
canonlymatchastringthatisprecisely'xyz'.
Anchorsmaybeusedseparatelytorequirea“beginswith”or“endwith”match.Forexample
S LIKE_REGEX '^xyz'
matchesanystringthatbeginswith“xyz”,and
S LIKE_REGEX 'xyz$'
matchesanystringthatendswith“xyz”.
Insteadofmatchingthebeginorendofthestring,theanchorsmaybeusedtoanchoramatchtothe

beginorendofaline,byperformingthematchinmulti-linemode.Multi-linemodeisspecifiedusinga

flagcontainingalowercase“m”.Forexample:
S LIKE_REGEX '^xyz' FLAG 'm'

performsananchoredsearchinmulti-linemode,matchinganystringcontainingalinethatbeginswith

“xyz”.Theimmediatelyprecedingexamplewouldmatchthefollowingstring:
'line one
xyz
line three'
4.7 Lineterminators

Themetacharacters“.”,“^”,and“$”andthemulti-characterescapesequences“\s”and“\S”aredefined

intermsofa“lineterminator”.Whatcountsasalineterminator?XQueryandXPathFunctionsand

Operators3.1onlyrecognizesalinefeed(U+000A)asalineterminator.Thisdefinitionworkswellfor

XQuery,becauseXMLnormalizesthelineterminatorsonvariousplatformstoalinefeed.

AcloserlookshowsthatXMLhastwodefinitionsoflinehandling,insection2.11,“End-of-linehandling”,

ofXML1.0andXML1.1.SowhichshouldbeusedforSQL?
© ISO/IEC 2021 – All rights reserved 7
---------------------- Page: 14 ----------------------
ISO/IEC 19075-1:2021(E)
4.7 Line terminators

A first stop in answering this is to look at the definition of in ISO/IEC 9075-14, which

requires XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211,

“XML 1.1 support”. So, character string can be normalized according to either XML 1.0 or XML 1.1 as an

implementation-defined choice, or perhaps via a conformance feature.

However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes

its input, which means that such two-character sequences are converted to a single character. This changes

the relative position of every subsequent character, which would cause unexpected results for POSI-

TION_REGEX.

Our solution is to look to Unicode Technical Standard #18, a Unicode standard containing guidelines for

regular expression processors. This provides a referenceable definition of line terminator that does not

require normalizing the subject character string.
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions

So far, this Clause has shown how to match a specific character or any character from certain predefined

sets of characters. Using bracket expressions, applications can specify user-defined group of characters.

(XML Schema and XQuery call these character class expressions, but the term bracket expression is in

common use.)

A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket

expressions have a different list of special characters, namely
^ [ ] \

Forclarity,thesearecalledspecialcharacters,incontrasttothemetacharacterslistedearlier.

4.8.2 Listingcharacters

Ifabracketexpressiondoesnotcontainanyofthespecialcharacters,thenthebracketexpressionmatches

anysinglecharacterthatislistedbetweenthebrackets.Forexample,
S LIKE_REGEX '[abc]'
matchesanyofthefollowing:
'say'
'boy'
'lack'

Allbackslashescapesequencesareavailableforusewithinabracketexpression.Forexample,thefollowing

expressionmatcheseitheracaretorabackslash:
S LIKE_REGEX '[\^\\]'
Thefollowingexpressionmatchesalllettersordigits:
S LIKE_REGEX '[\p{L}\p{Nd}]'
where“\p{L}”istheescapematchinganyletterand“\p{Nd}”istheescapematchinganydigit.
8 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 15 ----------------------
ISO/IEC 19075-1:2021(E)
4.8 Bracket expressions
4.8.3 Matchingarange
Aminussign“-”isusedtospecifyacharacterrange.Forexample:
S LIKE_REGEX '[sa-my]'

matchesthelowercaseletters“s”,allthelettersbetween“a”and“m”inclusive,and“y”.Rangesaredefined

intermsoftheUCScodepointordering.Whentherearemultipleranges,thebracketexpressionmatches

theunionoftheranges.Forexample:
S LIKE_REGEX '[a-me-z]'
matchesalllowercaseletters.

Usingaspecialcharacterinarangeissometimespermitted,buttricky.Ratherthanpresenttherules

here,oneshoulduseabackslashescapeifthestartorendpointofarangeisbeaspecialcharacter.

4.8.4 Negation

Acaret“^”isaspecialcharacterwhenitisthefirstcharacterofabracketexpression,whereitindicates

thatthesetofcharactersisanythingnotlistedbythefollowingbracketexpression.Forexample:

S LIKE_REGEX '[^aj-m]'
isTrueifScontainsanycharacterthatisnot“a”,“j”,“k”,“l”,or“m”.
4.8.5 Characterclasssubtraction

Abracketexpressionmayconcludewithaminussign“-”followedbyanestedbracketexpression.This

iscalledacharacterclasssubtraction,andindicatesthatanycharactermatchedbythenestedbracket

expressionistoberemovedfromthesetofcharactersthatmightbeamatch.Forexample:
S LIKE_REGEX '[a-z-[m-p]]'

matchesanythingbetween“a”and“z”,exceptforthelettersbetween“m”and“p”,inclusive.Thisexample

isequivalentto:
S LIKE_REGEX '[a-lq-z]'

Characterclasssubtractionscanbenestedindefinitely(althoughimplementation-definedlimitations

mayapply).
4.9 Alternation

Achoiceofregularexpressionsisspecifiedthroughtheuseofaverticalbar:“|”.Forexample:

S LIKE_REGEX 'a|b'
isTrueifScontainseitheran“a”ora“b”.
Alternationhaslowerprecedencethanconcatenation.Thus
© ISO/IEC 2021 – All rights reserved 9
---------------------- Page: 16 ----------------------
ISO/IEC 19075-1:2021(E)
4.9 Alternation
S LIKE_REGEX 'ab|xyz'

isTrueifScontainseither“ab”or“xyz”.Thisprecedencecanbeoverriddenthroughtheuseofparentheses,

suchasthisexample:
S LIKE_REGEX 'a(b|xy)z'
TheprecedingexampleisTrueifScontainseither“abz”or“axyz”.
4.10 Quantifiers

Quantifiersaremetacharactersthatspecifyamatchforsomenumberofrepetitionsofaregularexpression.

Therearetwosetsofquantifiers,thegreedyandthereluctant.Thegreedyquantifiersare:
{n} Exactlynrepetitions.
{n,} normorerepetitions.
{n,m} Betweennandmrepetitions,inclusive.
? 0(zero)or1(one)repetition;equivalentto{0,1}.
* 0(zero)ormorerepetitions;equivalentto{0,}.
+ 1(one)ormorerepetitions;equivalentto{1,}.

Thereluctantquantifiersareformedbysuffixingaquestionmarktoagreedyquantifier.Thus,“*?”isthe

reluctantformof“*”,and“??”isthereluctantformof“?”.Thegreedyquantifierstrytomatchasmuch

aspossible,whereasthereluctantquantifierstrytomatchaslittleaspossible(whilestillallowingthe

overallregularexpressiontomatch).Thereisnodifferenceinbehaviorbetweenthegreedyandreluctant

quantifiersforLIKE_REGEX.ThisdifferencefortheotheroperatorsisdiscussedinsubsequentSubclauses.

Example:
S LIKE_REGEX 'a{3}'
isequivalentto
S LIKE_REGEX 'aaa'

andmatchesanystringcontainingatleastthreeconsecutiveinstancesof“a”.NotethatifScontainsmore

thanthreeconsecutiveinstancesof“a”,itstillmatches;totestwhetherScontainsasubstringofthree

consecutiveinstancesof“a”andnomoreisalotharder,sincetheexpressionalsohastorequiresomething

otherthanan“a”atbothendsofthesubstring.
Anotherexample:
S LIKE_REGEX 'ab+c'
isequivalentto
S LIKE_REGEX 'ab{1,}c'

andmatchesanystringthatcontainsasubstringconsistingofan“a”,oneormore“b”s,andthena“c”.

10 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 17 ----------------------
ISO/IEC 19075-1:2021(E)
4.11 Locating a match
4.11 Locatingamatch

LIKE_REGEXonlycareswhetheramatchexists;theotheroperatorscareaboutwhereamatchislocated

inthestring.Considertheregularexpression“a+”andthestring“a1aa2aaa3”.Therearetenpossible

matchesfor“a+”,indicatedbytheunderliningonthefollowinglines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- position 4, length 1
'a1aa2aaa3' -- position 6, length 1
'a1aa2aaa3' -- position 6, length 2
'a1aa2aaa3' -- position 6, length 3
'a1aa2aaa3' -- position 7, length 1
'a1aa2aaa3' -- position 7, length 2
'a1aa2aaa3' -- position 8, length 1

Noticethatsomeofthematchesaresubstringsofothermatches.TherulesofXQueryregularexpressions

aredesignedtoignorecertainmatches,sothattherecognizedmatchesaremutuallydisjoint.Obviously

therearemanywaystodothis,sotherulesprovideprioritiesindeterminingtherecognizedmatches.

Therearethreepriorities:

1) Thetoppriorityistofindamatchasearlyinthestringaspossible.Thisiscommonlycalledtheleftmost

rule.

2) Thesecondpriorityistofindthefirstalternativeofanalternation,ifpossible.Theredoesnotappear

tobeacommonnameforthisrule.

3) Thelastpriorityistofindthelongestpossiblematchforgreedyquantifiers,andtheshortestmatch

forreluctantquantifiers.Inthecaseofgreedyquantifiers,thisiscommonlycalledthelongestrule;

theredoesnotappeartobeacommonnamefortheruleregardingreluctantquantifiers.

[Historicalnote:POSIXonlyhasaleftmostlongestrule.Therewerenoreluctantquantifiers,andthe

priorityformatchingalternationswasthelongestmatchratherthanthefirstalternative.]
TheserulesareillustratedbyexamplesinTable1,“Matchpriorities”.
Table1—Matchpriorities
Subject regular match(es) priority
string expression underlined
baaaaaa ba|a* baaaaaa firstalternative;
baaaaaa secondmatchalwaysstartsafterthefirst
match(eventhoughaaaaaawouldbelonger)
ab a|ab ab firstalternative(ratherthanmatchingab)
abcabbabc ab* abcabbabc leftmost
abcabbabc longest(greedyquantifierconsumestwo'b's)
abcabbabc longest
abcabbabc ab*? abcabbabc shortest(noneedtomatch'b')
abcabbabc shortest
abcabbabc shortest
© ISO/IEC 2021 – All rights reserved 11
---------------------- Page: 18 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture and back-reference
4.12 Capture and back-reference

A parenthesized sub-expression is a portion of a regular expression that is enclosed in parentheses.

Parenthesized sub-expressions are numbered in order of their left parenthesis. For example, in the reg-

ular expression
((a)|(b))
therearethreesub-expressions:
1) ((a)|(b))
2) (a)
3) (b)

Asub-expressioncanbereferencedlaterinaregularexpressionusingaback-reference,takingtheform

ofabackslashfollowedbyoneormoredigits.Thusthethreesub-expressionsintheexamplecanbe

referencedas“\1”,“\2”,and“\3”.Forexample,considertheregularexpression:
\p{Z}(\p{L}*)\p{Z}*\1\p{Z}

Thefirstandonlyparenthesizedsub-expression(“\p{L}*”)matchesanysequenceoflettersthatis

boundedbysomekindofspacecharacter(“\p{Z}”)beforeandafterthesequenceofletters.Theback-

reference(“\1”)matcheswhateversequenceofletterswascapturedbythefirstsub-expression.This

regularexpressionmightbeusedtosearchforoccurrencesofarepeatedword(perhapscausedbyacut-

and-pasteerror).Hereisanexampleofasubjectstring,withunderliningtoindicatethematchforthe

entireregularexpression:
Hello Dolly you’re looking looking swell

Whenaback-referencereferencesaparenthesizedgroupwithaquantifier,thentheback-reference

matchesthelastiterationofthequantifiedsub-expression.Forexample,considertheregularexpression:

'(ab*)*c*\1'
andthesubjectstring:
'abbbabbabcabbbbb'
Thematchesto“(ab*)”areshownbyunderlining:
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'

Thesethreeiterationsof“(ab*)”arematchedby“(ab*)*”andthenthe“c”ismatched.Next,amatchfor

“\1”isneeded.Thelastmatchforthefirstparenthesizedsub-expressionis“ab”,sotheoverallmatchis

indicatebyunderlining:
'abbbabbabcabbbbb'

Intheeventthatasub-expressionisunmatched,aback-referencetoitmatchesthezero-lengthstring.

Forexample,considertheregularexpression:
'((a*)|(b*))c??\3'
andthesubjectstring:
'xyzaaccb'
12 © ISO/IEC 2021 – All rights reserved
---------------------- Page: 19 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture an
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.