Information technology — Guidance for the use of database language SQL — Part 1: XQuery regular expressions

Technologies de l'information — Recommandations pour l'utilisation du langage de base de données SQL — Partie 1: Expressions régulières de XQuery en SQL

General Information

Status
Published
Publication Date
30-Aug-2021
Current Stage
6060 - International Standard published
Start Date
31-Aug-2021
Due Date
11-Feb-2024
Completion Date
31-Aug-2021
Ref Project

Relations

Buy Standard

Standard
ISO/IEC 19075-1:2021 - Information technology -- Guidance for the use of database language SQL
English language
25 pages
sale 15% off
Preview
sale 15% off
Preview
Draft
ISO/IEC PRF 19075-1:Version 10-jul-2021 - Information technology -- Guidance for the use of database language SQL
English language
25 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (Sample)

INTERNATIONAL ISO/IEC
STANDARD 19075-1
First edition
2021-08
Information technology — Guidance
for the use of database language
SQL —
Part 1:
XQuery regular expressions
Technologies de l'information — Recommandations pour l'utilisation
du langage de base de données SQL —
Partie 1: Expressions régulières de XQuery en SQL
Reference number
ISO/IEC 19075-1:2021(E)
©
ISO/IEC 2021

---------------------- Page: 1 ----------------------
ISO/IEC 19075-1:2021(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2021 – All rights reserved

---------------------- Page: 2 ----------------------
ISO/IEC 19075-1:2021(E)
Contents Page
Foreword.v
Introduction.vii
1 Scope.1
2 Normativereferences.2
3 Termsanddefinitions.3
4 XQueryregularexpressions.4
4.1 ContextofXQueryregularexpressions.4
4.2 IntroductiontoXQueryregularexpressions.4
4.3 Matchingaspecificcharacter.4
4.4 Metacharactersandescapesequences.5
4.5 Dot.6
4.6 Anchors.7
4.7 Lineterminators.7
4.8 Bracket expressions.8
4.8.1 Introductiontobracketexpressions.8
4.8.2 Listing characters.8
4.8.3 Matchingarange.9
4.8.4 Negation.9
4.8.5 Characterclasssubtraction.9
4.9 Alternation.9
4.10 Quantifiers.10
4.11 Locatingamatch.11
4.12 Captureandback-reference.12
4.13 Precedence.13
4.14 Modes.13
5 Operatorsusingregularexpressions.15
5.1 Introductiontooperatorsusingregularexpressions.15
5.2 LIKE_REGEX.15
5.3 OCCURRENCES_REGEX.16
5.4 POSITION_REGEX.17
5.5 SUBSTRING_REGEX.19
5.6 TRANSLATE_REGEX.20
Bibliography.23
Index.24
© ISO/IEC 2021 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO/IEC 19075-1:2021(E)
Tables
Table Page
1 Match priorities.11
iv © ISO/IEC 2021 – All rights reserved

---------------------- Page: 4 ----------------------
ISO/IEC 19075-1:2021(E)
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of document should be noted. This document was drafted in accordance with the editorial
rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details
of any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent
declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the World
Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 32, Data management and interchange.
This first edition of ISO/IEC 19075-1 cancels and replaces ISO/IEC TR 19075-1:2011.
This document is intended to be used in conjunction with the following editions of the parts of the
ISO/IEC 9075 series:
— ISO/IEC 9075-1, sixth edition or later;
— ISO/IEC 9075-2, sixth edition or later;
— ISO/IEC 9075-3, sixth edition or later;
— ISO/IEC 9075-4, seventh edition or later;
— ISO/IEC 9075-9, fifth edition or later;
— ISO/IEC 9075-10, fifth edition or later;
— ISO/IEC 9075-11, fifth edition or later;
— ISO/IEC 9075-13, fifth edition or later;
v
© ISO/IEC 2021 – All rights reserved

---------------------- Page: 5 ----------------------
ISO/IEC 19075-1:2021(E)
— ISO/IEC 9075-14, sixth edition or later;
— ISO/IEC 9075-15, second edition or later;
— ISO/IEC 9075-16, first edition or later.
A list of all parts in the ISO/IEC 19075 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-
committees.
© ISO 2021 – All rights reserved
vi

---------------------- Page: 6 ----------------------
ISO/IEC 19075-1:2021(E)
Introduction
Theorganizationofthisdocumentisasfollows:
1) Clause1,“Scope”,specifiesthescopeofthisdocument.
2) Clause2,“Normativereferences”,identifiesadditionalstandardsthat,throughreferenceinthis
document,constituteprovisionsofthisdocument.
3) Clause3,“Termsanddefinitions”,definesthetermsanddefinitionsusedinthisdocument.
4) Clause4,“XQueryregularexpressions”,explainshowXQueryregularexpressionsareformed.
5) Clause5,“Operatorsusingregularexpressions”,explainshowtheSQLoperatorsuseregular
expressions.
© ISO/IEC 2021 – All rights reserved vii

---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO/IEC 19075-1:2021(E)
Informationtechnology—GuidancefortheuseofdatabaselanguageSQL—
Part1:
XQueryregularexpressions
1 Scope
ThisdocumentdescribestheregularexpressionsupportinSQL(ISO/IEC9075-2)adoptedfromthe
regularexpressionsyntaxofXQueryandXPathFunctionsandOperators3.1,whichisderivedfromPerl.
Thisdocumentdiscussesfiveoperatorsusingthisregularexpressionsyntax:
— LIKE_REGEXpredicate,todeterminetheexistenceofamatchtoaregularexpression.
— OCCURRENCES_REGEXnumericfunction,todeterminethenumberofmatchestoaregularexpression.
— POSITION_REGEXfunction,todeterminethepositionofamatch.
— SUBSTRING_REGEXfunction,toextractasubstringmatchingaregularexpression.
— TRANSLATE_REGEXfunction,toperformreplacementsusingaregularexpression.
© ISO/IEC 2021 – All rights reserved 1

---------------------- Page: 8 ----------------------
ISO/IEC 19075-1:2021(E)
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content consti-
tutes requirements of this document. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC9075-1,Informationtechnology—Databaselanguages—SQL—Part1:Framework
(SQL/Framework)
ISO/IEC9075-2,Informationtechnology—Databaselanguages—SQL—Part2:Foundation
(SQL/Foundation)
ISO/IEC9075-14,Informationtechnology—Databaselanguages—SQL—Part14:XML-Related
Specifications(SQL/XML)
Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.0,W3CRecommendation[online].Fifth
Edition.Cambridge,Massachusetts,USA:W3C,26November2008.Availableat
http://www.w3.org/TR/xml
Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.1,W3CRecommendation[online].Second
Edition.Cambridge,Massachusetts,USA:W3C,16August2006.Availableat
http://www.w3.org/TR/xml11
Biron,PaulV.;Malhotra,Ashok.XMLSchemaPart2:Datatypes,W3CRecommendation[online].Second
Edition.Cambridge,Massachusetts,USA:W3C,28October2004.Availableat
http://www.w3.org/TR/xmlschema-2/
Malhotra,Ashoketal.XQueryandXPathFunctionsandOperators3,1,W3CRecommendation[online].
Cambridge,Massachusetts,USA:W3C,21March2017.Availableathttp://www.w3.org/TR/-
xpath-functions/
TheUnicodeConsortium.UnicodeRegularExpressions[online].21.MountainView,California,USA:
TheUnicodeConsortium,2020-06-17.Availableathttp://www.unicode.org/reports/tr18/-
tr18-21.html
2 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 9 ----------------------
ISO/IEC 19075-1:2021(E)
3 Termsanddefinitions
Forthepurposesofthisdocument,thetermsanddefinitionsgiveninISO/IEC9075-1apply.
ISOandIECmaintainterminologicaldatabasesforuseinstandardizationatthefollowingaddresses:
— IECElectropedia:availableathttp://www.electropedia.org/
— ISOOnlinebrowsingplatform:availableathttp://www.iso.org/obp
© ISO/IEC 2021 – All rights reserved 3

---------------------- Page: 10 ----------------------
ISO/IEC 19075-1:2021(E)
4 XQueryregularexpressions
4.1 ContextofXQueryregularexpressions
TherequirementsforthematerialdiscussedinthisdocumentshallbeasspecifiedinISO/IEC9075-2,
ISO/IEC9075-14,XML1.0,XML1.1,XMLSchemaPart2:Datatypes,XQueryandXPathFunctionsand
Operators3.1,andUnicodeTechnicalStandard#18.
4.2 IntroductiontoXQueryregularexpressions
ThisdocumentexplainsthemannerinwhichXQueryregularexpressionsareusedbydatabaselanguage
SQLinISO/IEC9075-2andinISO/IEC9075-14.BothISO/IEC9075-2andISO/IEC9075-14specify
requirementsforthematerialdiscussedinthisdocument.
XQueryregularexpressionsyntaxisspecifiedinXQueryandXPathFunctionsandOperators3.1,section
5.6.1,“Regularexpressionsyntax”.ThispaperreferencestheXQueryspecification,withtwosmallmodi-
fications(requiredsincecharacterstringsinanRDBMSarenotnecessarilynormalizedaccordingtoXML
conventions).Thefollowingsubsectionsprovideanoverviewofthissyntax.
TheXQueryregularexpressionsyntaxisitselfamodificationofanotherregularexpressionsyntaxfound
inXMLSchemaPart2:Datatypes.
ThissectionpresentsanoverviewofthecapabilitiesofXQueryregularexpressionsyntax.Intheprocess,
thissectionwillillustratesomeoftheSQLoperators.TheSQLoperatorsthemselvesarepresentedin
Clause5,“Operatorsusingregularexpressions”.
ThefollowingdiscussiondoesnotcovereveryaspectofXQueryregularexpressions;forthis,XQuery
andXPathFunctionsandOperators3.1isthereference(thoughhardlyatutorial;avarietyofpopular
workscontaindetailedtreatmentsofregularexpressions).
4.3 Matchingaspecificcharacter
Perhapsthemostelementarypatternmatchingrequirementistheabilitytomatchasinglecharacteror
string.Formostcharacters,thisisdonebysimplywritingthecharacterintheregularexpression.For
example,supposeanapplicationneedstoknowifastringScontainstheletters“xyz”.Thiscouldbedone
withthefollowingpredicate:
S LIKE_REGEX 'xyz'
NotethattheSQLLIKEpredicatewouldrequireanexactmatchfor“xyz”.However,theconventionwith
regularexpressionsisthatSneedonlycontainasubstringthatis“xyz”.Forexample,allofthefollowing
valuesofSwouldyieldTruefortheimmediatelyprecedingpredicate:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz
4 © ISO/IEC 2021– All rights reserved

---------------------- Page: 11 ----------------------
ISO/IEC 19075-1:2021(E)
4.3 Matching a specific character
Notethatinthelastexample,thereareactuallythreeoccurrencesoftheregularexpression“xyz”within
thetestedvalue.Theusermaywishtoknowthenumberofoccurrencesofamatch.Thiscanbedone
withOCCURRENCES_REGEX.Forexample:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3
Theusermightalsowishtoknowthepositionofaspecificmatch.ThiscanbedoneusingPOSITION_REGEX.
Forexample,tolearnthestartingcharacterpositionofthesecondoccurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9
Itisalsopossibletoaskforthecharacterpositionofthefirstcharacterafterthematch.Forexample:
POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12
IfAFTERisusedandthelastcharacterofthesubjectstringisconsumed,thentheresultisthelengthof
thestringplus1(one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
4.4 Metacharactersandescapesequences
Asmentioned,mostcharacterscanbematchedbysimplywritingthecharacterintheregularexpression.
However,certaincharactersarereservedasmetacharacters.Thecompletelistofmetacharactersis:
. \ ? * + { } ( ) | [ ] ^ $
TheuseofeachofthesemetacharactersisexplainedinsubsequentSubclauses.Tomatchametacharacter,
thenanescapesequence,consistingofabackslash(“\”)followedbythemetacharacterisused.For
example,thisexpressiontestswhetherastringcontainsadollarsign:
S LIKE_REGEX '\$'
Inparticular,theescapesequencerepresentingabackslashistwoconsecutivebackslashes.Thereare
variousotherdefinedescapesequences,matchingeitherasinglecharacter,oranyofagroupofcharacters.
Thesinglecharacterescapesequencesare:
\n newline(U+000A)
\r return(U+000D)
\t tab(U+0009)
\- minussign('-')
Theso-calledcategoryescapesareexemplifiedby“\p{L}”or“\p{Lu}”.Acategoryescapebeginswith
“\p{”followedbyoneuppercaseletter,optionallyalowercaseletter,andthentheclosingbrace.Inthese
example,“\p{L}”matchesanyletter(asdefinedbyUnicode)and“\p{Lu}”matchesanyuppercaseletter.
Someinterestingcategoryescapesare:
\p{L} Anyletter.
\p{Lu} Anyuppercaseletter.
\p{Ll} Anylowercaseletter.
\p{Nd} Anydecimaldigit.
\p{P} Anypunctuationmark.
© ISO/IEC 2021 – All rights reserved 5

---------------------- Page: 12 ----------------------
ISO/IEC 19075-1:2021(E)
4.4 Metacharacters and escape sequences
\p{Z} Any separator (space, line, paragraph, etc.).
The complete list of category escapes is found in XML Schema Part 2: Datatypes, section F.1.1, “Character
class escapes”.
There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A comple-
mentary category escape matches any character that would not be matched by the corresponding category
escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the
complementary character escape is written with an uppercase “P”.
The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive
range of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also
complementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is
not an ASCII character.
Finally, there are the following multi-character escape sequences:
\s AsdefinedbyXMLSchemaPart2:Datatypes,thisescapematchesspace(U+0020),tab
(U+0009),newline(U+000A),orreturn(U+000D).SincecharacterstringsinanRDBMShave
notundergoneXMLlineterminationnormalization,theescapeisbroadenedtoincludeany
characterortwo-charactersequencethatisrecognizedbyUnicodeTechnicalStandard#18
asalineterminator.Subclause4.7,“Lineterminators”,discussesthisissuefurther.
\S Anysinglecharacternotmatchedby\s.
\i Underscore(“_”),colon(“:”)orletter(thisisalotmorethanjusttheLatinletters;seeXML
1.0appendixB,rule[84]).
\I Anysinglecharacternotmatchedby\i.
\c AnysinglecharactermatchedbyNameChar,asdefinedinXML1.0section2.3,rule[4a].
\C Anysinglecharacternotmatchedby\c.
\d Anysingledigit
\D Anysinglecharacternotmatchedby\d.
\w AnysingleUnicodecharacterexceptthoseclassifiedas“punctuation”,“separator”,or“other”.
\W Thecomplementof\w.
4.5 Dot
Dot(period,“.”)isametacharacterthatisusedtomatchanysinglecharacter(thesamebehavioras“_”
inLIKEpredicates),oranysinglecharacterthatisnotalineterminator.Thedefaultistomatchanything
exceptalineterminator.Thealternative,calleddot-allmode,isspecifiedusingaflagthatcontainsa
lowercase“s”.
Forexample
S LIKE_REGEX 'a.b'
matchesthefollowing:
'xa0by'
butnotthefollowing:
6 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 13 ----------------------
ISO/IEC 19075-1:2021(E)
4.5 Dot
'xa
by'
becausethecharacterbetweenthe“a”andthe“b”isalineterminator.However,usingdot-allmodelike
this:
S LIKE_REGEX 'a.b' FLAG 's'
wouldmatchbothexamples.
4.6 Anchors
AsdiscussedinthisClause,regularexpressionslookforamatchanywherewithinastring,without
needingtomatchtheentirestring.Buthowcananapplicationrequireamatchoftheentirestring?For
this,theapplicationusesanchors.Theanchorsarethemetacharacters“^”forthestartofastring(or
line),and“$”fortheendofastring(orline).Forexample:
S LIKE_REGEX '^xyz$'
canonlymatchastringthatisprecisely'xyz'.
Anchorsmaybeusedseparatelytorequirea“beginswith”or“endwith”match.Forexample
S LIKE_REGEX '^xyz'
matchesanystringthatbeginswith“xyz”,and
S LIKE_REGEX 'xyz$'
matchesanystringthatendswith“xyz”.
Insteadofmatchingthebeginorendofthestring,theanchorsmaybeusedtoanchoramatchtothe
beginorendofaline,byperformingthematchinmulti-linemode.Multi-linemodeisspecifiedusinga
flagcontainingalowercase“m”.Forexample:
S LIKE_REGEX '^xyz' FLAG 'm'
performsananchoredsearchinmulti-linemode,matchinganystringcontainingalinethatbeginswith
“xyz”.Theimmediatelyprecedingexamplewouldmatchthefollowingstring:
'line one
xyz
line three'
4.7 Lineterminators
Themetacharacters“.”,“^”,and“$”andthemulti-characterescapesequences“\s”and“\S”aredefined
intermsofa“lineterminator”.Whatcountsasalineterminator?XQueryandXPathFunctionsand
Operators3.1onlyrecognizesalinefeed(U+000A)asalineterminator.Thisdefinitionworkswellfor
XQuery,becauseXMLnormalizesthelineterminatorsonvariousplatformstoalinefeed.
AcloserlookshowsthatXMLhastwodefinitionsoflinehandling,insection2.11,“End-of-linehandling”,
ofXML1.0andXML1.1.SowhichshouldbeusedforSQL?
© ISO/IEC 2021 – All rights reserved 7

---------------------- Page: 14 ----------------------
ISO/IEC 19075-1:2021(E)
4.7 Line terminators
A first stop in answering this is to look at the definition of in ISO/IEC 9075-14, which
requires XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211,
“XML 1.1 support”. So, character string can be normalized according to either XML 1.0 or XML 1.1 as an
implementation-defined choice, or perhaps via a conformance feature.
However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes
its input, which means that such two-character sequences are converted to a single character. This changes
the relative position of every subsequent character, which would cause unexpected results for POSI-
TION_REGEX.
Our solution is to look to Unicode Technical Standard #18, a Unicode standard containing guidelines for
regular expression processors. This provides a referenceable definition of line terminator that does not
require normalizing the subject character string.
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions
So far, this Clause has shown how to match a specific character or any character from certain predefined
sets of characters. Using bracket expressions, applications can specify user-defined group of characters.
(XML Schema and XQuery call these character class expressions, but the term bracket expression is in
common use.)
A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket
expressions have a different list of special characters, namely
^ [ ] \
Forclarity,thesearecalledspecialcharacters,incontrasttothemetacharacterslistedearlier.
4.8.2 Listingcharacters
Ifabracketexpressiondoesnotcontainanyofthespecialcharacters,thenthebracketexpressionmatches
anysinglecharacterthatislistedbetweenthebrackets.Forexample,
S LIKE_REGEX '[abc]'
matchesanyofthefollowing:
'say'
'boy'
'lack'
Allbackslashescapesequencesareavailableforusewithinabracketexpression.Forexample,thefollowing
expressionmatcheseitheracaretorabackslash:
S LIKE_REGEX '[\^\\]'
Thefollowingexpressionmatchesalllettersordigits:
S LIKE_REGEX '[\p{L}\p{Nd}]'
where“\p{L}”istheescapematchinganyletterand“\p{Nd}”istheescapematchinganydigit.
8 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 15 ----------------------
ISO/IEC 19075-1:2021(E)
4.8 Bracket expressions
4.8.3 Matchingarange
Aminussign“-”isusedtospecifyacharacterrange.Forexample:
S LIKE_REGEX '[sa-my]'
matchesthelowercaseletters“s”,allthelettersbetween“a”and“m”inclusive,and“y”.Rangesaredefined
intermsoftheUCScodepointordering.Whentherearemultipleranges,thebracketexpressionmatches
theunionoftheranges.Forexample:
S LIKE_REGEX '[a-me-z]'
matchesalllowercaseletters.
Usingaspecialcharacterinarangeissometimespermitted,buttricky.Ratherthanpresenttherules
here,oneshoulduseabackslashescapeifthestartorendpointofarangeisbeaspecialcharacter.
4.8.4 Negation
Acaret“^”isaspecialcharacterwhenitisthefirstcharacterofabracketexpression,whereitindicates
thatthesetofcharactersisanythingnotlistedbythefollowingbracketexpression.Forexample:
S LIKE_REGEX '[^aj-m]'
isTrueifScontainsanycharacterthatisnot“a”,“j”,“k”,“l”,or“m”.
4.8.5 Characterclasssubtraction
Abracketexpressionmayconcludewithaminussign“-”followedbyanestedbracketexpression.This
iscalledacharacterclasssubtraction,andindicatesthatanycharactermatchedbythenestedbracket
expressionistoberemovedfromthesetofcharactersthatmightbeamatch.Forexample:
S LIKE_REGEX '[a-z-[m-p]]'
matchesanythingbetween“a”and“z”,exceptforthelettersbetween“m”and“p”,inclusive.Thisexample
isequivalentto:
S LIKE_REGEX '[a-lq-z]'
Characterclasssubtractionscanbenestedindefinitely(althoughimplementation-definedlimitations
mayapply).
4.9 Alternation
Achoiceofregularexpressionsisspecifiedthroughtheuseofaverticalbar:“|”.Forexample:
S LIKE_REGEX 'a|b'
isTrueifScontainseitheran“a”ora“b”.
Alternationhaslowerprecedencethanconcatenation.Thus
© ISO/IEC 2021 – All rights reserved 9

---------------------- Page: 16 ----------------------
ISO/IEC 19075-1:2021(E)
4.9 Alternation
S LIKE_REGEX 'ab|xyz'
isTrueifScontainseither“ab”or“xyz”.Thisprecedencecanbeoverriddenthroughtheuseofparentheses,
suchasthisexample:
S LIKE_REGEX 'a(b|xy)z'
TheprecedingexampleisTrueifScontainseither“abz”or“axyz”.
4.10 Quantifiers
Quantifiersaremetacharactersthatspecifyamatchforsomenumberofrepetitionsofaregularexpression.
Therearetwosetsofquantifiers,thegreedyandthereluctant.Thegreedyquantifiersare:
{n} Exactlynrepetitions.
{n,} normorerepetitions.
{n,m} Betweennandmrepetitions,inclusive.
? 0(zero)or1(one)repetition;equivalentto{0,1}.
* 0(zero)ormorerepetitions;equivalentto{0,}.
+ 1(one)ormorerepetitions;equivalentto{1,}.
Thereluctantquantifiersareformedbysuffixingaquestionmarktoagreedyquantifier.Thus,“*?”isthe
reluctantformof“*”,and“??”isthereluctantformof“?”.Thegreedyquantifierstrytomatchasmuch
aspossible,whereasthereluctantquantifierstrytomatchaslittleaspossible(whilestillallowingthe
overallregularexpressiontomatch).Thereisnodifferenceinbehaviorbetweenthegreedyandreluctant
quantifiersforLIKE_REGEX.ThisdifferencefortheotheroperatorsisdiscussedinsubsequentSubclauses.
Example:
S LIKE_REGEX 'a{3}'
isequivalentto
S LIKE_REGEX 'aaa'
andmatchesanystringcontainingatleastthreeconsecutiveinstancesof“a”.NotethatifScontainsmore
thanthreeconsecutiveinstancesof“a”,itstillmatches;totestwhetherScontainsasubstringofthree
consecutiveinstancesof“a”andnomoreisalotharder,sincetheexpressionalsohastorequiresomething
otherthanan“a”atbothendsofthesubstring.
Anotherexample:
S LIKE_REGEX 'ab+c'
isequivalentto
S LIKE_REGEX 'ab{1,}c'
andmatchesanystringthatcontainsasubstringconsistingofan“a”,oneormore“b”s,andthena“c”.
10 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 17 ----------------------
ISO/IEC 19075-1:2021(E)
4.11 Locating a match
4.11 Locatingamatch
LIKE_REGEXonlycareswhetheramatchexists;theotheroperatorscareaboutwhereamatchislocated
inthestring.Considertheregularexpression“a+”andthestring“a1aa2aaa3”.Therearetenpossible
matchesfor“a+”,indicatedbytheunderliningonthefollowinglines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- position 4, length 1
'a1aa2aaa3' -- position 6, length 1
'a1aa2aaa3' -- position 6, length 2
'a1aa2aaa3' -- position 6, length 3
'a1aa2aaa3' -- position 7, length 1
'a1aa2aaa3' -- position 7, length 2
'a1aa2aaa3' -- position 8, length 1
Noticethatsomeofthematchesaresubstringsofothermatches.TherulesofXQueryregularexpressions
aredesignedtoignorecertainmatches,sothattherecognizedmatchesaremutuallydisjoint.Obviously
therearemanywaystodothis,sotherulesprovideprioritiesindeterminingtherecognizedmatches.
Therearethreepriorities:
1) Thetoppriorityistofindamatchasearlyinthestringaspossible.Thisiscommonlycalledtheleftmost
rule.
2) Thesecondpriorityistofindthefirstalternativeofanalternation,ifpossible.Theredoesnotappear
tobeacommonnameforthisrule.
3) Thelastpriorityistofindthelongestpossiblematchforgreedyquantifiers,andtheshortestmatch
forreluctantquantifiers.Inthecaseofgreedyquantifiers,thisiscommonlycalledthelongestrule;
theredoesnotappeartobeacommonnamefortheruleregardingreluctantquantifiers.
[Historicalnote:POSIXonlyhasaleftmostlongestrule.Therewerenoreluctantquantifiers,andthe
priorityformatchingalternationswasthelongestmatchratherthanthefirstalternative.]
TheserulesareillustratedbyexamplesinTable1,“Matchpriorities”.
Table1—Matchpriorities
Subject regular match(es) priority
string expression underlined
baaaaaa ba|a* baaaaaa firstalternative;
baaaaaa secondmatchalwaysstartsafterthefirst
match(eventhoughaaaaaawouldbelonger)
ab a|ab ab firstalternative(ratherthanmatchingab)
abcabbabc ab* abcabbabc leftmost
abcabbabc longest(greedyquantifierconsumestwo'b's)
abcabbabc longest
abcabbabc ab*? abcabbabc shortest(noneedtomatch'b')
abcabbabc shortest
abcabbabc shortest
© ISO/IEC 2021 – All rights reserved 11

---------------------- Page: 18 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture and back-reference
4.12 Capture and back-reference
A parenthesized sub-expression is a portion of a regular expression that is enclosed in parentheses.
Parenthesized sub-expressions are numbered in order of their left parenthesis. For example, in the reg-
ular expression
((a)|(b))
therearethreesub-expressions:
1) ((a)|(b))
2) (a)
3) (b)
Asub-expressioncanbereferencedlaterinaregularexpressionusingaback-reference,takingtheform
ofabackslashfollowedbyoneormoredigits.Thusthethreesub-expressionsintheexamplecanbe
referencedas“\1”,“\2”,and“\3”.Forexample,considertheregularexpression:
\p{Z}(\p{L}*)\p{Z}*\1\p{Z}
Thefirstandonlyparenthesizedsub-expression(“\p{L}*”)matchesanysequenceoflettersthatis
boundedbysomekindofspacecharacter(“\p{Z}”)beforeandafterthesequenceofletters.Theback-
reference(“\1”)matcheswhateversequenceofletterswascapturedbythefirstsub-expression.This
regularexpressionmightbeusedtosearchforoccurrencesofarepeatedword(perhapscausedbyacut-
and-pasteerror).Hereisanexampleofasubjectstring,withunderliningtoindicatethematchforthe
entireregularexpression:
Hello Dolly you’re looking looking swell
Whenaback-referencereferencesaparenthesizedgroupwithaquantifier,thentheback-reference
matchesthelastiterationofthequantifiedsub-expression.Forexample,considertheregularexpression:
'(ab*)*c*\1'
andthesubjectstring:
'abbbabbabcabbbbb'
Thematchesto“(ab*)”areshownbyunderlining:
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
Thesethreeiterationsof“(ab*)”arematchedby“(ab*)*”andthenthe“c”ismatched.Next,amatchfor
“\1”isneeded.Thelastmatchforthefirstparenthesizedsub-expressionis“ab”,sotheoverallmatchis
indicatebyunderlining:
'abbbabbabcabbbbb'
Intheeventthatasub-expressionisunmatched,aback-referencetoitmatchesthezero-lengthstring.
Forexample,considertheregularexpression:
'((a*)|(b*))c??\3'
andthesubjectstring
...

INTERNATIONAL ISO/IEC
STANDARD 19075-1
First edition
Information technology — Guidance
for the use of database language
SQL —
Part 1:
XQuery regular expressions
PROOF/ÉPREUVE
Reference number
ISO/IEC 19075-1:2021(E)
©
ISO/IEC 2021

---------------------- Page: 1 ----------------------
ISO/IEC 19075-1:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2021 – All rights reserved

---------------------- Page: 2 ----------------------
ISO/IEC 19075-1:2021(E)
Contents Page
Foreword.v
Introduction.vii
1 Scope.1
2 Normativereferences.2
3 Termsanddefinitions.3
4 XQueryregularexpressions.4
4.1 ContextofXQueryregularexpressions.4
4.2 IntroductiontoXQueryregularexpressions.4
4.3 Matchingaspecificcharacter.4
4.4 Metacharactersandescapesequences.5
4.5 Dot.6
4.6 Anchors.7
4.7 Lineterminators.7
4.8 Bracket expressions.8
4.8.1 Introductiontobracketexpressions.8
4.8.2 Listing characters.8
4.8.3 Matchingarange.9
4.8.4 Negation.9
4.8.5 Characterclasssubtraction.9
4.9 Alternation.9
4.10 Quantifiers.10
4.11 Locatingamatch.11
4.12 Captureandback-reference.12
4.13 Precedence.13
4.14 Modes.13
5 Operatorsusingregularexpressions.15
5.1 Introductiontooperatorsusingregularexpressions.15
5.2 LIKE_REGEX.15
5.3 OCCURRENCES_REGEX.16
5.4 POSITION_REGEX.17
5.5 SUBSTRING_REGEX.19
5.6 TRANSLATE_REGEX.20
Bibliography.23
Index.24
© ISO/IEC 2021 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO/IEC 19075-1:2021(E)
Tables
Table Page
1 Match priorities.11
iv © ISO/IEC 2021 – All rights reserved

---------------------- Page: 4 ----------------------
ISO/IEC 19075-1:2021(E)
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of document should be noted. This document was drafted in accordance with the editorial
rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details
of any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent
declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the World
Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 32, Data management and interchange.
This first edition of ISO/IEC 19075-1 cancels and replaces ISO/IEC TR 19075-1:2011.
This document is intended to be used in conjunction with the following editions of the parts of the
ISO/IEC 9075 series:
— ISO/IEC 9075-1, sixth edition or later;
— ISO/IEC 9075-2, sixth edition or later;
— ISO/IEC 9075-3, sixth edition or later;
— ISO/IEC 9075-4, seventh edition or later;
— ISO/IEC 9075-9, fifth edition or later;
— ISO/IEC 9075-10, fifth edition or later;
— ISO/IEC 9075-11, fifth edition or later;
— ISO/IEC 9075-13, fifth edition or later;
v
© ISO/IEC 2021 – All rights reserved

---------------------- Page: 5 ----------------------
ISO/IEC 19075-1:2021(E)
— ISO/IEC 9075-14, sixth edition or later;
— ISO/IEC 9075-15, second edition or later;
— ISO/IEC 9075-16, first edition or later.
A list of all parts in the ISO/IEC 19075 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-
committees.
© ISO 2021 – All rights reserved
vi

---------------------- Page: 6 ----------------------
ISO/IEC 19075-1:2021(E)
Introduction
Theorganizationofthisdocumentisasfollows:
1) Clause1,“Scope”,specifiesthescopeofthisdocument.
2) Clause2,“Normativereferences”,identifiesadditionalstandardsthat,throughreferenceinthis
document,constituteprovisionsofthisdocument.
3) Clause3,“Termsanddefinitions”,definesthetermsanddefinitionsusedinthisdocument.
4) Clause4,“XQueryregularexpressions”,explainshowXQueryregularexpressionsareformed.
5) Clause5,“Operatorsusingregularexpressions”,explainshowtheSQLoperatorsuseregular
expressions.
© ISO/IEC 2021 – All rights reserved vii

---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO/IEC 19075-1:2021(E)
Informationtechnology—GuidancefortheuseofdatabaselanguageSQL—
Part1:
XQueryregularexpressions
1 Scope
ThisdocumentdescribestheregularexpressionsupportinSQL(ISO/IEC9075-2)adoptedfromthe
regularexpressionsyntaxofXQueryandXPathFunctionsandOperators3.1,whichisderivedfromPerl.
Thisdocumentdiscussesfiveoperatorsusingthisregularexpressionsyntax:
— LIKE_REGEXpredicate,todeterminetheexistenceofamatchtoaregularexpression.
— OCCURRENCES_REGEXnumericfunction,todeterminethenumberofmatchestoaregularexpression.
— POSITION_REGEXfunction,todeterminethepositionofamatch.
— SUBSTRING_REGEXfunction,toextractasubstringmatchingaregularexpression.
— TRANSLATE_REGEXfunction,toperformreplacementsusingaregularexpression.
© ISO/IEC 2021 – All rights reserved 1

---------------------- Page: 8 ----------------------
ISO/IEC 19075-1:2021(E)
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content consti-
tutes requirements of this document. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC9075-1,Informationtechnology—Databaselanguages—SQL—Part1:Framework
(SQL/Framework)
ISO/IEC9075-2,Informationtechnology—Databaselanguages—SQL—Part2:Foundation
(SQL/Foundation)
ISO/IEC9075-14,Informationtechnology—Databaselanguages—SQL—Part14:XML-Related
Specifications(SQL/XML)
Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.0,W3CRecommendation[online].Fifth
Edition.Cambridge,Massachusetts,USA:W3C,26November2008.Availableat
http://www.w3.org/TR/xml
Bray,Timetal.ExtensibleMarkupLanguage(XML)Version1.1,W3CRecommendation[online].Second
Edition.Cambridge,Massachusetts,USA:W3C,16August2006.Availableat
http://www.w3.org/TR/xml11
Biron,PaulV.;Malhotra,Ashok.XMLSchemaPart2:Datatypes,W3CRecommendation[online].Second
Edition.Cambridge,Massachusetts,USA:W3C,28October2004.Availableat
http://www.w3.org/TR/xmlschema-2/
Malhotra,Ashoketal.XQueryandXPathFunctionsandOperators3,1,W3CRecommendation[online].
Cambridge,Massachusetts,USA:W3C,21March2017.Availableathttp://www.w3.org/TR/-
xpath-functions/
TheUnicodeConsortium.UnicodeRegularExpressions[online].21.MountainView,California,USA:
TheUnicodeConsortium,2020-06-17.Availableathttp://www.unicode.org/reports/tr18/-
tr18-21.html
2 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 9 ----------------------
ISO/IEC 19075-1:2021(E)
3 Termsanddefinitions
Forthepurposesofthisdocument,thetermsanddefinitionsgiveninISO/IEC9075-1apply.
ISOandIECmaintainterminologicaldatabasesforuseinstandardizationatthefollowingaddresses:
— IECElectropedia:availableathttp://www.electropedia.org/
— ISOOnlinebrowsingplatform:availableathttp://www.iso.org/obp
© ISO/IEC 2021 – All rights reserved 3

---------------------- Page: 10 ----------------------
ISO/IEC 19075-1:2021(E)
4 XQueryregularexpressions
4.1 ContextofXQueryregularexpressions
TherequirementsforthematerialdiscussedinthisdocumentshallbeasspecifiedinISO/IEC9075-2,
ISO/IEC9075-14,XML1.0,XML1.1,XMLSchemaPart2:Datatypes,XQueryandXPathFunctionsand
Operators3.1,andUnicodeTechnicalStandard#18.
4.2 IntroductiontoXQueryregularexpressions
ThisdocumentexplainsthemannerinwhichXQueryregularexpressionsareusedbydatabaselanguage
SQLinISO/IEC9075-2andinISO/IEC9075-14.BothISO/IEC9075-2andISO/IEC9075-14specify
requirementsforthematerialdiscussedinthisdocument.
XQueryregularexpressionsyntaxisspecifiedinXQueryandXPathFunctionsandOperators3.1,section
5.6.1,“Regularexpressionsyntax”.ThispaperreferencestheXQueryspecification,withtwosmallmodi-
fications(requiredsincecharacterstringsinanRDBMSarenotnecessarilynormalizedaccordingtoXML
conventions).Thefollowingsubsectionsprovideanoverviewofthissyntax.
TheXQueryregularexpressionsyntaxisitselfamodificationofanotherregularexpressionsyntaxfound
inXMLSchemaPart2:Datatypes.
ThissectionpresentsanoverviewofthecapabilitiesofXQueryregularexpressionsyntax.Intheprocess,
thissectionwillillustratesomeoftheSQLoperators.TheSQLoperatorsthemselvesarepresentedin
Clause5,“Operatorsusingregularexpressions”.
ThefollowingdiscussiondoesnotcovereveryaspectofXQueryregularexpressions;forthis,XQuery
andXPathFunctionsandOperators3.1isthereference(thoughhardlyatutorial;avarietyofpopular
workscontaindetailedtreatmentsofregularexpressions).
4.3 Matchingaspecificcharacter
Perhapsthemostelementarypatternmatchingrequirementistheabilitytomatchasinglecharacteror
string.Formostcharacters,thisisdonebysimplywritingthecharacterintheregularexpression.For
example,supposeanapplicationneedstoknowifastringScontainstheletters“xyz”.Thiscouldbedone
withthefollowingpredicate:
S LIKE_REGEX 'xyz'
NotethattheSQLLIKEpredicatewouldrequireanexactmatchfor“xyz”.However,theconventionwith
regularexpressionsisthatSneedonlycontainasubstringthatis“xyz”.Forexample,allofthefollowing
valuesofSwouldyieldTruefortheimmediatelyprecedingpredicate:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz
4 © ISO/IEC 2021– All rights reserved

---------------------- Page: 11 ----------------------
ISO/IEC 19075-1:2021(E)
4.3 Matching a specific character
Notethatinthelastexample,thereareactuallythreeoccurrencesoftheregularexpression“xyz”within
thetestedvalue.Theusermaywishtoknowthenumberofoccurrencesofamatch.Thiscanbedone
withOCCURRENCES_REGEX.Forexample:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3
Theusermightalsowishtoknowthepositionofaspecificmatch.ThiscanbedoneusingPOSITION_REGEX.
Forexample,tolearnthestartingcharacterpositionofthesecondoccurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9
Itisalsopossibletoaskforthecharacterpositionofthefirstcharacterafterthematch.Forexample:
POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12
IfAFTERisusedandthelastcharacterofthesubjectstringisconsumed,thentheresultisthelengthof
thestringplus1(one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
4.4 Metacharactersandescapesequences
Asmentioned,mostcharacterscanbematchedbysimplywritingthecharacterintheregularexpression.
However,certaincharactersarereservedasmetacharacters.Thecompletelistofmetacharactersis:
. \ ? * + { } ( ) | [ ] ^ $
TheuseofeachofthesemetacharactersisexplainedinsubsequentSubclauses.Tomatchametacharacter,
thenanescapesequence,consistingofabackslash(“\”)followedbythemetacharacterisused.For
example,thisexpressiontestswhetherastringcontainsadollarsign:
S LIKE_REGEX '\$'
Inparticular,theescapesequencerepresentingabackslashistwoconsecutivebackslashes.Thereare
variousotherdefinedescapesequences,matchingeitherasinglecharacter,oranyofagroupofcharacters.
Thesinglecharacterescapesequencesare:
\n newline(U+000A)
\r return(U+000D)
\t tab(U+0009)
\- minussign('-')
Theso-calledcategoryescapesareexemplifiedby“\p{L}”or“\p{Lu}”.Acategoryescapebeginswith
“\p{”followedbyoneuppercaseletter,optionallyalowercaseletter,andthentheclosingbrace.Inthese
example,“\p{L}”matchesanyletter(asdefinedbyUnicode)and“\p{Lu}”matchesanyuppercaseletter.
Someinterestingcategoryescapesare:
\p{L} Anyletter.
\p{Lu} Anyuppercaseletter.
\p{Ll} Anylowercaseletter.
\p{Nd} Anydecimaldigit.
\p{P} Anypunctuationmark.
© ISO/IEC 2021 – All rights reserved 5

---------------------- Page: 12 ----------------------
ISO/IEC 19075-1:2021(E)
4.4 Metacharacters and escape sequences
\p{Z} Any separator (space, line, paragraph, etc.).
The complete list of category escapes is found in XML Schema Part 2: Datatypes, section F.1.1, “Character
class escapes”.
There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A comple-
mentary category escape matches any character that would not be matched by the corresponding category
escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the
complementary character escape is written with an uppercase “P”.
The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive
range of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also
complementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is
not an ASCII character.
Finally, there are the following multi-character escape sequences:
\s AsdefinedbyXMLSchemaPart2:Datatypes,thisescapematchesspace(U+0020),tab
(U+0009),newline(U+000A),orreturn(U+000D).SincecharacterstringsinanRDBMShave
notundergoneXMLlineterminationnormalization,theescapeisbroadenedtoincludeany
characterortwo-charactersequencethatisrecognizedbyUnicodeTechnicalStandard#18
asalineterminator.Subclause4.7,“Lineterminators”,discussesthisissuefurther.
\S Anysinglecharacternotmatchedby\s.
\i Underscore(“_”),colon(“:”)orletter(thisisalotmorethanjusttheLatinletters;seeXML
1.0appendixB,rule[84]).
\I Anysinglecharacternotmatchedby\i.
\c AnysinglecharactermatchedbyNameChar,asdefinedinXML1.0section2.3,rule[4a].
\C Anysinglecharacternotmatchedby\c.
\d Anysingledigit
\D Anysinglecharacternotmatchedby\d.
\w AnysingleUnicodecharacterexceptthoseclassifiedas“punctuation”,“separator”,or“other”.
\W Thecomplementof\w.
4.5 Dot
Dot(period,“.”)isametacharacterthatisusedtomatchanysinglecharacter(thesamebehavioras“_”
inLIKEpredicates),oranysinglecharacterthatisnotalineterminator.Thedefaultistomatchanything
exceptalineterminator.Thealternative,calleddot-allmode,isspecifiedusingaflagthatcontainsa
lowercase“s”.
Forexample
S LIKE_REGEX 'a.b'
matchesthefollowing:
'xa0by'
butnotthefollowing:
6 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 13 ----------------------
ISO/IEC 19075-1:2021(E)
4.5 Dot
'xa
by'
becausethecharacterbetweenthe“a”andthe“b”isalineterminator.However,usingdot-allmodelike
this:
S LIKE_REGEX 'a.b' FLAG 's'
wouldmatchbothexamples.
4.6 Anchors
AsdiscussedinthisClause,regularexpressionslookforamatchanywherewithinastring,without
needingtomatchtheentirestring.Buthowcananapplicationrequireamatchoftheentirestring?For
this,theapplicationusesanchors.Theanchorsarethemetacharacters“^”forthestartofastring(or
line),and“$”fortheendofastring(orline).Forexample:
S LIKE_REGEX '^xyz$'
canonlymatchastringthatisprecisely'xyz'.
Anchorsmaybeusedseparatelytorequirea“beginswith”or“endwith”match.Forexample
S LIKE_REGEX '^xyz'
matchesanystringthatbeginswith“xyz”,and
S LIKE_REGEX 'xyz$'
matchesanystringthatendswith“xyz”.
Insteadofmatchingthebeginorendofthestring,theanchorsmaybeusedtoanchoramatchtothe
beginorendofaline,byperformingthematchinmulti-linemode.Multi-linemodeisspecifiedusinga
flagcontainingalowercase“m”.Forexample:
S LIKE_REGEX '^xyz' FLAG 'm'
performsananchoredsearchinmulti-linemode,matchinganystringcontainingalinethatbeginswith
“xyz”.Theimmediatelyprecedingexamplewouldmatchthefollowingstring:
'line one
xyz
line three'
4.7 Lineterminators
Themetacharacters“.”,“^”,and“$”andthemulti-characterescapesequences“\s”and“\S”aredefined
intermsofa“lineterminator”.Whatcountsasalineterminator?XQueryandXPathFunctionsand
Operators3.1onlyrecognizesalinefeed(U+000A)asalineterminator.Thisdefinitionworkswellfor
XQuery,becauseXMLnormalizesthelineterminatorsonvariousplatformstoalinefeed.
AcloserlookshowsthatXMLhastwodefinitionsoflinehandling,insection2.11,“End-of-linehandling”,
ofXML1.0andXML1.1.SowhichshouldbeusedforSQL?
© ISO/IEC 2021 – All rights reserved 7

---------------------- Page: 14 ----------------------
ISO/IEC 19075-1:2021(E)
4.7 Line terminators
A first stop in answering this is to look at the definition of in ISO/IEC 9075-14, which
requires XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211,
“XML 1.1 support”. So, character string can be normalized according to either XML 1.0 or XML 1.1 as an
implementation-defined choice, or perhaps via a conformance feature.
However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes
its input, which means that such two-character sequences are converted to a single character. This changes
the relative position of every subsequent character, which would cause unexpected results for POSI-
TION_REGEX.
Our solution is to look to Unicode Technical Standard #18, a Unicode standard containing guidelines for
regular expression processors. This provides a referenceable definition of line terminator that does not
require normalizing the subject character string.
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions
So far, this Clause has shown how to match a specific character or any character from certain predefined
sets of characters. Using bracket expressions, applications can specify user-defined group of characters.
(XML Schema and XQuery call these character class expressions, but the term bracket expression is in
common use.)
A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket
expressions have a different list of special characters, namely
^ [ ] \
Forclarity,thesearecalledspecialcharacters,incontrasttothemetacharacterslistedearlier.
4.8.2 Listingcharacters
Ifabracketexpressiondoesnotcontainanyofthespecialcharacters,thenthebracketexpressionmatches
anysinglecharacterthatislistedbetweenthebrackets.Forexample,
S LIKE_REGEX '[abc]'
matchesanyofthefollowing:
'say'
'boy'
'lack'
Allbackslashescapesequencesareavailableforusewithinabracketexpression.Forexample,thefollowing
expressionmatcheseitheracaretorabackslash:
S LIKE_REGEX '[\^\\]'
Thefollowingexpressionmatchesalllettersordigits:
S LIKE_REGEX '[\p{L}\p{Nd}]'
where“\p{L}”istheescapematchinganyletterand“\p{Nd}”istheescapematchinganydigit.
8 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 15 ----------------------
ISO/IEC 19075-1:2021(E)
4.8 Bracket expressions
4.8.3 Matchingarange
Aminussign“-”isusedtospecifyacharacterrange.Forexample:
S LIKE_REGEX '[sa-my]'
matchesthelowercaseletters“s”,allthelettersbetween“a”and“m”inclusive,and“y”.Rangesaredefined
intermsoftheUCScodepointordering.Whentherearemultipleranges,thebracketexpressionmatches
theunionoftheranges.Forexample:
S LIKE_REGEX '[a-me-z]'
matchesalllowercaseletters.
Usingaspecialcharacterinarangeissometimespermitted,buttricky.Ratherthanpresenttherules
here,oneshoulduseabackslashescapeifthestartorendpointofarangeisbeaspecialcharacter.
4.8.4 Negation
Acaret“^”isaspecialcharacterwhenitisthefirstcharacterofabracketexpression,whereitindicates
thatthesetofcharactersisanythingnotlistedbythefollowingbracketexpression.Forexample:
S LIKE_REGEX '[^aj-m]'
isTrueifScontainsanycharacterthatisnot“a”,“j”,“k”,“l”,or“m”.
4.8.5 Characterclasssubtraction
Abracketexpressionmayconcludewithaminussign“-”followedbyanestedbracketexpression.This
iscalledacharacterclasssubtraction,andindicatesthatanycharactermatchedbythenestedbracket
expressionistoberemovedfromthesetofcharactersthatmightbeamatch.Forexample:
S LIKE_REGEX '[a-z-[m-p]]'
matchesanythingbetween“a”and“z”,exceptforthelettersbetween“m”and“p”,inclusive.Thisexample
isequivalentto:
S LIKE_REGEX '[a-lq-z]'
Characterclasssubtractionscanbenestedindefinitely(althoughimplementation-definedlimitations
mayapply).
4.9 Alternation
Achoiceofregularexpressionsisspecifiedthroughtheuseofaverticalbar:“|”.Forexample:
S LIKE_REGEX 'a|b'
isTrueifScontainseitheran“a”ora“b”.
Alternationhaslowerprecedencethanconcatenation.Thus
© ISO/IEC 2021 – All rights reserved 9

---------------------- Page: 16 ----------------------
ISO/IEC 19075-1:2021(E)
4.9 Alternation
S LIKE_REGEX 'ab|xyz'
isTrueifScontainseither“ab”or“xyz”.Thisprecedencecanbeoverriddenthroughtheuseofparentheses,
suchasthisexample:
S LIKE_REGEX 'a(b|xy)z'
TheprecedingexampleisTrueifScontainseither“abz”or“axyz”.
4.10 Quantifiers
Quantifiersaremetacharactersthatspecifyamatchforsomenumberofrepetitionsofaregularexpression.
Therearetwosetsofquantifiers,thegreedyandthereluctant.Thegreedyquantifiersare:
{n} Exactlynrepetitions.
{n,} normorerepetitions.
{n,m} Betweennandmrepetitions,inclusive.
? 0(zero)or1(one)repetition;equivalentto{0,1}.
* 0(zero)ormorerepetitions;equivalentto{0,}.
+ 1(one)ormorerepetitions;equivalentto{1,}.
Thereluctantquantifiersareformedbysuffixingaquestionmarktoagreedyquantifier.Thus,“*?”isthe
reluctantformof“*”,and“??”isthereluctantformof“?”.Thegreedyquantifierstrytomatchasmuch
aspossible,whereasthereluctantquantifierstrytomatchaslittleaspossible(whilestillallowingthe
overallregularexpressiontomatch).Thereisnodifferenceinbehaviorbetweenthegreedyandreluctant
quantifiersforLIKE_REGEX.ThisdifferencefortheotheroperatorsisdiscussedinsubsequentSubclauses.
Example:
S LIKE_REGEX 'a{3}'
isequivalentto
S LIKE_REGEX 'aaa'
andmatchesanystringcontainingatleastthreeconsecutiveinstancesof“a”.NotethatifScontainsmore
thanthreeconsecutiveinstancesof“a”,itstillmatches;totestwhetherScontainsasubstringofthree
consecutiveinstancesof“a”andnomoreisalotharder,sincetheexpressionalsohastorequiresomething
otherthanan“a”atbothendsofthesubstring.
Anotherexample:
S LIKE_REGEX 'ab+c'
isequivalentto
S LIKE_REGEX 'ab{1,}c'
andmatchesanystringthatcontainsasubstringconsistingofan“a”,oneormore“b”s,andthena“c”.
10 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 17 ----------------------
ISO/IEC 19075-1:2021(E)
4.11 Locating a match
4.11 Locatingamatch
LIKE_REGEXonlycareswhetheramatchexists;theotheroperatorscareaboutwhereamatchislocated
inthestring.Considertheregularexpression“a+”andthestring“a1aa2aaa3”.Therearetenpossible
matchesfor“a+”,indicatedbytheunderliningonthefollowinglines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- position 4, length 1
'a1aa2aaa3' -- position 6, length 1
'a1aa2aaa3' -- position 6, length 2
'a1aa2aaa3' -- position 6, length 3
'a1aa2aaa3' -- position 7, length 1
'a1aa2aaa3' -- position 7, length 2
'a1aa2aaa3' -- position 8, length 1
Noticethatsomeofthematchesaresubstringsofothermatches.TherulesofXQueryregularexpressions
aredesignedtoignorecertainmatches,sothattherecognizedmatchesaremutuallydisjoint.Obviously
therearemanywaystodothis,sotherulesprovideprioritiesindeterminingtherecognizedmatches.
Therearethreepriorities:
1) Thetoppriorityistofindamatchasearlyinthestringaspossible.Thisiscommonlycalledtheleftmost
rule.
2) Thesecondpriorityistofindthefirstalternativeofanalternation,ifpossible.Theredoesnotappear
tobeacommonnameforthisrule.
3) Thelastpriorityistofindthelongestpossiblematchforgreedyquantifiers,andtheshortestmatch
forreluctantquantifiers.Inthecaseofgreedyquantifiers,thisiscommonlycalledthelongestrule;
theredoesnotappeartobeacommonnamefortheruleregardingreluctantquantifiers.
[Historicalnote:POSIXonlyhasaleftmostlongestrule.Therewerenoreluctantquantifiers,andthe
priorityformatchingalternationswasthelongestmatchratherthanthefirstalternative.]
TheserulesareillustratedbyexamplesinTable1,“Matchpriorities”.
Table1—Matchpriorities
Subject regular match(es) priority
string expression underlined
baaaaaa ba|a* baaaaaa firstalternative;
baaaaaa secondmatchalwaysstartsafterthefirst
match(eventhoughaaaaaawouldbelonger)
ab a|ab ab firstalternative(ratherthanmatchingab)
abcabbabc ab* abcabbabc leftmost
abcabbabc longest(greedyquantifierconsumestwo'b's)
abcabbabc longest
abcabbabc ab*? abcabbabc shortest(noneedtomatch'b')
abcabbabc shortest
abcabbabc shortest
© ISO/IEC 2021 – All rights reserved 11

---------------------- Page: 18 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture and back-reference
4.12 Capture and back-reference
A parenthesized sub-expression is a portion of a regular expression that is enclosed in parentheses.
Parenthesized sub-expressions are numbered in order of their left parenthesis. For example, in the reg-
ular expression
((a)|(b))
therearethreesub-expressions:
1) ((a)|(b))
2) (a)
3) (b)
Asub-expressioncanbereferencedlaterinaregularexpressionusingaback-reference,takingtheform
ofabackslashfollowedbyoneormoredigits.Thusthethreesub-expressionsintheexamplecanbe
referencedas“\1”,“\2”,and“\3”.Forexample,considertheregularexpression:
\p{Z}(\p{L}*)\p{Z}*\1\p{Z}
Thefirstandonlyparenthesizedsub-expression(“\p{L}*”)matchesanysequenceoflettersthatis
boundedbysomekindofspacecharacter(“\p{Z}”)beforeandafterthesequenceofletters.Theback-
reference(“\1”)matcheswhateversequenceofletterswascapturedbythefirstsub-expression.This
regularexpressionmightbeusedtosearchforoccurrencesofarepeatedword(perhapscausedbyacut-
and-pasteerror).Hereisanexampleofasubjectstring,withunderliningtoindicatethematchforthe
entireregularexpression:
Hello Dolly you’re looking looking swell
Whenaback-referencereferencesaparenthesizedgroupwithaquantifier,thentheback-reference
matchesthelastiterationofthequantifiedsub-expression.Forexample,considertheregularexpression:
'(ab*)*c*\1'
andthesubjectstring:
'abbbabbabcabbbbb'
Thematchesto“(ab*)”areshownbyunderlining:
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
'abbbabbabcabbbbb'
Thesethreeiterationsof“(ab*)”arematchedby“(ab*)*”andthenthe“c”ismatched.Next,amatchfor
“\1”isneeded.Thelastmatchforthefirstparenthesizedsub-expressionis“ab”,sotheoverallmatchis
indicatebyunderlining:
'abbbabbabcabbbbb'
Intheeventthatasub-expressionisunmatched,aback-referencetoitmatchesthezero-lengthstring.
Forexample,considertheregularexpression:
'((a*)|(b*))c??\3'
andthesubjectstring:
'xyzaaccb'
12 © ISO/IEC 2021 – All rights reserved

---------------------- Page: 19 ----------------------
ISO/IEC 19075-1:2021(E)
4.12 Capture an
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.