ISO/IEC 19075-1:2021
Information technology - Guidance for the use of database language SQL - Part 1: XQuery regular expressions
Technologies de l'information — Recommandations pour l'utilisation du langage de base de données SQL — Partie 1: Expressions régulières de XQuery en SQL
General Information
- Status: Published
- Publication Date: 30-Aug-2021
- Technical Committee: ISO/IEC JTC 1/SC 32 - Data management and interchange
- Drafting Committee: ISO/IEC JTC 1/SC 32/WG 3 - Database language
- Current Stage: 6060 - International Standard published
- Start Date: 31-Aug-2021
- Due Date: 11-Feb-2024
- Completion Date: 31-Aug-2021
Relations
- Effective Date: 13-Aug-2022
- Effective Date: 23-Apr-2020
Overview
ISO/IEC 19075-1:2021, "Information technology - Guidance for the use of database language SQL - Part 1: XQuery regular expressions", describes how SQL uses the XQuery/XPath regular expression syntax (derived from Perl). The standard documents the supported regular-expression constructs, escape sequences, matching rules and modes, and the SQL operators that use this syntax. It is intended as guidance for implementing and using XQuery regular expressions in SQL (ISO/IEC 9075).
Key topics and technical requirements
- XQuery regular expression syntax adopted from XQuery and XPath Functions and Operators 3.1, adapted for SQL string normalization differences.
- Metacharacters and escape sequences: complete list of metacharacters (. \ ? * + { } ( ) | [ ] ^ $) and how to escape them; single-character and multi-character escapes (for newline, tab, etc.).
- Character classes and Unicode support: category escapes such as \p{L}, \p{Lu}, \p{Nd}, block escapes like \p{IsBasicLatin}, and complementary \P{...} forms - consistent with XML Schema Part 2 and Unicode TR#18.
- Dot, anchors, line terminators, and modes: default dot behavior (not matching line terminators), dot-all (flag 's'), multi-line mode (flag 'm'), and the definition of line terminators (noting RDBMS strings may not be XML-normalized).
- Bracket expressions, alternation, quantifiers, precedence, capture/back-references: detailed treatment of pattern grouping, repetition, negation and subtraction in character classes.
- Operators using regular expressions in SQL:
  - LIKE_REGEX - predicate to test existence of a match.
  - OCCURRENCES_REGEX - numeric function returning the count of matches.
  - POSITION_REGEX - position of a match (supports AFTER and OCCURRENCE).
  - SUBSTRING_REGEX - extract a substring matching a pattern.
  - TRANSLATE_REGEX - perform replacements using a regular expression.
Applications and practical value
- Enables robust pattern matching, search-and-extract, counting occurrences, position searching and regex-based replacements directly in SQL queries.
- Useful for data cleaning, validation, text parsing, log analysis, ETL transformations and SQL/XML processing where Unicode-aware regular expressions are required.
- Helps DBMS vendors implement consistent XQuery-compatible regex behavior and aids developers in writing portable SQL that uses advanced regex features.
Who should use this standard
- Database engine implementers and SQL standards teams.
- SQL developers, data engineers, architects and QA testers working with SQL/XML and Unicode text processing.
- Tool vendors integrating regex capabilities into SQL clients, middleware or ETL tools.
Related standards
- ISO/IEC 9075 (SQL - multiple parts: Foundation, XML-related specs)
- XQuery and XPath Functions and Operators 3.1 (W3C)
- XML 1.0 / 1.1, XML Schema Part 2: Datatypes
- Unicode Technical Standard #18
Keywords: ISO/IEC 19075-1:2021, XQuery regular expressions, SQL regex, LIKE_REGEX, POSITION_REGEX, OCCURRENCES_REGEX, SUBSTRING_REGEX, TRANSLATE_REGEX, SQL/XML, Unicode regular expressions.
Frequently Asked Questions
ISO/IEC 19075-1:2021 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Guidance for the use of database language SQL - Part 1: XQuery regular expressions".
ISO/IEC 19075-1:2021 is classified under the following ICS (International Classification for Standards) categories: 35.060 - Languages used in information technology. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 19075-1:2021 has the following relationships with other standards: it is linked to ISO/IEC 19075-10:2024 and to ISO/IEC TR 19075-1:2011, which it cancels and replaces. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 19075-1
First edition, 2021-08
Information technology - Guidance for the use of database language SQL - Part 1: XQuery regular expressions
Technologies de l'information - Recommandations pour l'utilisation du langage de base de données SQL - Partie 1: Expressions régulières de XQuery en SQL
© ISO/IEC 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 XQuery regular expressions
4.1 Context of XQuery regular expressions
4.2 Introduction to XQuery regular expressions
4.3 Matching a specific character
4.4 Metacharacters and escape sequences
4.5 Dot
4.6 Anchors
4.7 Line terminators
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions
4.8.2 Listing characters
4.8.3 Matching a range
4.8.4 Negation
4.8.5 Character class subtraction
4.9 Alternation
4.10 Quantifiers
4.11 Locating a match
4.12 Capture and back-reference
4.13 Precedence
4.14 Modes
5 Operators using regular expressions
5.1 Introduction to operators using regular expressions
5.2 LIKE_REGEX
5.3 OCCURRENCES_REGEX
5.4 POSITION_REGEX
5.5 SUBSTRING_REGEX
5.6 TRANSLATE_REGEX
Bibliography
Index
Tables
1 Match priorities
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of document should be noted. This document was drafted in accordance with the editorial
rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details
of any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent
declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the World
Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 32, Data management and interchange.
This first edition of ISO/IEC 19075-1 cancels and replaces ISO/IEC TR 19075-1:2011.
This document is intended to be used in conjunction with the following editions of the parts of the
ISO/IEC 9075 series:
— ISO/IEC 9075-1, sixth edition or later;
— ISO/IEC 9075-2, sixth edition or later;
— ISO/IEC 9075-3, sixth edition or later;
— ISO/IEC 9075-4, seventh edition or later;
— ISO/IEC 9075-9, fifth edition or later;
— ISO/IEC 9075-10, fifth edition or later;
— ISO/IEC 9075-11, fifth edition or later;
— ISO/IEC 9075-13, fifth edition or later;
— ISO/IEC 9075-14, sixth edition or later;
— ISO/IEC 9075-15, second edition or later;
— ISO/IEC 9075-16, first edition or later.
A list of all parts in the ISO/IEC 19075 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
The organization of this document is as follows:
1) Clause 1, “Scope”, specifies the scope of this document.
2) Clause 2, “Normative references”, identifies additional standards that, through reference in this document, constitute provisions of this document.
3) Clause 3, “Terms and definitions”, defines the terms and definitions used in this document.
4) Clause 4, “XQuery regular expressions”, explains how XQuery regular expressions are formed.
5) Clause 5, “Operators using regular expressions”, explains how the SQL operators use regular expressions.
INTERNATIONAL STANDARD ISO/IEC 19075-1:2021(E)
Information technology - Guidance for the use of database language SQL - Part 1: XQuery regular expressions
1 Scope
This document describes the regular expression support in SQL (ISO/IEC 9075-2) adopted from the regular expression syntax of XQuery and XPath Functions and Operators 3.1, which is derived from Perl.
This document discusses five operators using this regular expression syntax:
- LIKE_REGEX predicate, to determine the existence of a match to a regular expression.
- OCCURRENCES_REGEX numeric function, to determine the number of matches to a regular expression.
- POSITION_REGEX function, to determine the position of a match.
- SUBSTRING_REGEX function, to extract a substring matching a regular expression.
- TRANSLATE_REGEX function, to perform replacements using a regular expression.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 9075-1, Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework)
ISO/IEC 9075-2, Information technology - Database languages - SQL - Part 2: Foundation (SQL/Foundation)
ISO/IEC 9075-14, Information technology - Database languages - SQL - Part 14: XML-Related Specifications (SQL/XML)
Bray, Tim et al. Extensible Markup Language (XML) Version 1.0, W3C Recommendation [online]. Fifth Edition. Cambridge, Massachusetts, USA: W3C, 26 November 2008. Available at http://www.w3.org/TR/xml
Bray, Tim et al. Extensible Markup Language (XML) Version 1.1, W3C Recommendation [online]. Second Edition. Cambridge, Massachusetts, USA: W3C, 16 August 2006. Available at http://www.w3.org/TR/xml11
Biron, Paul V.; Malhotra, Ashok. XML Schema Part 2: Datatypes, W3C Recommendation [online]. Second Edition. Cambridge, Massachusetts, USA: W3C, 28 October 2004. Available at http://www.w3.org/TR/xmlschema-2/
Malhotra, Ashok et al. XQuery and XPath Functions and Operators 3.1, W3C Recommendation [online]. Cambridge, Massachusetts, USA: W3C, 21 March 2017. Available at http://www.w3.org/TR/xpath-functions/
The Unicode Consortium. Unicode Regular Expressions [online]. 21. Mountain View, California, USA: The Unicode Consortium, 2020-06-17. Available at http://www.unicode.org/reports/tr18/tr18-21.html
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 9075-1 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
- IEC Electropedia: available at http://www.electropedia.org/
- ISO Online browsing platform: available at http://www.iso.org/obp
4 XQuery regular expressions
4.1 Context of XQuery regular expressions
The requirements for the material discussed in this document shall be as specified in ISO/IEC 9075-2, ISO/IEC 9075-14, XML 1.0, XML 1.1, XML Schema Part 2: Datatypes, XQuery and XPath Functions and Operators 3.1, and Unicode Technical Standard #18.
4.2 Introduction to XQuery regular expressions
This document explains the manner in which XQuery regular expressions are used by database language SQL in ISO/IEC 9075-2 and in ISO/IEC 9075-14. Both ISO/IEC 9075-2 and ISO/IEC 9075-14 specify requirements for the material discussed in this document.
XQuery regular expression syntax is specified in XQuery and XPath Functions and Operators 3.1, section 5.6.1, “Regular expression syntax”. This paper references the XQuery specification, with two small modifications (required since character strings in an RDBMS are not necessarily normalized according to XML conventions). The following subsections provide an overview of this syntax.
The XQuery regular expression syntax is itself a modification of another regular expression syntax found in XML Schema Part 2: Datatypes.
This section presents an overview of the capabilities of XQuery regular expression syntax. In the process, this section will illustrate some of the SQL operators. The SQL operators themselves are presented in Clause 5, “Operators using regular expressions”.
The following discussion does not cover every aspect of XQuery regular expressions; for this, XQuery and XPath Functions and Operators 3.1 is the reference (though hardly a tutorial; a variety of popular works contain detailed treatments of regular expressions).
4.3 Matching a specific character
Perhaps the most elementary pattern matching requirement is the ability to match a single character or string. For most characters, this is done by simply writing the character in the regular expression. For example, suppose an application needs to know if a string S contains the letters “xyz”. This could be done with the following predicate:
S LIKE_REGEX 'xyz'
Note that the SQL LIKE predicate would require an exact match for “xyz”. However, the convention with regular expressions is that S need only contain a substring that is “xyz”. For example, all of the following values of S would yield True for the immediately preceding predicate:
xyz
abcxyz123
1 xyz 2 xyz 3 xyz
Note that in the last example, there are actually three occurrences of the regular expression “xyz” within the tested value. The user may wish to know the number of occurrences of a match. This can be done with OCCURRENCES_REGEX. For example:
OCCURRENCES_REGEX ('xyz' IN '1 xyz 2 xyz 3 xyz') = 3
The user might also wish to know the position of a specific match. This can be done using POSITION_REGEX. For example, to learn the starting character position of the second occurrence,
POSITION_REGEX ( 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 9
It is also possible to ask for the character position of the first character after the match. For example:
POSITION_REGEX ( AFTER 'xyz' IN '1 xyz 2 xyz 3 xyz' OCCURRENCE 2 ) = 12
If AFTER is used and the last character of the subject string is consumed, then the result is the length of the string plus 1 (one):
POSITION_REGEX ( AFTER 'xyz' IN 'xyz' ) = 4
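The arithmetic in the examples of this Subclause can be checked outside a database. The following is a minimal sketch in Python, not SQL. It assumes Python's re module as a stand-in for the XQuery dialect (the two dialects differ in several respects), and like_regex, occurrences_regex and position_regex are hypothetical helper names chosen for this illustration. Returning None when the requested occurrence does not exist is a choice made for the sketch, not a statement about the SQL operators.

```python
import re


def like_regex(subject, pattern):
    # True if the pattern matches anywhere in the subject (substring semantics).
    return re.search(pattern, subject) is not None


def occurrences_regex(pattern, subject):
    # Count non-overlapping matches, scanning from left to right.
    return sum(1 for _ in re.finditer(pattern, subject))


def position_regex(pattern, subject, occurrence=1, after=False):
    # 1-based position of the start of the requested occurrence, or of the
    # first character after it when after=True. None if there is no such
    # occurrence (an assumption of this sketch).
    for count, match in enumerate(re.finditer(pattern, subject), start=1):
        if count == occurrence:
            return (match.end() if after else match.start()) + 1
    return None


subject = '1 xyz 2 xyz 3 xyz'
print(occurrences_regex('xyz', subject))                        # 3
print(position_regex('xyz', subject, occurrence=2))             # 9
print(position_regex('xyz', subject, occurrence=2, after=True)) # 12
print(position_regex('xyz', 'xyz', after=True))                 # 4
```

Python indexes strings from zero, so the sketch adds one to obtain the 1-based character positions used in the examples.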
4.4 Metacharacters and escape sequences
As mentioned, most characters can be matched by simply writing the character in the regular expression. However, certain characters are reserved as metacharacters. The complete list of metacharacters is:
. \ ? * + { } ( ) | [ ] ^ $
The use of each of these metacharacters is explained in subsequent Subclauses. To match a metacharacter, an escape sequence consisting of a backslash (“\”) followed by the metacharacter is used. For example, this expression tests whether a string contains a dollar sign:
S LIKE_REGEX '\$'
In particular, the escape sequence representing a backslash is two consecutive backslashes. There are various other defined escape sequences, matching either a single character, or any of a group of characters. The single character escape sequences are:
\n  newline (U+000A)
\r  return (U+000D)
\t  tab (U+0009)
\-  minus sign ('-')
The so-called category escapes are exemplified by “\p{L}” or “\p{Lu}”. A category escape begins with “\p{” followed by one uppercase letter, optionally a lowercase letter, and then the closing brace. In these examples, “\p{L}” matches any letter (as defined by Unicode) and “\p{Lu}” matches any uppercase letter. Some interesting category escapes are:
\p{L}   Any letter.
\p{Lu}  Any uppercase letter.
\p{Ll}  Any lowercase letter.
\p{Nd}  Any decimal digit.
\p{P}   Any punctuation mark.
\p{Z}   Any separator (space, line, paragraph, etc.).
The complete list of category escapes is found in XML Schema Part 2: Datatypes, section F.1.1, “Character
class escapes”.
There are also complementary category escapes, which are exemplified by “\P{L}” or “\P{Lu}”. A complementary
category escape matches any character that would not be matched by the corresponding category
escape. The difference is that the (positive) character escape is written with a lowercase “p” whereas the
complementary character escape is written with an uppercase “P”.
The so-called block escapes match any character in a block of Unicode, that is, a predefined consecutive
range of code points. For example, “\p{IsBasicLatin}” matches the ASCII character set. There are also
complementary block escapes, such as “\P{IsBasicLatin}”, which matches any single character that is
not an ASCII character.
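Category and block escapes can be pictured as tests on a single character. The following is a minimal sketch in Python; it assumes the standard unicodedata module as the source of Unicode general categories, because Python's re module does not accept the “\p{...}” notation. The helper names in_category and in_basic_latin are hypothetical and exist only for this illustration.

```python
import unicodedata


def in_category(ch, category):
    # Emulates \p{category} for one character. A one-letter category such as
    # 'L' covers every subcategory ('Lu', 'Ll', ...); a two-letter category
    # such as 'Lu' must match exactly.
    return unicodedata.category(ch).startswith(category)


def in_basic_latin(ch):
    # Emulates \p{IsBasicLatin}: the block of code points U+0000 to U+007F.
    return ord(ch) <= 0x7F


print(in_category('\u00c9', 'Lu'))   # uppercase E with acute accent: True
print(in_category('7', 'Nd'))        # True
print(not in_category('a', 'P'))     # complementary escape \P{P}: True
print(in_basic_latin('\u00e9'))      # lowercase e with acute accent: False
```

The complementary forms (“\P{...}”) correspond to negating the result, as the third line of the usage shows.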
Finally, there are the following multi-character escape sequences:
\s  As defined by XML Schema Part 2: Datatypes, this escape matches space (U+0020), tab (U+0009), newline (U+000A), or return (U+000D). Since character strings in an RDBMS have not undergone XML line termination normalization, the escape is broadened to include any character or two-character sequence that is recognized by Unicode Technical Standard #18 as a line terminator. Subclause 4.7, “Line terminators”, discusses this issue further.
\S  Any single character not matched by \s.
\i  Underscore (“_”), colon (“:”) or letter (this is a lot more than just the Latin letters; see XML 1.0 appendix B, rule [84]).
\I  Any single character not matched by \i.
\c  Any single character matched by NameChar, as defined in XML 1.0 section 2.3, rule [4a].
\C  Any single character not matched by \c.
\d  Any single digit.
\D  Any single character not matched by \d.
\w  Any single Unicode character except those classified as “punctuation”, “separator”, or “other”.
\W  The complement of \w.
4.5 Dot
Dot (period, “.”) is a metacharacter that is used to match any single character (the same behavior as “_” in LIKE predicates), or any single character that is not a line terminator. The default is to match anything except a line terminator. The alternative, called dot-all mode, is specified using a flag that contains a lowercase “s”.
For example
S LIKE_REGEX 'a.b'
matches the following:
'xa0by'
but not the following:
'xa
by'
because the character between the “a” and the “b” is a line terminator. However, using dot-all mode like this:
S LIKE_REGEX 'a.b' FLAG 's'
would match both examples.
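The effect of dot-all mode can be reproduced with a short sketch in Python. It assumes that the re.DOTALL option of Python's re module plays the role of the “s” flag; note that Python's default dot excludes only the line feed (U+000A), which is narrower than the line terminator definition discussed in this document. The helper name dot_example is hypothetical.

```python
import re


def dot_example(subject, dot_all=False):
    # Search for 'a', any single character, then 'b'. With dot_all=True the
    # dot is also allowed to match a line feed.
    options = re.DOTALL if dot_all else 0
    return re.search('a.b', subject, options) is not None


one_line = 'xa0by'
two_lines = 'xa\nby'   # a line feed sits between the 'a' and the 'b'

print(dot_example(one_line))                  # True
print(dot_example(two_lines))                 # False
print(dot_example(two_lines, dot_all=True))   # True
```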
4.6 Anchors
As discussed in this Clause, regular expressions look for a match anywhere within a string, without needing to match the entire string. But how can an application require a match of the entire string? For this, the application uses anchors. The anchors are the metacharacters “^” for the start of a string (or line), and “$” for the end of a string (or line). For example:
S LIKE_REGEX '^xyz$'
can only match a string that is precisely 'xyz'.
Anchors may be used separately to require a “begins with” or “ends with” match. For example
S LIKE_REGEX '^xyz'
matches any string that begins with “xyz”, and
S LIKE_REGEX 'xyz$'
matches any string that ends with “xyz”.
Instead of matching the beginning or end of the string, the anchors may be used to anchor a match to the beginning or end of a line, by performing the match in multi-line mode. Multi-line mode is specified using a flag containing a lowercase “m”. For example:
S LIKE_REGEX '^xyz' FLAG 'm'
performs an anchored search in multi-line mode, matching any string containing a line that begins with “xyz”. The immediately preceding example would match the following string:
'line one
xyz
line three'
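The anchored searches of this Subclause can be tried with the following minimal sketch in Python. It assumes that the re.MULTILINE and re.DOTALL options of Python's re module correspond to the “m” and “s” flag letters; like_regex is a hypothetical helper name, and Python's anchors treat only the line feed as a line boundary.

```python
import re

# Assumed mapping from the flag letters used in this document to Python options.
_FLAGS = {'s': re.DOTALL, 'm': re.MULTILINE}


def like_regex(subject, pattern, flag=''):
    # Combine the options named by each flag letter, then search the subject.
    options = 0
    for letter in flag:
        options |= _FLAGS[letter]
    return re.search(pattern, subject, options) is not None


text = 'line one\nxyz\nline three'

print(like_regex('xyz', '^xyz$'))       # True: the whole string is 'xyz'
print(like_regex('abcxyz', '^xyz'))     # False: does not begin with 'xyz'
print(like_regex(text, '^xyz'))         # False: 'xyz' is not at the string start
print(like_regex(text, '^xyz', 'm'))    # True: a line begins with 'xyz'
```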
4.7 Line terminators
The metacharacters “.”, “^”, and “$” and the multi-character escape sequences “\s” and “\S” are defined in terms of a “line terminator”. What counts as a line terminator? XQuery and XPath Functions and Operators 3.1 only recognizes a line feed (U+000A) as a line terminator. This definition works well for XQuery, because XML normalizes the line terminators on various platforms to a line feed.
A closer look shows that XML has two definitions of line handling, in section 2.11, “End-of-line handling”, of XML 1.0 and XML 1.1. So which should be used for SQL?
A first stop in answering this is to look at the definition of in ISO/IEC 9075-14, which
requires XML 1.0 as a basic level of support, and permits XML 1.1 support in the form of Feature X211,
“XML 1.1 support”. So, a character string can be normalized according to either XML 1.0 or XML 1.1 as an
implementation-defined choice, or perhaps via a conformance feature.
However, some of the line terminators, even in XML 1.0, are two-character sequences. XML normalizes
its input, which means that such two-character sequences are converted to a single character. This changes
the relative position of every subsequent character, which would cause unexpected results for
POSITION_REGEX.
Our solution is to look to Unicode Technical Standard #18, a Unicode standard containing guidelines for
regular expression processors. This provides a referenceable definition of line terminator that does not
require normalizing the subject character string.
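The position shift caused by normalization can be shown with a small sketch in Python. It assumes a subject string that stores a line break as the two-character sequence carriage return plus line feed, and it imitates XML end-of-line normalization with a plain string replacement; position is a hypothetical helper that reports a 1-based match position.

```python
import re


def position(pattern, subject):
    # 1-based position of the first match, or None if there is no match.
    found = re.search(pattern, subject)
    return found.start() + 1 if found else None


raw = 'ab\r\ncd'                        # line break stored as two characters
normalized = raw.replace('\r\n', '\n')  # imitation of XML normalization

print(position('cd', raw))          # 5
print(position('cd', normalized))   # 4
```

The same substring is reported one position earlier after normalization, which is the kind of discrepancy that a definition based on the unmodified subject string avoids.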
4.8 Bracket expressions
4.8.1 Introduction to bracket expressions
So far, this Clause has shown how to match a specific character or any character from certain predefined
sets of characters. Using bracket expressions, applications can specify a user-defined group of characters.
(XML Schema and XQuery call these character class expressions, but the term bracket expression is in
common use.)
A bracket expression is begun by a left bracket “[” and terminated by a right bracket “]”. Bracket
expressions have a different list of special characters, namely
^ [ ] \
For clarity, these are called special characters, in contrast to the metacharacters listed earlier.
4.8.2 Listing characters
If a bracket expression does not contain any of the special characters, then the bracket expression matches any single character that is listed between the brackets. For example,
S LIKE_REGEX '[abc]'
matches any of the following:
'say'
'boy'
'lack'
All backslash escape sequences are available for use within a bracket expression. For example, the following expression matches either a caret or a backslash:
S LIKE_REGEX '[\^\\]'
The following expression matches all letters or digits:
S LIKE_REGEX '[\p{L}\p{Nd}]'
where “\p{L}” is the escape matching any letter and “\p{Nd}” is the escape matching any digit.
4.8.3 Matching a range
A minus sign “-” is used to specify a character range. For example:
S LIKE_REGEX '[sa-my]'
matches the lowercase letter “s”, all the letters between “a” and “m” inclusive, and “y”. Ranges are defined in terms of the UCS code point ordering. When there are multiple ranges, the bracket expression matches the union of the ranges. For example:
S LIKE_REGEX '[a-me-z]'
matches all lowercase letters.
Using a special character in a range is sometimes permitted, but tricky. Rather than present the rules here, one should use a backslash escape if the start or end point of a range is a special character.
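The set of characters accepted by a bracket expression with ranges can be listed explicitly. The following minimal sketch in Python assumes Python's re module, whose handling of simple ranges agrees with the description above for these examples; chars_matched is a hypothetical helper that tests each letter of a given alphabet against the bracket expression.

```python
import re
import string


def chars_matched(bracket_expr, alphabet=string.ascii_lowercase):
    # Return, in alphabet order, the characters that the bracket expression
    # accepts when it is applied to one character at a time.
    pattern = re.compile(bracket_expr)
    return ''.join(ch for ch in alphabet if pattern.fullmatch(ch))


print(chars_matched('[sa-my]'))    # abcdefghijklmsy
print(chars_matched('[a-me-z]') == string.ascii_lowercase)   # True
```

The second line shows the union of two overlapping ranges covering the 26 letters from “a” to “z”.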
4.8.4 Negation
A caret “^” is a special character when it is the first character of a bracket expression, where it indicates that the set of characters is anything not listed by the following bracket expression. For example:
S LIKE_REGEX '[^aj-m]'
is True if S contains any character that is not “a”, “j”, “k”, “l”, or “m”.
4.8.5 Character class subtraction
A bracket expression may conclude with a minus sign “-” followed by a nested bracket expression. This is called a character class subtraction, and indicates that any character matched by the nested bracket expression is to be removed from the set of characters that might be a match. For example:
S LIKE_REGEX '[a-z-[m-p]]'
matches anything between “a” and “z”, except for the letters between “m” and “p”, inclusive. This example is equivalent to:
S LIKE_REGEX '[a-lq-z]'
Character class subtractions can be nested indefinitely (although implementation-defined limitations may apply).
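The equivalence stated for character class subtraction can be verified by computing the character sets. The sketch below is in Python and is only an emulation: Python's re module does not implement the “[a-z-[m-p]]” notation, so the subtraction is carried out with an ordinary set difference. The helper name char_set is hypothetical.

```python
import re
import string


def char_set(bracket_expr, alphabet=string.ascii_lowercase):
    # The set of characters from the alphabet accepted by the expression.
    pattern = re.compile(bracket_expr)
    return {ch for ch in alphabet if pattern.fullmatch(ch)}


# Emulate [a-z-[m-p]] as the set [a-z] minus the set [m-p].
subtracted = char_set('[a-z]') - char_set('[m-p]')
print(subtracted == char_set('[a-lq-z]'))    # True

# In Python, a negative lookahead is one way to express the same restriction
# inside a single pattern; this is a Python idiom, not XQuery syntax.
print(char_set('(?![m-p])[a-z]') == subtracted)   # True
```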
4.9 Alternation
A choice of regular expressions is specified through the use of a vertical bar: “|”. For example:
S LIKE_REGEX 'a|b'
is True if S contains either an “a” or a “b”.
Alternation has lower precedence than concatenation. Thus
S LIKE_REGEX 'ab|xyz'
is True if S contains either “ab” or “xyz”. This precedence can be overridden through the use of parentheses, such as this example:
S LIKE_REGEX 'a(b|xy)z'
The preceding example is True if S contains either “abz” or “axyz”.
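The precedence of alternation relative to concatenation can be observed with a minimal sketch in Python. It assumes Python's re module, which follows the same precedence rule for these patterns; like_regex is a hypothetical helper name.

```python
import re


def like_regex(subject, pattern):
    # True if the pattern matches anywhere in the subject.
    return re.search(pattern, subject) is not None


# 'ab|xyz' is a choice between the whole strings 'ab' and 'xyz'.
print(like_regex('cab', 'ab|xyz'))     # True
print(like_regex('ayz', 'ab|xyz'))     # False

# Parentheses limit the choice to the part between 'a' and 'z'.
print(like_regex('abz', 'a(b|xy)z'))   # True
print(like_regex('axyz', 'a(b|xy)z'))  # True
print(like_regex('axz', 'a(b|xy)z'))   # False
```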
4.10 Quantifiers
Quantifiers are metacharacters that specify a match for some number of repetitions of a regular expression. There are two sets of quantifiers, the greedy and the reluctant. The greedy quantifiers are:
{n}    Exactly n repetitions.
{n,}   n or more repetitions.
{n,m}  Between n and m repetitions, inclusive.
?      0 (zero) or 1 (one) repetition; equivalent to {0,1}.
*      0 (zero) or more repetitions; equivalent to {0,}.
+      1 (one) or more repetitions; equivalent to {1,}.
The reluctant quantifiers are formed by suffixing a question mark to a greedy quantifier. Thus, “*?” is the reluctant form of “*”, and “??” is the reluctant form of “?”. The greedy quantifiers try to match as much as possible, whereas the reluctant quantifiers try to match as little as possible (while still allowing the overall regular expression to match). There is no difference in behavior between the greedy and reluctant quantifiers for LIKE_REGEX. This difference for the other operators is discussed in subsequent Subclauses.
Example:
S LIKE_REGEX 'a{3}'
is equivalent to
S LIKE_REGEX 'aaa'
and matches any string containing at least three consecutive instances of “a”. Note that if S contains more than three consecutive instances of “a”, it still matches; to test whether S contains a substring of three consecutive instances of “a” and no more is a lot harder, since the expression also has to require something other than an “a” at both ends of the substring.
Another example:
S LIKE_REGEX 'ab+c'
is equivalent to
S LIKE_REGEX 'ab{1,}c'
and matches any string that contains a substring consisting of an “a”, one or more “b”s, and then a “c”.
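The difference between greedy and reluctant quantifiers becomes visible once the matched text is returned rather than a truth value. The following minimal sketch in Python assumes Python's re module as a stand-in for the XQuery dialect. The helper names first_match and has_run_of_exactly_three_a are hypothetical, and the pattern used for “exactly three” is only one possible formulation of the harder test mentioned above: it requires the start of the string or a character other than “a” before the run, and a character other than “a” or the end of the string after it.

```python
import re


def first_match(pattern, subject):
    # Text of the first match found scanning left to right, or None.
    found = re.search(pattern, subject)
    return found.group() if found else None


def has_run_of_exactly_three_a(subject):
    # One possible pattern: three 'a's bounded on each side by a non-'a'
    # character or by the start or end of the string.
    return re.search('(^|[^a])a{3}([^a]|$)', subject) is not None


print(first_match('a+', 'baaaab'))      # aaaa (greedy)
print(first_match('a+?', 'baaaab'))     # a    (reluctant)
print(first_match('ab+c', 'xabbbcy'))   # abbbc
print(has_run_of_exactly_three_a('baaab'))   # True
print(has_run_of_exactly_three_a('aaaa'))    # False
```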
4.11 Locating a match
LIKE_REGEX only cares whether a match exists; the other operators care about where a match is located in the string. Consider the regular expression “a+” and the string “a1aa2aaa3”. There are ten possible matches for “a+”, indicated by the underlining on the following lines:
'a1aa2aaa3' -- position 1, length 1
'a1aa2aaa3' -- position 3, length 1
'a1aa2aaa3' -- position 3, length 2
'a1aa2aaa3' -- pos
...
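The count of ten possible matches can be checked by brute force. The sketch below is in Python and is not taken from the standard: it assumes Python's re module, and the hypothetical helper candidate_matches tests every non-empty substring of the subject against the pattern, reporting each hit as a 1-based position and a length.

```python
import re


def candidate_matches(pattern, subject):
    # Try every non-empty substring; keep those that the pattern matches in
    # full. Each result is a pair (1-based start position, length).
    compiled = re.compile(pattern)
    found = []
    for start in range(len(subject)):
        for end in range(start + 1, len(subject) + 1):
            if compiled.fullmatch(subject, start, end):
                found.append((start + 1, end - start))
    return found


matches = candidate_matches('a+', 'a1aa2aaa3')
print(len(matches))    # 10
print(matches[:3])     # [(1, 1), (3, 1), (3, 2)]
```

The first three pairs agree with the positions and lengths shown on the first three lines of the list above.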









