Information technology -- Digital publishing -- EPUB 3.0.1

This specification, EPUB Canonical Fragment Identifier (epubcfi), defines a standardized method for referencing arbitrary content within an EPUB® Publication through the use of fragment identifiers. The Web has proven that the concept of hyperlinking is tremendously powerful, but EPUB Publications have been denied much of the benefit that hyperlinking makes possible because of the lack of a standardized scheme to link into them. Although proprietary schemes have been developed and implemented for individual Reading Systems, without a commonly-understood syntax there has been no way to achieve cross-platform interoperability. The functionality that can see significant benefit from breaking down this barrier, however, is varied: from reading location maintenance to annotation attachment to navigation, the ability to point into any Publication opens a whole new dimension not previously available to developers and Authors. This specification attempts to rectify this situation by defining an arbitrary structural reference that can uniquely identify any location, or simple range of locations, in an EPUB Publication: the EPUB CFI. The following considerations have strongly influenced the design and scope of this scheme: The mechanism used to reference content should be interoperable: references to a reading position created by one Reading System should be usable by another. Document references to EPUB content should be enabled in the same way that existing hyperlinks enable references throughout the Web. Each location in an EPUB file should be able to be identified without the need to modify the document. All fragment identifiers that reference the same logical location should be equal when compared. Comparison operations, including tests for sorting and comparison, should be able to be performed without accessing the referenced files. Simple manipulations should be possible without access to the original files (e.g., given a reference deep in a file, it should be possible to generate a reference to the start of the file). Identifier resolution should be reasonably efficient (e.g., processing of the first chapter is not necessary to resolve a fragment identifier that points to the last chapter). References should be able to recover their target locations through parser variations and document revisions. Expression of simple, contiguous ranges should be supported. An extensible mechanism to accommodate future reference recovery heuristics should be provided. In the case of both Standard EPUB CFIs and Intra-Publication EPUB CFI, this specification conforms with the guidelines expressed by W3C in Section 6. Best Practices for Fragid Structures [FragIDBestPractices]. In other words, both standard CFI URIs (e.g., "book.epub#epubcfi(?)", referred media type "application/epub+zip") and intra-publication CFI URIs (e.g., "package.opf#epubcfi(?)", referred media type "application/oebps-package+xml") make use of a fragment identifier syntax that does not overlap with existing schemes in the context of the aforementioned media types' suffix registrations (i.e., "-xml" and "zip").

Technologies de l'information -- Publications numériques -- EPUB 3.0.1

General Information

Status
Published
Publication Date
18-Feb-2020
Current Stage
6060 - International Standard published
Start Date
02-Jan-2020
Completion Date
19-Feb-2020
Ref Project

Buy Standard

Standard
ISO/IEC 23736-6:2020 - Information technology -- Digital publishing -- EPUB 3.0.1
English language
20 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (sample)

INTERNATIONAL ISO/IEC
STANDARD 23736-6
First edition
2020-02
Information technology — Digital
publishing — EPUB 3.0.1 —
Part 6:
Canonical fragment identifiers
Technologies de l'information — Publications numériques — EPUB
3.0.1 —
Partie 6: Identificateurs de fragment canoniques
Reference number
ISO/IEC 23736-6:2020(E)
ISO/IEC 2020
---------------------- Page: 1 ----------------------
ISO/IEC 23736-6:2020(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting

on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address

below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2020 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/IEC 23736-6:2020(E)
Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical

Commission) form the specialized system for worldwide standardization. National bodies that are

members of ISO or IEC participate in the development of International Standards through technical

committees established by the respective organization to deal with particular fields of technical activity.

ISO and IEC technical committees collaborate in fields of mutual interest. Other international

organizations, governmental and non‐governmental, in liaison with ISO and IEC, also take part in the

work.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of document should be noted (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Details of any patent rights identified during the development of the document will be in the

Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC

list of patent declarations received (see http://patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the World

Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT),
see www.iso.org/iso/foreword.html.

This document was prepared by the World Wide Web Consortium (W3C) (as EPUB Canonical Fragment

Identifiers 1.1) and drafted in accordance with its editorial rules. It was adopted, under the JTC 1

PAS procedure, by Joint Technical Committee ISO/IEC JTC 1, Information technology.

A list of all parts in the ISO/IEC 23736 series can be found on the ISO websitte.e

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
© ISO/IEC 2020 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO/IEC 23736-6:2020(E)
EPUB Canonical Fragment Identifiers 1.1
Recommended Specification 5 January 2017
This version
http://www.idpf.org/epub/linking/cfi/epub-cfi-20170105.html
Latest version
http://www.idpf.org/epub/linking/cfi/epub-cfi.html
Previous version
http://www.idpf.org/epub/linking/cfi/epub-cfi-20161130.html
Previous recommendation
http://www.idpf.org/epub/linking/cfi/epub-cfi-20140628.html
Document history
Changes to this document
Issues addressed in this revision
Report an issue
Errata
Editors
Peter Sorotokin, Adobe
Garth Conboy, Google Inc.
Brady Duga, Google Inc.
John Rivlin, Google Inc.
Don Beaver, Apple Inc.
Kevin Ballard, Apple Inc.
Alastair Fettes, Apple Inc.
Daniel Weck, DAISY Consortium
Copyright © 2011 International Digital Publishing Forum™

All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and

dissemination of this work with changes is prohibited except with the written permission of the International

Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.
Status of this Document

This section describes the status of this document at the time of its publication. Other

documents might supersede this document.
© ISO/IEC 2020 – All rights reserved 1
---------------------- Page: 4 ----------------------
ISO/IEC 23736-6:2020(E)
This document was produced by the EPUB Working Group under the EPUB Working
Group Charter approved on 8 July 2015.

This document has been reviewed by the IDPF membership and is endorsed by the IDPF

Board as a Recommended Specification. This document is considered stable and can be

referenced from other specifications and documents.

Feedback on this document can be provided to the EPUB Working Group's mailing list or

issue tracker.
This document is governed by the IDPF Policies and Procedures.
Table of Contents
1. Overview
1.1. Purpose and Scope
1.2. Terminology
1.3. Typographic Conventions
1.4. Conformance Statements
2. EPUB CFI Definition
2.1. Introduction
2.2. Syntax
2.3. Character Escaping
3. EPUB CFI Processing
3.1. Path Resolution
3.1.1. Step Reference to Child Element or Character Data (/)
3.1.2. XML ID Assertion ([)
3.1.3. Step Indirection (!)
3.1.4. Character Offset (:)
3.1.5. Temporal Offset (~)
3.1.6. Spatial Offset (@)
3.1.7. Temporal-Spatial Offset (~ + @)
3.1.8. Text Location Assertion ([)
3.1.9. Side Bias ([ + ;s=)
3.1.10. Examples
3.2. Sorting Rules
3.3. Intra-Publication CFIs
3.4. Simple Ranges
3.5. Intended Target Location Correction
4. Extending EPUB CFIs
References
› 1 Overview
› 1.1 Purpose and Scope
This section is informative

This specification, EPUB Canonical Fragment Identifier (epubcfi), defines a standardized

method for referencing arbitrary content within an EPUB® Publication through the use of

fragment identifiers.

The Web has proven that the concept of hyperlinking is tremendously powerful, but EPUB

Publications have been denied much of the benefit that hyperlinking makes possible

2 © ISO/IEC 2020 – All rights reserved
---------------------- Page: 5 ----------------------
ISO/IEC 23736-6:2020(E)

because of the lack of a standardized scheme to link into them. Although proprietary

schemes have been developed and implemented for individual Reading Systems, without

a commonly-understood syntax there has been no way to achieve cross-platform

interoperability. The functionality that can see significant benefit from breaking down this

barrier, however, is varied: from reading location maintenance to annotation attachment to

navigation, the ability to point into any Publication opens a whole new dimension not

previously available to developers and Authors.

This specification attempts to rectify this situation by defining an arbitrary structural

reference that can uniquely identify any location, or simple range of locations, in an EPUB

Publication: the EPUB CFI. The following considerations have strongly influenced the

design and scope of this scheme:
The mechanism used to reference content should be interoperable: references to a
reading position created by one Reading System should be usable by another.
Document references to EPUB content should be enabled in the same way that
existing hyperlinks enable references throughout the Web.

Each location in an EPUB file should be able to be identified without the need to

modify the document.

All fragment identifiers that reference the same logical location should be equal

when compared.

Comparison operations, including tests for sorting and comparison, should be able

to be performed without accessing the referenced files.

Simple manipulations should be possible without access to the original files (e.g.,

given a reference deep in a file, it should be possible to generate a reference to the

start of the file).

Identifier resolution should be reasonably efficient (e.g., processing of the first

chapter is not necessary to resolve a fragment identifier that points to the last

chapter).

References should be able to recover their target locations through parser variations

and document revisions.
Expression of simple, contiguous ranges should be supported.
An extensible mechanism to accommodate future reference recovery heuristics
should be provided.
In the case of both Standard EPUB CFIs and Intra-Publication EPUB CFI, this

specification conforms with the guidelines expressed by W3C in Section 6. Best Practices

for Fragid Structures [FragIDBestPractices].

In other words, both standard CFI URIs (e.g., "book.epub#epubcfi(…)", referred media type

"application/epub+zip") and intra-publication CFI URIs (e.g., "package.opf#epubcfi(…)",

referred media type "application/oebps-package+xml") make use of a fragment identifier

syntax that does not overlap with existing schemes in the context of the aforementioned

media types' suffix registrations (i.e., "-xml" and "-zip").
› 1.2 Terminology
© ISO/IEC 2020 – All rights reserved 3
---------------------- Page: 6 ----------------------
ISO/IEC 23736-6:2020(E)

Please refer to [EPUB 3.1] for definitions of EPUB-specific terminology used in this

document.
Standard EPUB CFI
A publication-level EPUB CFI links into an EPUB Publication. The path
preceding the EPUB CFI references the location of the EPUB Publication.
Intra-Publication EPUB CFI
An intra-publication EPUB CFI allows one Content Document to reference
another within the same Rendition of an EPUB Publication. The path preceding
the EPUB CFI references the current Rendition's Package Document.
Refer to Intra-Publication CFIs for more information.
› 1.3 Typographic Conventions
The following typographic conventions are used in this specification:
markup
All markup (elements, attributes, properties), code (JavaScript, pseudo-code),

machine-readable values (string, characters, media types) and file names are in red

monospace font.
markup link
Links to markup and code definitions are in underlined red monospace font.
http://www.idpf.org/
URIs are in navy blue monospace font.
hyperlink
Hyperlinks are underlined and blue.
[reference]
Normative and informative references are enclosed in square brackets.
Term
Terms defined in the Terminology are in capital case.
Term Link
Links to term definitions have a dotted blue underline.
Normative element, attribute and property definitions are in blue boxes.
Informative markup examples are in light gray boxes.
4 © ISO/IEC 2020 – All rights reserved
---------------------- Page: 7 ----------------------
ISO/IEC 23736-6:2020(E)
NOTE
Informative notes are in green boxes with a "Note" header.
CAUTION
Informative cautionary notes are in red boxes with a "Caution" header.
› 1.4 Conformance Statements
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT,

RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in

[RFC2119].

All sections and appendixes of this specification are normative except where identified by

the informative status label "This section is informative". The application of informative

status to sections and appendixes applies to all child content and subsections they

contain.
All examples in this specification are informative.
› 2 EPUB CFI Definition
› 2.1 Introduction
This section is informative

A fragment identifier is the part of an IRI [RFC3987] that defines a location within a

resource. Syntactically, it is the segment attached to the end of the resource IRI starting

with a hash (#). For HTML documents, IDs and named anchors are used as fragment

identifiers, while for XML documents the Shorthand XPointer [XPTRSH] notation is used

to refer to a given ID.

A Canonical Fragment Identifier (CFI) is a similar construct to these, but expresses a

location within an EPUB Publication. For example:
book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

The function-like string immediately following the hash (epubcfi(…)) indicates that this

fragment identifier conforms to the scheme defined by this specification, and the value

contained in the parentheses is the syntax used to reference the location within the

specified EPUB Publication (book.epub). Using the processing rules defined in Path

Resolution, any Reading System can parse this syntax, open the corresponding Content

Document in the EPUB Publication and load the specified location for the user.
A complete definition of the EPUB CFI syntax is provided in the next section.
© ISO/IEC 2020 – All rights reserved 5
---------------------- Page: 8 ----------------------
ISO/IEC 23736-6:2020(E)
NOTE
epub has been prepended to the name of the scheme, as a more generic CFI-like
scheme might be defined in the future for all XML+ZIP-based file formats.
› 2.2 Syntax
(EBNF productions ISO/IEC 14977)
All terminal symbols are in the Unicode Block 'Basic Latin' (U+0000 to U+007F).
fragment = "epubcfi(" , ( path , [ range ] ) , ")" ;
path = step , local_path ;
range = "," , local_path , "," , local_path ;
local_path = { step } , ( redirected_path | [ offset ] );
redirected_path = "!" , ( offset | path );
step = "/" , integer , [ "[" , assertion , "]" ] ;
offset = ( ( ":" , integer ) | ( "@" , number , ":" , number ) | ( "~" , number
, [ "@" , number , ":" , number ] ) ) , [ "[" , assertion , "]" ] ;

number = ( digit-non-zero , { digit } , [ "." , { digit } , digit-non-zero ] ) | (

zero , [ "." , { digit } , digit-non-zero ] ) ;
integer = zero | ( digit-non-zero , { digit } ) ;
assertion = ( ( value , [ "," , value ] ) | ( "," , value ) | ( parameter ) ) {
parameter } ;
parameter = ";" , value-no-space , "=" , csv ;
csv = value , { "," , value } ;
value = string-escaped-special-chars ;
value-no-space = value - ( [ value ] , space , [ value ] ) ;
special-chars = circumflex | square-brackets | parentheses | comma |
semicolon | equal ;
escaped- = ( circumflex , circumflex ) | ( circumflex , square-brackets ) | (
special-chars circumflex , parentheses ) | ( circumflex , comma ) | (
circumflex , semicolon ) | ( circumflex , equal ) ;
character- = ( character - special-chars ) | escaped-special-chars ;
escaped-
special
string-escaped- = character-escaped-special , { character-escaped-special } ;
special-chars
digit = zero | digit-non-zero ;
6 © ISO/IEC 2020 – All rights reserved
---------------------- Page: 9 ----------------------
ISO/IEC 23736-6:2020(E)
digit-non-zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
zero = "0" ;
space = " " ;
circumflex = "^" ;
square- = "[" | "]" ;
brackets
parentheses = "(" | ")" ;
comma = "," ;
semicolon = ";" ;
equal = "=" ;
character = ? Unicode Characters ? ;
› Unicode Characters

The definition of allowed Unicode characters is the same as [XML]. This excludes the

surrogate blocks, FFFE, and FFFF:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Document authors are encouraged to avoid "compatibility characters", as defined in

section 2.3 of [Unicode]. The characters defined in the following ranges are also

discouraged. They are either control characters or permanently undefined Unicode
characters:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

A Canonical Fragment Identifier (CFI) consists of an initial sequence epubcfi that

identifies this particular reference method, and a parenthesized path or range. A path is

built up as a sequence of structural steps to reference a location. A range is a path

followed by two local (or relative) paths that identify the start and end of the range.

Steps are denoted by the forward slash character (/), and are used to traverse XML

content. The last step in a CFI path represents a location within a document, either

structural (XML element), textual (character data), or aural-visual (image, audio, or video

media). Such terminating steps MAY be complemented by an OPTIONAL "offset", which

denotes a particular character position, temporal or spatial fragment.

Substrings in brackets are extensible assertions that improve the robustness of traversing

paths and migrating them from one revision of the document to another. These assertions

© ISO/IEC 2020 – All rights reserved 7
---------------------- Page: 10 ----------------------
ISO/IEC 23736-6:2020(E)

preserve additional information about traversed elements of the document, which makes it

possible to recover intended location even after some modifications are made to the

EPUB Publication.

Although the value definition in the syntax above allows any a sequence of characters, a

circumflex (^) MUST be used to escape the following characters to ensure their presence

does not interfere with parsing:
brackets ([,])
circumflex (^)
comma (,)
parentheses ((,))
semicolon (;)
Example of an EPUB CFI that points to a location after the text "2[1]".
epubcfi(/6/14[chap05ref]!/4[body01]/10/2/1:3[2^[1^]])

The following rules apply to the use of numbers and integers within the path or range:

leading zeros are not allowed for numbers or integers (to ensure uniqueness);
trailing zeros are not allowed in the fractional part of a number;
zero MUST be represented as the integer 0;
numbers in the range 1 > N > 0 MUST have a leading 0.;
integral numbers MUST be represented as integers.
› 2.3 Character Escaping

As described in Syntax, the EPUB CFI grammar contains characters that have a special

purpose as delimiters within a fragment identifier expression. These characters MUST be

escaped using the circumflex '^' character when not intended for use as delimiters, so that

they can appear within the EPUB CFI data without being mistaken for delimiters.

Depending on the usage context of such EPUB CFI, further character escaping MAY be

necessary in order to ensure that all potentially-conflicting text tokens are encoded

correctly.
IRI and URI references:
The EPUB CFI (fragment identifier) scheme is designed to be used within URI
and IRI references. The [RFC3986] specification defines a number of
"reserved" characters that have a specific purpose as delimiters, and which
MAY need to be escaped in cases when they would otherwise conflict with the
syntactical structure of the URI/IRI reference. The character used for escaping
is the percent sign '%', and escapable characters get percent-encoded. For
example, the percent character itself becomes "%25" when it gets escaped
8 © ISO/IEC 2020 – All rights reserved
---------------------- Page: 11 ----------------------
ISO/IEC 23736-6:2020(E)
(note the difference with EPUB CFI's circumflex '^', which gets escaped using
a double character '^^').
Unlike IRI references, URI references require unicode characters to be ASCII-
encoded. Although the EPUB specification itself is based on IRIs (i.e. authors
and production tools are expected to use IRIs), some systems or APIs might
only support URIs. As a result, implementors MAY still need too handle the
conversion of IRI to URI references, as defined in [RFC3987]. Disallowed
characters are escaped as follows:
Each disallowed character is converted to UTF-8 [RFC2279] as one or
more bytes. The disallowed characters in URI references include all non-
ASCII characters, plus the excluded characters listed in Section 2.4 of
[RFC2396], except for the number sign '#' and percent sign '%' and the
square bracket characters re-allowed in [RFC2732].
The resulting bytes are escaped with the URI escaping mechanism (that
is, converted to '%HH', where HH is the hexadecimal notation of the byte
value).
The original character is replaced by the resulting character sequence.
(X)HTML context:

IRI references are designed to be used in the various types of documents that EPUB

Publications comprise. XML and (X)HTML represent yet another insertion context
that requires specific character escaping rules. For example, double quote

characters or angle brackets conflict with significant delimiters in the markup syntax,

and MUST therefore be escaped using the &xxx; special sequence (character
reference).

When multiple layers of character escaping are applied to escape or unescape an EPUB

CFI, they MUST be applied in reverse order to revert back to the original form. For example,

[ EPUB-CFI -> IRI -> (X)HTML ] becomes [ (X)HTML -> IRI -> EPUB-CFI ]

The following example shows an EPUB CFI in its "raw" form (only with '^' circumflex

escaping). Note the assertion text at the end of it, with escaped square brackets as well as the

escaped circumflex character itself (the unescaped text is 'Ф-"spa ce"-99%-aa[bb]^'):

epubcfi(/6/4!/4/10/2/1:3[Ф-"spa ce"-99%-aa^[bb^]^^])

When taking part in an IRI, the space character within the assertion might become percent-

escaped ('%20'), and the percent character itself MUST be escaped ('%25'). Note that the

square brackets '[' ']' and semicolumn ':' are "reserved" characters (as per the URI

specification) but because they serve no purpose as delimiters when the IRI processor

extracts the fragment identifier, they do not need to be escaped (i.e. the fragment component

of the IRI can non-ambiguously be parsed by processing all the text after the '#' character).

The circumflex '^' also falls within the category of "unwise" (or "unsafe") characters, but the

EPUB fragment identifier scheme does not require escaping them. Here is the IRI-escaped

EPUB CFI:
#epubcfi(/6/4!/4/10/2/1:3[Ф-"spa%20ce"-99%25-aa^[bb^]^^])

When the IRI appears within an XML attribute, the double quote character (quotation mark) is

significant as a delimiter of the attribute value, so it becomes escaped with '"'. Note that

the Cyrillic "EF" character ('Ф') is directly supported in EPUB XML documents (which use the

© ISO/IEC 2020 – All rights reserved 9
---------------------- Page: 12 ----------------------
ISO/IEC 23736-6:2020(E)

UTF-8 encoding to represent the unicode character repertoire), so it doesn't need to be

encoded:
#epubcfi(/6/4!/4/10/2/1:3[Ф-"spa%20ce"-99%25-aa^[bb^]^^])

If the IRI need to be converted to URI, the non-ASCII Cyrillic "EF" character ('Ф') would get

percent-escaped with 2 bytes ('0xd0 0xa4', in hexadecimal). This would result in the

following URI:
#epubcfi(/6/4!/4/10/2/1:3[%d0%a4-%22spa%20ce%22-99%25-aa^[bb^]^^])

URI encoding / decoding APIs usually "aggressively" percent-encode characters, as

demonstrated in the following example. Note how the circumflexes '^' (%5E), square brackets

'[' (%5B) ']' (%5D) and double-quotes '"' (%22) are also percent-encoded (due to their

"unsafe" / "unwise" nature within URIs) :
#epubcfi(/6/4!/4/10/2/1:3%5B%D0%A4-%22spa%20ce%22-99%25-
aa%5E%5Bbb%5E%5D%5E%5E%5D)
› 3 EPUB CFI Processing
› 3.1 Path Resolution

The process of resolving an EPUB CFI to a location within an EPUB Publication begins

with the root package element of the Package Document. Each step in the CFI is then

processed one by one, left to right, applying the rules defined in the following subsections.

NOTE
The EPUB CFI examples in the following subsections are based on the sample
documents in Examples.
› 3.1.1 Step Reference to Child Element or Character Data (/)

A step with a slash (/) followed by a positive integer refers to either a child element or a

chunk of character data, as per the rules defined herein:

[XML] content other than element and character data is ignored. Note that as per the

[XML] specification, character data inside CDATA sections is included, and
conversely, XML comments are ignored.

[XML] character data that corresponds to insignificant white space (typically used for

markup formatting/indenting) is preserved. Character and entity references are
considered expanded, and character data is obtained from the "included
replacement text" (as per the terminology defined in the [XML] specification).

[XML] character data that is interspersed amongst sibling child elements (i.e., "mixed

content" context) is logically organised into (potentially-empty) chunks of contiguous

character data: the first chunk is located before the first child element (left sibling),

the last chunk is located after the last child element (right sibling), and there is one

10 © ISO/IEC 2020 – All rights reserved
---------------------- Page: 13 ----------------------
ISO/IEC 23736-6:2020(E)

chunk between each pair of child elements. When there are no child elements, there

is one (potentially-empty) chunk of character data. Consecutive (potentially-empty)

chunks of character data are each assigned odd indices (i.e., starting at 1, followed

by 3, etc.).

Child [XML] elements are assigned even indices (i.e., starting at 2, followed by 4,

etc.). Additionally, 0 is a valid index that refers to a non-existing element which

virtually precedes the first potentially-empty chunk of character data within the

parent element's content. Similarly, n+2 is a valid index that refers to a non-existing

element which virtually follows the last potentially-empty chunk of character data,

where n is the even index of the last child element, or 0 if there are no child
elements. CFI processors (e.g., Reading Systems) MUST be capable of consuming

(e.g., parsing and interpreting) CFI expressions containing references to the 0 and

n+2 "virtual" elements, even when the first (or last, respectively) chunk of character

data is empty. Conversely, the *production* of such CFI expressions is governed by

the following conformance requirement: if the first chunk of character data is empty,

a CFI expression SHOULD NOT be constructed using a reference to the "virtual"

element at index 0, instead the "real" first child element (at index 2) SHOULD be

referred to. Similarly, if the last chunk of character data is empty, a CFI expression

SHOULD NOT be constructed using a reference to the "virtual" element at index n+2,

instead the "real" last child element (at index n) SHOULD be referred to.
NOTE

The "virtual" first / last elements mechanism might facilitate interoperability with certain

instances of DOM Ranges, whereby non-existing elements are used to span across

textual content without relying on character offsets at the start/end boundaries.

For a Standard EPUB CFI, the leading step in the CFI MUST start with a slash (/) followed

by an even number that references the spine child element of the Package Document's

root package element. The Package Document traversed by the CFI MUST be the one

specified as the Default Rendition in the EPUB Publication's META-INF/container.xml file

(i.e., the Package Document referenced by the first rootfile element in container.xml).

For an Intra-Publication EPUB CFI, the first step MUST start with a slash followed by a

node number that references a position in Package Document starting from the root

package element.
› 3.1.2 XML ID Assertion ([)

When an EPUB CFI references an element that contains an ID [XML], the corresponding

path step MUST include that ID in square brackets (i.e., after the slash (/) and even number

that identifies the element).

Specification of identifiers adds robustness to the CFI scheme: a Reading System can

determine that the location referenced by the CFI is not the original intended location, and

can use the identifier to compute the set of steps that reach the desired destination in the

content (see Intended Target Location Correction). The cost of this added robustness is

that comparison (and sorting) of CFI strings can be performed only after logically stripping

all bracketed substrings (see Sorting Rules).
› 3.1.3 Step Indirection (!)
© ISO/IEC 2020 – All rights reserved 11
---------------------- Page: 14 ----------------------
ISO/IEC 23736-6:2020(E)

If a step, or a sequence of steps, points to an element that references another document,

the exclamation mark (!) MUST be used whenever that step is immediately followed by an

expression that applies to the referenced document ("indirection"). The following

expression is then resolved from the root element of the referenced XML document, or

from the targeted XML fragment (when specified).
Only the following references are honored:
For itemref in the Package Document spine, the reference is defined by the href

attribute of the corresponding item element in the manifest (i.e., that the itemref's

idref attribute references).

For [HTML] iframes and embed elements, references are defined by the src attribute

For the [HTML] object element, the reference is defined by the data attribute
For [SVG] image and use elements, references are defined by the xlink:href
attribute
NOTE
This scheme does not take into account hyperlinks, only embedding references.

Consequently, it is illegal to follow links from the [HTML] (or [SVG]) a element.

› 3.1.4 Character Offset (:)

A path terminating with a leading colon (:) followed by an integer refers to a character

offset. The given character offset MAY apply to an element only if this element is the

[HTML] img element with an alt attribute containing the text to which the character offset

applies.

For XML character data, the offset is zero-based and always refers to a position between

characters, so 0 means before the first character and a number equal to the total UTF-16

length means after the last ch
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.