ISO/IEC TS 30135-5:2014
Information technology — Digital publishing — EPUB3 — Part 5: Media Overlay
Technologies de l'information — Publications numériques — EPUB3 — Partie 5: Superposition de médias
TECHNICAL SPECIFICATION
ISO/IEC TS 30135-5
First edition
2014-11-15
Reference number
ISO/IEC TS 30135-5:2014(E)
© ISO/IEC 2014
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any
means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission.
Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, the joint
technical committee may decide to publish an ISO/IEC Technical Specification (ISO/IEC TS), which
represents an agreement between the members of the joint technical committee and is accepted for
publication if it is approved by 2/3 of the members of the committee casting a vote.
An ISO/IEC TS is reviewed after three years in order to decide whether it will be confirmed for a further three
years, revised to become an International Standard, or withdrawn. If the ISO/IEC TS is confirmed, it is
reviewed again after a further three years, at which time it must either be transformed into an International
Standard or be withdrawn.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
The ISO/IEC TS 30135 series was prepared by the Korean Agency for Technology and Standards (as the KS X 6070
series) with the International Digital Publishing Forum and was adopted, under a special “fast-track procedure”,
by Joint Technical Committee ISO/IEC JTC 1, Information technology, in parallel with its approval by the
national bodies of ISO and IEC.
ISO/IEC TS 30135 consists of the following parts, under the general title Information technology — Document
description and processing languages — EPUB 3:
— Part 1: Overview
— Part 2: Publications
— Part 3: Content Documents
— Part 4: Open Container Format
— Part 5: Media Overlay
— Part 6: Canonical Fragment Identifier
— Part 7: Fixed-Layout Documents
EPUB Media Overlays 3.0
Recommended Specification 11 October 2011
THIS VERSION
http://www.idpf.org/epub/30/spec/epub30-mediaoverlays-20111011.html
LATEST VERSION
http://www.idpf.org/epub/30/spec/epub30-mediaoverlays.html
PREVIOUS VERSION
http://www.idpf.org/epub/30/spec/epub30-mediaoverlays-20110908.html
A diff of changes from the previous draft is available at this link.
Please refer to the errata for this document, which may include some normative corrections.
Copyright © 2010, 2011 International Digital Publishing Forum™
All rights reserved. This work is protected under Title 17 of the United States Code. Reproduction and
dissemination of this work with changes is prohibited except with the written permission of the International
Digital Publishing Forum (IDPF).
EPUB is a registered trademark of the International Digital Publishing Forum.
Editors
Marisa DeMeglio, DAISY Consortium
Daniel Weck, DAISY Consortium
TABLE OF CONTENTS
1. Overview
1.1. Purpose and Scope
1.2. Relationship to Other Specifications
1.3. Terminology
1.4. Conformance Statements
2. Media Overlay Document Definition
2.1. Introduction
2.2. Content Conformance
2.3. Reading System Conformance
2.4. Media Overlay Document Definition
2.4.1. The smil Element
2.4.2. The head Element
2.4.3. The metadata Element
2.4.4. The body Element
2.4.5. The seq Element
2.4.6. The par Element
2.4.7. The text Element
2.4.8. The audio Element
3. Creating Media Overlays
3.1. Overview
3.2. Relationship to the EPUB Content Document
3.2.1. Structure
3.2.2. Granularity
3.2.3. Embedded Audio and Video
3.2.4. Text-to-Speech
3.3. Semantic Inflection
3.4. Associating Style Information
3.5. Packaging
3.5.1. Including Media Overlays
3.5.2. Media Overlays Metadata Vocabulary
4. Playback Behaviors
4.1. Loading the Media Overlay
4.2. Basic Playback
4.2.1. Timing and Synchronization
4.2.2. Rendering Audio
4.2.3. Rendering EPUB Content Document Elements
4.3. Interacting with the EPUB Content Document
4.3.1. Navigation
4.3.2. Embedded Audio and Video
4.3.3. Text-to-Speech
4.4. Skippability and Escapability
4.4.1. Skippability
4.4.2. Escapability
A. Media Overlays Schema
A.1. Using the Media Overlays Schema
B. Examples of Clock Values
C. Acknowledgements and Contributors
References
1 Overview
1.1 Purpose and Scope
This section is informative
This specification, EPUB Media Overlays 3.0, defines a usage of [SMIL] (Synchronized Multimedia
Integration Language), the Package Document, the EPUB® Style Sheet, and the EPUB Content
Document for representation of audio synchronized with the EPUB Content Document.
This specification is one of a family of related specifications that compose EPUB 3, the third major
revision of an interchange and delivery format for digital publications based on XML and Web Standards. It
is meant to be read and understood in concert with the other specifications that make up EPUB 3:
The EPUB 3 Overview [EPUB3Overview], which provides an informative overview of EPUB and a
roadmap to the rest of the EPUB 3 documents. The Overview should be read first.
EPUB Publications 3.0 [Publications30], which defines publication-level semantics and
overarching conformance requirements for EPUB Publications.
EPUB Content Documents 3.0 [ContentDocs30], which defines profiles of XHTML, SVG and CSS
for use in the context of EPUB Publications.
EPUB Open Container Format (OCF) 3.0 [OCF3], which defines a file format and processing
model for encapsulating a set of related resources into a single-file (ZIP) EPUB Container.
1.2 Relationship to Other Specifications
This section is informative
This specification relies on a subset of [SMIL], from which the EPUB Media Overlays elements and
attributes defined in Media Overlay Document Definition are derived.
1.3 Terminology
EPUB Publication (or Publication)
A logical document entity consisting of a set of interrelated resources and packaged in an
EPUB Container, as defined by this specification and its sibling specifications.
Publication Resource
A resource that contains content or instructions that contribute to the logic and rendering of
the EPUB Publication. In the absence of this resource, the Publication might not render as
intended by the Author. Examples of Publication Resources include the Package Document,
EPUB Content Documents, EPUB Style Sheets, audio, video, images, embedded fonts and
scripts.
With the exception of the Package Document itself, Publication Resources must be listed in
the manifest [Publications30] and must be bundled in the EPUB container file unless
specified otherwise in Publication Resource Locations [Publications30].
Examples of resources that are not Publication Resources include those identified by the
Package Document link [Publications30] element and those identified in outbound hyperlinks
that resolve outside the EPUB Container (e.g., referenced from an [HTML5] a element href
attribute).
EPUB Content Document
A Publication Resource that conforms to one of the EPUB Content Document definitions
(XHTML or SVG).
An EPUB Content Document is a Core Media Type, and may therefore be included in the
EPUB Publication without the provision of fallbacks [Publications30].
XHTML Content Document
An EPUB Content Document conforming to the profile of [HTML5] defined in XHTML Content
Documents [ContentDocs30].
XHTML Content Documents use the XHTML syntax of [HTML5].
SVG Content Document
An EPUB Content Document conforming to the constraints expressed in SVG Content
Documents [ContentDocs30].
EPUB Navigation Document
A specialization of the XHTML Content Document, containing human- and machine-readable
global navigation information, conforming to the constraints expressed in EPUB Navigation
Documents [ContentDocs30].
Core Media Type
A set of Publication Resource types for which no fallback is required. Refer to Publication
Resources [Publications30] for more information.
Package Document
A Publication Resource carrying bibliographical and structural metadata about the EPUB
Publication, as defined in Package Documents [Publications30].
Manifest
A list of all Publication Resources that constitute the EPUB Publication.
Refer to manifest [Publications30] for more information.
Spine
An ordered list of Publication Resources, typically EPUB Content Documents, representing
the default reading order of the Publication.
Refer to spine [Publications30] for more information.
Media Overlay Document
An XML document that associates the XHTML Content Document with pre-recorded audio
narration in order to provide a synchronized playback experience, as defined in this
specification.
Text-to-Speech (TTS)
The rendering of the textual content of an EPUB Publication as artificial human speech using
a synthesized voice.
EPUB Style Sheet (or Style Sheet)
A CSS Style Sheet conforming to the CSS profile defined in EPUB Style Sheets
[ContentDocs30].
Viewport
The region of an EPUB Reading System in which the content of an EPUB Publication is
rendered visually to a User.
CSS Viewport
A Viewport capable of displaying CSS-styled content.
EPUB Container (or Container)
The ZIP-based packaging and distribution format for EPUB Publications defined in [OCF3].
Author
The person(s) or organization responsible for the creation of an EPUB Publication, which is
not necessarily the creator of the content and resources it contains.
User
An individual that consumes an EPUB Publication using an EPUB Reading System.
EPUB Reading System (or Reading System)
A system that processes EPUB Publications for presentation to a User in a manner
conformant with this specification and its sibling specifications.
1.4 Conformance Statements
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD
NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described
in [RFC2119].
All sections of this specification are normative except where identified by the informative status label
"This section is informative". The application of informative status to sections and appendices applies to
all child content and subsections they may contain.
All examples in this specification are informative.
2 Media Overlay Document Definition
2.1 Introduction
This section is informative
Books featuring synchronized audio narration are found in mainstream e-books, educational tools and
e-books formatted for persons with print disabilities. In EPUB 3, these types of books are created by using
Media Overlay Documents to describe the timing for the pre-recorded audio narration and how it relates to
the EPUB Content Document markup. The file format for Media Overlays is defined as a subset of SMIL,
a W3C recommendation for representing synchronized multimedia information in XML.
The Media Overlays feature is designed to be transparent to EPUB Reading Systems that do not support
the feature. The inclusion of Media Overlays in an EPUB Publication has no impact on the ability of
Media Overlay-unaware Reading Systems to render that Publication as a "regular" EPUB Publication.
Although future versions of this specification may incorporate support for video media (e.g., synchronized
text/sign-language books), this version supports only synchronizing audio media with the EPUB Content
Document.
2.2 Content Conformance
A Media Overlay Document must meet all of the following criteria:
Document Properties
› It must meet the conformance constraints for XML documents defined in XML Conformance
[Publications30].
› It must be valid to the Media Overlays schema as defined in Appendix A, Media Overlays
Schema and conform to all content conformance constraints expressed in Media Overlay
Document Definition.
› It must be authored to reflect the structure of the EPUB Content Document with which it is
associated, as stated in Structure.
› Authors should avoid using scripts to control audio and video embedded in the EPUB Content
Document, as stated in Embedded Audio and Video.
› It should use semantic markup where appropriate, as described in Semantic Inflection.
› It must be packaged with the EPUB Publication as shown in Packaging.
File Properties
› The Media Overlay Document filename should use the file extension .smil.
2.3 Reading System Conformance
EPUB Reading System support for Media Overlays is optional. A Reading System that supports Media
Overlays must meet the following criteria:
› It must process the Media Overlay Document in conformance with all Reading System
conformance constraints expressed in Media Overlay Document Definition.
› It must support XHTML Content Documents, and it may support SVG Content Documents.
› It must render Media Overlay elements as described in Basic Playback.
› It must allow User navigation while a Media Overlay is being played, as discussed in Navigation.
› It must adhere to rules regarding referenced audio and video embedded in the EPUB Content
Document, as stated in Embedded Audio and Video.
› Text-to-Speech (TTS)-capable Reading Systems should conform to Reading System
Text-to-Speech Conformance Requirements [Publications30].
› It should offer the skippability and escapability features described in Skippability and
Escapability.
A Reading System that does not support Media Overlays must meet the following criteria:
› It must ignore both the media-overlay attribute on manifest item elements and the manifest item
elements where the media-type attribute value equals application/smil+xml.
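The rule for Reading Systems without Media Overlays support can be sketched as a filter over the manifest. This is only an illustration: the manifest fragment, its ids and file names, and the function name items_ignoring_overlays are hypothetical, and the Package Document namespace is assumed to be http://www.idpf.org/2007/opf.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"  # assumed Package Document namespace

# Hypothetical manifest fragment; ids and file names are illustrative only.
MANIFEST = f"""
<manifest xmlns="{OPF_NS}">
  <item id="ch1" href="chapter1.xhtml"
        media-type="application/xhtml+xml" media-overlay="ch1_audio"/>
  <item id="ch1_audio" href="chapter1_audio.smil"
        media-type="application/smil+xml"/>
  <item id="css" href="style.css" media-type="text/css"/>
</manifest>
"""


def items_ignoring_overlays(manifest_xml):
    """Return manifest items as dicts, as a Reading System that does not
    support Media Overlays might see them."""
    root = ET.fromstring(manifest_xml.strip())
    kept = []
    for item in root.findall(f"{{{OPF_NS}}}item"):
        # Ignore the Media Overlay Documents themselves.
        if item.get("media-type") == "application/smil+xml":
            continue
        # Ignore the media-overlay attribute on the remaining items.
        kept.append({k: v for k, v in item.attrib.items() if k != "media-overlay"})
    return kept


if __name__ == "__main__":
    for entry in items_ignoring_overlays(MANIFEST):
        print(entry)
```

The remaining items can then be rendered exactly as in a Publication that has no Media Overlays.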
2.4 Media Overlay Document Definition
All elements [XML] defined in this section are in the http://www.w3.org/ns/SMIL namespace [XMLNS]
unless otherwise specified.
2.4.1 The smil Element
The smil element must be the root element of all Media Overlay Documents.
Element Name
smil
Usage
The smil element is the root element of the Media Overlay Document.
Attributes
version [required]
Specifies the version number of the [SMIL] specification to which the Media Overlay
adheres.
This attribute must have the value 3.0 to indicate compliance with this version of the
specification.
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
epub:prefix [optional]
Declares additional metadata vocabulary prefixes.
Refer to Semantic Inflection for more information.
Content Model
In this order: head [optional], body [required]
2.4.2 The head Element
The head element is the container for metadata in the Media Overlay Document, and consists of zero or
one child metadata element.
Element Name
head
Usage
The head element is the optional first child of the smil element.
Attributes
None.
Content Model
metadata [0 or 1].
As this specification defines no metadata properties that must occur in the Media Overlay Document, the
head element is optional.
2.4.3 The metadata Element
The metadata element represents metadata for the Media Overlay Document. The metadata element is an
extension point that allows the inclusion of metadata from any metainformation structuring language.
Element Name
metadata
Usage
As a child of the head element.
Attributes
None.
Content Model
[0 or more] elements from any namespace.
This specification defines no metadata properties that must occur in the Media Overlay Document; the
metadata element is provided for custom metadata requirements.
2.4.4 The body Element
The body element is the starting point for the presentation contained in the Media Overlay Document. It
contains the main sequence of par and seq elements.
Element Name
body
Usage
The body element is the required second child of the smil element.
Attributes
epub:type [optional]
An expression of the structural semantics of the corresponding element in the EPUB
Content Document.
The value is a whitespace separated list of property [Publications30] types. Refer to
Semantic Inflection for more information.
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
epub:textref [optional]
The relative IRI reference [RFC3987] of the corresponding EPUB Content Document,
including a fragment identifier that references the specific element as per the
[XPTRSH].
Content Model
In any order: seq [0 or more] or par [0 or more]
At least one par or seq is required.
2.4.5 The seq Element
The seq element contains media objects which are to be rendered sequentially.
Element Name
seq
Usage
One or more seq elements may occur as children of the body element and of the seq
element.
Attributes
epub:type [optional]
An expression of the structural semantics of the corresponding element in the EPUB
Content Document.
The value is a whitespace separated list of property [Publications30] types. Refer to
Semantic Inflection for more information.
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
epub:textref [required]
The relative IRI reference [RFC3987] of the corresponding EPUB Content Document,
including a fragment identifier that references the specific element as per the
[XPTRSH].
Content Model
In any order: seq [0 or more] or par [0 or more].
At least one par or seq is required.
2.4.6 The par Element
The par element contains media objects which are to be rendered in parallel.
Element Name
par
Usage
One or more par elements may occur as children of the body and seq elements.
Attributes
epub:type [optional]
An expression of the structural semantics of the corresponding element in the EPUB
Content Document.
The value is a whitespace separated list of property [Publications30] types. Refer to
Semantic Inflection for more information.
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
Content Model
In any order: text [required] and audio [optional]
The audio element is optional only if its sibling text element refers to audio or video media
(see Embedded Audio and Video), or to textual content intended for rendering via
Text-to-Speech (TTS).
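The content models of the smil, body, seq and par elements can be spot-checked with a short script. The following is a minimal sketch, not a replacement for validation against the Media Overlays schema: the function names and the sample document are hypothetical, and the epub prefix is assumed to map to http://www.idpf.org/2007/ops. It does not check the audio element, because whether audio may be omitted depends on what the text element points to.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_NS = "http://www.idpf.org/2007/ops"  # assumed namespace for the epub prefix


def _tag(name):
    """Build the namespace-qualified tag name ElementTree uses."""
    return f"{{{SMIL_NS}}}{name}"


def _check_container(elem, problems, textref_required):
    """Check a body or seq element, then recurse into nested seq elements."""
    name = elem.tag.split("}")[-1]
    if textref_required and elem.get(f"{{{EPUB_NS}}}textref") is None:
        problems.append(f"{name} lacks epub:textref")
    children = [c for c in elem if c.tag in (_tag("seq"), _tag("par"))]
    if not children:
        problems.append(f"{name} has no par or seq child")
    for child in children:
        if child.tag == _tag("seq"):
            _check_container(child, problems, textref_required=True)
        elif child.find(_tag("text")) is None:
            problems.append("par lacks a text child")


def check_structure(smil_xml):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    root = ET.fromstring(smil_xml.strip())
    if root.tag != _tag("smil"):
        problems.append("root element is not smil")
    if root.get("version") != "3.0":
        problems.append("smil version is not 3.0")
    body = root.find(_tag("body"))
    if body is None:
        problems.append("body is missing")
        return problems
    # epub:textref is optional on body but required on seq.
    _check_container(body, problems, textref_required=False)
    return problems


# Hypothetical sample document; file names and fragment ids are illustrative.
GOOD = f"""
<smil xmlns="{SMIL_NS}" xmlns:epub="{EPUB_NS}" version="3.0">
  <body>
    <seq epub:textref="chapter1.xhtml#sec1">
      <par>
        <text src="chapter1.xhtml#p1"/>
        <audio src="chapter1_audio.mp3" clipBegin="0s" clipEnd="5s"/>
      </par>
    </seq>
  </body>
</smil>
"""

if __name__ == "__main__":
    print(check_structure(GOOD))
```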
2.4.7 The text Element
The text element references an element in the EPUB Content Document. A text element typically refers
to a textual element, but can also refer to other EPUB Content Document media elements (see
Embedded Audio and Video).
Element Name
text
Usage
As a required child of the par element.
Attributes
src [required]
The relative IRI reference [RFC3987] of the corresponding EPUB Content Document,
including a fragment identifier that references the specific element as per the
[XPTRSH].
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
Content Model
Empty.
2.4.8 The audio Element
The audio element represents a clip of audio media.
Element Name
audio
Usage
A required child of the par element unless its sibling text element refers to audio or video
media, in which case it is optional (see Embedded Audio and Video).
Attributes
id [optional]
The ID [XML] of this element, which must be unique within the document scope.
src [required]
The relative or absolute IRI reference [RFC3987] of an audio file. The audio file must be
one of the audio formats listed in the Core Media Types [Publications30] table.
clipBegin [optional]
A clock value that specifies the offset into the physical media corresponding to the
start point of an audio clip.
Clock values are a subset of SMIL clock values, defined in [SMIL]. See Appendix B,
Examples of Clock Values.
clipEnd [optional]
A clock value that specifies the offset into the physical media corresponding to the
end point of an audio clip.
Clock values are a subset of SMIL clock values, defined in [SMIL]. See Appendix B,
Examples of Clock Values.
The chronological offset of the terminating position must be after the starting offset
specified in the clipBegin attribute.
Content Model
Empty.
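The requirement that clipEnd fall after clipBegin can be checked once both clock values are converted to seconds. The sketch below is illustrative only. The accepted forms (full clock such as 5:34:31.396, partial clock such as 09:58, and timecounts with the units h, min, s and ms, defaulting to seconds) are assumed from the SMIL clock value grammar; Appendix B remains the authority on what this specification permits. The function names parse_clock and clip_is_ordered are hypothetical.

```python
import re

# Seconds per unit for timecount values; a bare number is taken as seconds.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}

# Timecount form, e.g. "76.2s", "13min", "2345ms", "12.345".
_TIMECOUNT = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)?$")

# Full or partial clock form, e.g. "5:34:31.396" or "09:58".
_CLOCK = re.compile(r"^(?:(\d+):)?([0-5]\d):([0-5]\d(?:\.\d+)?)$")


def parse_clock(value):
    """Convert a clock value string to seconds (float)."""
    value = value.strip()
    match = _TIMECOUNT.match(value)
    if match:
        return float(match.group(1)) * _UNIT_SECONDS[match.group(2) or "s"]
    match = _CLOCK.match(value)
    if match:
        hours = int(match.group(1) or 0)
        minutes = int(match.group(2))
        seconds = float(match.group(3))
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"not a recognized clock value: {value!r}")


def clip_is_ordered(clip_begin, clip_end):
    """True if the end offset lies strictly after the begin offset."""
    return parse_clock(clip_end) > parse_clock(clip_begin)


if __name__ == "__main__":
    print(parse_clock("09:58"))
    print(clip_is_ordered("0:00:23", "0:00:30"))
```

Converting to a single unit before comparing avoids mistakes when the two attributes use different clock value forms.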
3 Creating Media Overlays
3.1 Overview
This section is informative
A pre-recorded narration of a publication can be represented as a series of audio clips, each
corresponding to part of the EPUB Content Document. A single audio clip, for example, typically
represents a single phrase or paragraph, but implies no order relative to the other clips or to the text of a
document. Media Overlays solve this problem of synchronization by tying the structured audio narration
to its corresponding text (or other media) in the EPUB Content Document using SMIL markup. Media
Overlays are, in fact, a simplified subset of SMIL 3.0 that allows the playback sequence of these clips to
be defined.
The SMIL elements primarily used for structuring Media Overlays are body (used for the main sequence),
seq (sequence) and par (parallel). (Refer to Media Overlay Document Definition for more information on
these and other SMIL elements.)
The par element is the basic building block of an Overlay and corresponds to a phrase in the EPUB
Content Document. The element provides two key pieces of information for synchronizing content: 1) the
audio clip containing the narration for the phrase; and 2) a pointer to the associated EPUB Content
Document fragment. The par element uses two media element children to represent this information: an
audio element and a text element. Since par elements render their children in parallel, the audio clip and
EPUB Content Document fragment are played at the same time, resulting in a synchronized
presentation.
The text element src attribute references the associated phrase, sentence, or other segment of the
EPUB Content Document by its IRI reference. The audio element src attribute similarly references the
location of the corresponding audio clip, and adds the optional clipBegin and clipEnd attributes to
indicate a specific offset within the clip.
The following example shows the Media Overlays markup for a single phrase or sentence.
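As a minimal sketch of such markup, the snippet below embeds a single par element in a Python string and reads back the two pieces of synchronization information it carries. The file names, fragment identifier and clip offsets are hypothetical, as is the helper name describe_par.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

# Hypothetical par element: one sentence and the audio clip that narrates it.
PAR = f"""
<par xmlns="{SMIL_NS}" id="par1">
  <text src="chapter1.xhtml#sentence1"/>
  <audio src="chapter1_audio.mp3" clipBegin="0:00:23" clipEnd="0:00:30"/>
</par>
"""


def describe_par(par_xml):
    """Return the text target and audio clip referenced by a par element."""
    par = ET.fromstring(par_xml.strip())
    text = par.find(f"{{{SMIL_NS}}}text")
    audio = par.find(f"{{{SMIL_NS}}}audio")
    return {
        "text": text.get("src"),
        "audio": audio.get("src"),
        "clip": (audio.get("clipBegin"), audio.get("clipEnd")),
    }


if __name__ == "__main__":
    print(describe_par(PAR))
```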
par elements are placed together sequentially to form a series of phrases or sentences. Not every
element of the EPUB Content Document will have a corresponding par element in the Media Overlay,
only those relevant to the audio narration.
The following example shows a basic Media Overlay Document containing a sequence of phrases. The body element
acts as the main sequence for the whole document.
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0"> ... </smil>
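A document of this shape, with body acting as the main sequence, can be sketched and read in playback order as follows. The three par elements, their file names and their clip offsets are hypothetical, and playback_order is an illustrative helper rather than anything defined by this specification.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

# Hypothetical Media Overlay Document: three phrases narrated from one audio file.
BASIC_OVERLAY = f"""
<smil xmlns="{SMIL_NS}" version="3.0">
  <body>
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="chapter1_audio.mp3" clipBegin="0s" clipEnd="10s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="chapter1_audio.mp3" clipBegin="10s" clipEnd="20s"/>
    </par>
    <par id="par3">
      <text src="chapter1.xhtml#sentence3"/>
      <audio src="chapter1_audio.mp3" clipBegin="20s" clipEnd="30s"/>
    </par>
  </body>
</smil>
"""


def playback_order(smil_xml):
    """List (text target, clipBegin, clipEnd) for each par in document order."""
    root = ET.fromstring(smil_xml.strip())
    body = root.find(f"{{{SMIL_NS}}}body")
    order = []
    for par in body.iter(f"{{{SMIL_NS}}}par"):
        text = par.find(f"{{{SMIL_NS}}}text")
        audio = par.find(f"{{{SMIL_NS}}}audio")
        # audio may be absent when the text target is itself media or TTS content.
        begin = audio.get("clipBegin") if audio is not None else None
        end = audio.get("clipEnd") if audio is not None else None
        order.append((text.get("src"), begin, end))
    return order


if __name__ == "__main__":
    for step in playback_order(BASIC_OVERLAY):
        print(step)
```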
par elements can also be added to seq elements to define more complex structures such as parts and
chapters (see Structure).
3.2 Relationship to the EPUB Content Document
NOTE
In this section, the EPUB Content Document is assumed to be an XHTML Content Document.
While Media Overlays can be used with SVG Content Documents, playback behavior might not be
consistent and therefore interoperability is not guaranteed.
3.2.1 Structure
The ordering of the Media Overlay elements must match the default reading order of the EPUB Content
Document. The par element represents phrases, and the seq element (sequence) represents nested
EPUB Content Document containers such as sections, asides, headers, and footnotes. seq children
must be other seq or par elements. Each seq element must contain an epub:textref attribute which
references the corresponding EPUB Content Document element by IRI reference.
The following example shows a Media Overlay Document with nested seq elements, representing a chapter with both a
section header and a sidebar, which itself has a nested figure.
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops"
      version="3.0">
  <body>
    <seq ... epub:type="chapter">
      ...
      <seq ... epub:type="sidebar">
        ...
      </seq>
    </seq>
  </body>
</smil>
The reason for grouping structures like sidebars, section headers, figures, tables, and footnotes in a seq
element is so that their start and end positions can be identified during playback. Reading Systems can
then offer playback options tailored to the layout of the Publication, such as jumping past a long sidebar,
turning off rendering of page break announcements (see Skippability and Escapability), or customizing
the reading mode to suit structures such as tables.
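One way a Reading System might use these seq boundaries is sketched below: it walks the Media Overlay depth-first and leaves out any seq whose epub:type contains a type the User has chosen to skip. The sample document, its file names and fragment ids, and the function names are hypothetical; which types are actually skippable is governed by Skippability and Escapability, not by this sketch.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_NS = "http://www.idpf.org/2007/ops"

SEQ = f"{{{SMIL_NS}}}seq"
PAR = f"{{{SMIL_NS}}}par"
TEXT = f"{{{SMIL_NS}}}text"
EPUB_TYPE = f"{{{EPUB_NS}}}type"

# Hypothetical chapter containing a heading, a sidebar and a paragraph.
NESTED_OVERLAY = f"""
<smil xmlns="{SMIL_NS}" xmlns:epub="{EPUB_NS}" version="3.0">
  <body>
    <seq epub:textref="chapter1.xhtml#chapter" epub:type="chapter">
      <par>
        <text src="chapter1.xhtml#heading"/>
        <audio src="chapter1_audio.mp3" clipBegin="0s" clipEnd="4s"/>
      </par>
      <seq epub:textref="chapter1.xhtml#sidebar" epub:type="sidebar">
        <par>
          <text src="chapter1.xhtml#sidebar-text"/>
          <audio src="chapter1_audio.mp3" clipBegin="4s" clipEnd="40s"/>
        </par>
      </seq>
      <par>
        <text src="chapter1.xhtml#para1"/>
        <audio src="chapter1_audio.mp3" clipBegin="40s" clipEnd="55s"/>
      </par>
    </seq>
  </body>
</smil>
"""


def _walk(elem, skip):
    """Yield text targets depth-first, omitting seq elements of skipped types."""
    for child in elem:
        if child.tag == SEQ:
            # epub:type is a whitespace-separated list of types.
            types = set((child.get(EPUB_TYPE) or "").split())
            if types & skip:
                continue
            yield from _walk(child, skip)
        elif child.tag == PAR:
            text = child.find(TEXT)
            if text is not None:
                yield text.get("src")


def text_targets(smil_xml, skip_types=()):
    """Return the text targets to play, in order, honoring skip_types."""
    root = ET.fromstring(smil_xml.strip())
    body = root.find(f"{{{SMIL_NS}}}body")
    return list(_walk(body, set(skip_types)))


if __name__ == "__main__":
    print(text_targets(NESTED_OVERLAY))
    print(text_targets(NESTED_OVERLAY, skip_types={"sidebar"}))
```

Because the sidebar is wrapped in its own seq, skipping it is a matter of omitting one subtree; without that grouping the sidebar's par elements would be indistinguishable from the surrounding phrases.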
The following example shows the EPUB
...