Information technology — Coding of audio-visual objects — Part 30: Timed text and other visual overlays in ISO base media file format

ISO/IEC 14496-30:2014 defines a storage format based on, and compatible with, the ISO Base Media File Format (ISO/IEC 14496-12 and ISO/IEC 15444-12), which is used by the MP4 file format (ISO/IEC 14496-14) and the Motion JPEG 2000 file format (ISO/IEC 15444-3) among others. ISO/IEC 14496-30:2014 enables timed text and subtitle streams to be used in conjunction with other media streams, such as audio or video; to be used in an MPEG-4 systems environment, if desired; to be formatted for delivery by a streaming server, using hint tracks; and to inherit all the use cases and features of the ISO Base Media File Format on which MP4 and MJ2 are based.

Technologies de l'information — Codage des objets audiovisuels — Partie 30: Texte temporisé et autres recouvrements visuels dans le format ISO de base pour les fichiers médias

General Information

Status: Withdrawn
Publication Date: 10-Mar-2014
Withdrawal Date: 10-Mar-2014
Current Stage: 9599 - Withdrawal of International Standard
Completion Date: 07-Nov-2018

Standard: ISO/IEC 14496-30:2014 - Information technology -- Coding of audio-visual objects (English language, 12 pages)

Standards Content (Sample)

INTERNATIONAL ISO/IEC
STANDARD 14496-30
First edition
2014-03-15
Information technology — Coding of
audio-visual objects —
Part 30:
Timed text and other visual overlays
in ISO base media file format
Technologies de l’information — Codage des objets audiovisuels —
Partie 30: Texte temporisé et autres recouvrements visuels dans le
format ISO de base pour les fichiers médias
Reference number
ISO/IEC 14496-30:2014(E)
© ISO/IEC 2014

ISO/IEC 14496-30:2014(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2014 – All rights reserved

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 General Definitions
5.1 Layout
5.2 Timing
5.3 Language
5.4 Resources shared by multiple samples
6 Timed Text Markup Language (TTML)
6.1 Introduction
6.2 Layout
6.3 Timing
6.4 Track format
6.5 Sample entry format
6.6 Sample format
6.7 Additional Considerations
7 Web Video Text Tracks (WebVTT)
7.1 Introduction
7.2 Layout
7.3 Timing
7.4 Track format
7.5 Sample entry format
7.6 Sample format
7.7 Converting to or from a WebVTT text file (Informative)
7.8 Example (Informative)
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the national bodies
casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 14496-30 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding
of audio-visual objects:
— Part 1: Systems
— Part 2: Visual
— Part 3: Audio
— Part 4: Conformance testing
— Part 5: Reference software
— Part 6: Delivery Multimedia Integration Framework (DMIF)
— Part 7: Optimized reference software for coding of audio-visual objects [Technical Report]
— Part 8: Carriage of ISO/IEC 14496 contents over IP networks
— Part 9: Reference hardware description [Technical Report]
— Part 10: Advanced Video Coding
— Part 11: Scene description and application engine
— Part 12: ISO base media file format
— Part 13: Intellectual Property Management and Protection (IPMP) extensions
— Part 14: MP4 file format
— Part 15: Advanced Video Coding (AVC) file format
— Part 16: Animation Framework eXtension (AFX)
— Part 17: Streaming text format
— Part 18: Font compression and streaming
— Part 19: Synthesized texture stream
— Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
— Part 21: MPEG-J Graphics Framework eXtensions (GFX)
— Part 22: Open Font Format
— Part 23: Symbolic Music Representation
— Part 24: Audio and systems interaction [Technical Report]
— Part 25: 3D Graphics Compression Model
— Part 26: Audio conformance
— Part 27: 3D Graphics conformance
— Part 28: Composite font representation
— Part 29: Web video coding
— Part 30: Timed text and other visual overlays in ISO base media file format

Introduction
This part of ISO/IEC 14496 defines a storage format based on, and compatible with, the ISO Base
Media File Format (ISO/IEC 14496-12 and ISO/IEC 15444-12), which is used by the MP4 file format
(ISO/IEC 14496-14) and the Motion JPEG 2000 file format (ISO/IEC 15444-3) among others. This part of
ISO/IEC 14496 enables timed text and subtitle streams to
— be used in conjunction with other media streams, such as audio or video,
— be used in an MPEG-4 systems environment, if desired,
— be formatted for delivery by a streaming server, using hint tracks, and
— inherit all the use cases and features of the ISO Base Media File Format on which MP4 and MJ2 are
based.

INTERNATIONAL STANDARD ISO/IEC 14496-30:2014(E)
Information technology — Coding of audio-visual objects —
Part 30: Timed text and other visual overlays in ISO base media file format
1 Scope
This part of ISO/IEC 14496 describes the carriage of some forms of timed text and subtitle streams in
files based on the ISO base media file format (ISO/IEC 14496-12). The documentation of these forms
does not preclude other definitions of the carriage of timed text or subtitles; see, for example, 3GPP
Timed Text (3GPP TS 26.245).
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
W3C Recommendation, Timed Text Markup Language 1.0, Second Edition 1)
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12: ISO base media file format 2)
W3C Community Group Report, WebVTT: The Web Video Text Tracks Format 3)
3GPP TS 26.245, Transparent end-to-end Packet switched Streaming Service (PSS); Timed text format
IETF RFC 3986, Uniform Resource Identifier (URI): Generic Syntax
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
timed text document
file-based representation of textual content, possibly XML, used to produce timed text streams and
possibly representing timed text track samples
3.2
timed text stream
stream of content, which when decoded results in textual content, possibly containing internal timing
values, to be presented at a given presentation time and for a certain duration
3.3
subtitle stream
timed text stream potentially also presenting images
1) http://www.w3.org/TR/ttaf1-dfxp/
2) ISO/IEC 14496-12 is technically identical to ISO/IEC 15444-12.
3) http://www.w3.org/2013/07/webvtt.html

3.4
internal timing value
value contained in the payload of a timed text stream sample representing a time, e.g. a start time, an
end time, or a duration, corresponding to a timed behaviour of a part or the whole of the sample
3.5
timed text track
ISOBMFF representation of a timed text stream
3.6
subtitle track
ISOBMFF representation of a subtitle stream
4 Abbreviated terms
For the purposes of this International Standard, the following abbreviated terms apply.
TTML Timed Text Markup Language
WebVTT Web Video Text Tracks
ISOBMFF ISO Base Media File Format
5 General Definitions
5.1 Layout
This subclause defines common layout behavior for processing of timed text or subtitle samples.
Unless otherwise specified by an embedding environment (e.g. an HTML page), the track header box
information (i.e. width and height) shall be used to size the subtitle or timed text track content with
respect to the video; an embedding environment that does specify the layout may ignore it. The width
and height of the subtitle or timed text track should be appropriate for the width and height (as declared
in the track header) of the video track it is intended to overlay, even if the video is not stored in an
ISOBMFF file or is stored as a track in a different ISOBMFF file. Typically, the timed text or subtitle track
has the same width and height as the underlying video, and no translation. For some timed text
documents, the region thus defined corresponds to the visual area filled by their rendering.
Additional region positioning using the translation values tx and ty from the track header matrix, as
defined for 3GPP Timed Text tracks, may be used (see 3GPP TS 26.245, section 5.7, for the definition of
the text track region using tx, ty, and the track width and height).
NOTE The 3GPP region is not the same as a WebVTT region.
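The following is a minimal sketch of how a reader might derive the text track region from the track header, assuming the raw field values have already been parsed. In ISO/IEC 14496-12 the `tkhd` width and height are unsigned 16.16 fixed-point numbers, and the matrix is stored as {a, b, u, c, d, v, x, y, w}, with the translations x (tx) and y (ty) in signed 16.16. The `TrackHeader` class and the function names are hypothetical. The sketch also assumes the 3GPP TS 26.245 convention that tx and ty offset the region from the top-left corner of the video.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TrackHeader:
    """Hypothetical holder for raw 'tkhd' fields, as read from the file."""
    width_fixed: int             # unsigned 16.16 fixed point
    height_fixed: int            # unsigned 16.16 fixed point
    matrix: Tuple[int, ...]      # {a, b, u, c, d, v, x, y, w} in file order


def fixed_16_16(value: int, signed: bool = False) -> float:
    """Convert a raw 32-bit 16.16 fixed-point integer to a float."""
    if signed and value & 0x80000000:
        value -= 1 << 32         # reinterpret as two's complement
    return value / 65536.0


def text_region(tkhd: TrackHeader) -> Tuple[float, float, float, float]:
    """Return (tx, ty, width, height) in pixels.

    Assumes (3GPP TS 26.245 convention) that tx and ty, matrix entries
    x and y, offset the text region from the top-left of the video.
    """
    tx = fixed_16_16(tkhd.matrix[6], signed=True)
    ty = fixed_16_16(tkhd.matrix[7], signed=True)
    return (tx, ty, fixed_16_16(tkhd.width_fixed), fixed_16_16(tkhd.height_fixed))
```

For example, a 1920x200 subtitle track with an identity matrix except ty = 880 yields the region (0.0, 880.0, 1920.0, 200.0), i.e. a band along the bottom of a 1080-line video.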
Unless specified by an embedding environment (e.g. an HTML page), visually composed tracks including
video, subtitle, and timed text shall be stacked or layered using the ‘layer’ value in the track header box.
The layer field provides the same functionality as z-index in TTML.
NOTE Timed text and subtitle tracks are normally stacked in front of the video.
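ISO/IEC 14496-12 specifies that tracks with lower `layer` values are closer to the viewer, so a compositor paints tracks in decreasing layer order. The sketch below illustrates that ordering. The `VisualTrack` type and the `paint_order` name are hypothetical. Keeping equal-layer tracks in their input order is an assumption of this sketch, not a rule of the standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VisualTrack:
    track_id: int
    kind: str    # illustrative label: "video", "subtitle", "text", ...
    layer: int   # signed 16-bit 'layer' field from the track header box


def paint_order(tracks: List[VisualTrack]) -> List[int]:
    """Return track IDs back to front (highest layer painted first).

    Python's sort stays stable with reverse=True, so tracks with equal
    layers keep their input order (a tie-break assumed by this sketch).
    """
    ordered = sorted(tracks, key=lambda t: t.layer, reverse=True)
    return [t.track_id for t in ordered]
```

With a video track at layer 0 and subtitle tracks at layer -1, the video is painted first and the subtitles end up in front of it, matching the NOTE above.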
5.2 Timing
This subclause defines common timing behavior for processing of timed text or subtitle samples.
The general processing of timed text or subtitle tracks is that the text content of a sample is delivered
to the decoder at the sample decode time, at the latest. The sample is rendered at its composition time,
taking edit lists into account if any, for the whole sample duration and without any timing behavior of
its own. However, timed text or subtitle sample data of specific formats may contain internal timing
values. Internal timing values may alter the rendering of the sample during its duration as specified by
the timed text or subtitle format.
NOTE If an internal timing value does not fall within the interval from the sample composition time to
the sample composition time plus the sample duration, the rendering of the sample may differ from the
rendering of the same sample data given a composition time for which the internal timing value lies
within the associated composition interval.
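The check described in the NOTE can be sketched as follows. The sketch assumes that composition time = decode time + composition offset (the `ctts` offset, 0 if absent) and that edit lists are ignored. It also treats the interval as half-open [start, end), which is a choice of this sketch. The function names are hypothetical, and exact `Fraction` arithmetic avoids rounding surprises.

```python
from fractions import Fraction
from typing import Tuple

Interval = Tuple[Fraction, Fraction]


def composition_interval(decode_time: int, composition_offset: int,
                         duration: int, timescale: int) -> Interval:
    """Return the sample's [start, end) presentation interval in seconds.

    All inputs are in media timescale units; edit lists are ignored.
    """
    start = Fraction(decode_time + composition_offset, timescale)
    return start, start + Fraction(duration, timescale)


def internal_time_in_sample(t: Fraction, interval: Interval) -> bool:
    """True if internal timing value t (seconds, on the same timeline as the
    interval) lies within the sample's composition interval."""
    start, end = interval
    return start <= t < end
```

A sample decoded at tick 90000 with a duration of 180000 ticks in a 90 kHz timescale covers [1 s, 3 s). An internal time of 2 s lies inside it, and 3 s does not.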
The subclauses defining the storage of specific formats in the ISOBMFF specify how internal timing
values relate to the track time or to the sample decode or composition time (see 6.3 and 7.3). For instance,
start or end times may be relative to the start of the sample, or the start of the track.
For sections of the track timeline that have no associated subtitles or timed text content, ‘empty’ samples
may be used, as defined for each format, or the duration of the preceding sample may be extended.
Samples with a size of zero are not used.
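The two gap-filling options above can be illustrated with a small packager step. The sketch turns sorted, non-overlapping cues into contiguous (duration, payload) samples. The `build_samples` name and the cue tuple layout are hypothetical. A payload of `None` stands for the format's own empty sample, which still has a non-zero byte size. Whether extending the preceding sample is visually appropriate depends on the format's internal timing.

```python
from typing import List, Optional, Tuple

Cue = Tuple[int, int, str]              # (start, end, text), media timescale units
Sample = Tuple[int, Optional[str]]      # (duration, payload); None = empty sample


def build_samples(cues: List[Cue], track_end: int,
                  fill_with_empty: bool = True) -> List[Sample]:
    """Tile the track timeline [0, track_end) with samples, assuming the
    cues are sorted, non-overlapping and have end > start."""
    samples: List[Sample] = []
    cursor = 0

    def fill_gap(gap: int) -> None:
        # A gap becomes an empty sample, or lengthens the previous sample.
        if fill_with_empty or not samples:
            samples.append((gap, None))
        else:
            duration, payload = samples[-1]
            samples[-1] = (duration + gap, payload)

    for start, end, text in cues:
        if start > cursor:
            fill_gap(start - cursor)
        samples.append((end - start, text))
        cursor = end
    if track_end > cursor:
        fill_gap(track_end - cursor)
    return samples
```

For cues at [0, 1000) and [2000, 3000) in a 4000-tick track, empty-sample filling gives four 1000-tick samples. Extending instead gives two 2000-tick samples.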
The timescale field in the media header box should be set appropriately to achieve the desired timing
accuracy; it is recommended to be set to the value of the timescale field in the corresponding video
track’s media header box.
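The recommendation to reuse the video timescale can be motivated with a quick quantization check. The sketch assumes times are rounded to the nearest tick and uses hypothetical helper names. A cue aligned to the second frame of 30000/1001 video (1001/30000 s) is exact in a 30000 timescale but off by 11/30000 s in a 1000 timescale.

```python
from fractions import Fraction


def to_ticks(seconds: Fraction, timescale: int) -> int:
    """Round a time in seconds to the nearest tick (ties go to even)."""
    return round(seconds * timescale)


def quantization_error(seconds: Fraction, timescale: int) -> Fraction:
    """Stored time minus intended time, in seconds."""
    return Fraction(to_ticks(seconds, timescale), timescale) - seconds
```

Usage: `quantization_error(Fraction(1001, 30000), 30000)` is 0, while the same call with a timescale of 1000 returns `Fraction(-11, 30000)`.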
5.3 Language
Timed text tracks should be marked with a suitable language in the media header box, indicating the
audience for whom the track is appropriate. In the case where it is suitable for a single language, the
media header must match that declared language. The value ‘mul’ may be used for a multi-lingual text.
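The language in the media header box is not stored as text. ISO/IEC 14496-12 packs the ISO 639-2/T code into 15 bits, with each letter stored in 5 bits as (ASCII value - 0x60), after a single pad bit of 0. A minimal encoder and decoder, with hypothetical function names, might look like this:

```python
def pack_language(code: str) -> int:
    """Pack a three-letter lowercase ISO 639-2/T code (e.g. 'eng', 'mul')
    into the 16-bit mdhd field: pad bit 0 followed by three 5-bit values."""
    if len(code) != 3 or not all("a" <= c <= "z" for c in code):
        raise ValueError(f"not a three-letter lowercase code: {code!r}")
    value = 0
    for c in code:
        value = (value << 5) | (ord(c) - 0x60)
    return value


def unpack_language(value: int) -> str:
    """Inverse of pack_language; the pad bit is ignored."""
    return "".join(chr(((value >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))
```

For example, `pack_language("eng")` gives 0x15C7 and `pack_language("und")` gives 0x55C4.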
5.4 Resources shared by multiple samples
Common resources, such as images and fonts, that are referred to by URLs, may be stored as items in a
MetaBox as defined by ISO/IEC 14496-12. These items may be addressed by using the item_name as a
relative URL in the timed text sample, as defined by 8.11.9 of ISO/IEC 14496-12.
NOTE A derived specification, with its applicable brand, may restrict this use of meta boxes for common
items.
Fonts not supplied with the content may already be present on the target system(s), or may be supplied
using any suitable supported mechanism (e.g. font streaming as defined in ISO/IEC 14496-18 [1]).
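A reader resolving such references might first decide whether a URL is relative, and if so, look it up among the MetaBox items by `item_name`. The sketch below is hypothetical. It assumes the items have already been extracted into a dictionary keyed by `item_name`, and that relative URLs may percent-encode characters in the name. It does not implement the full rules of 8.11.9 of ISO/IEC 14496-12.

```python
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


def resolve_resource(url: str, items_by_name: Dict[str, bytes]) -> Optional[bytes]:
    """Return the data of the MetaBox item addressed by a relative URL.

    items_by_name is a hypothetical mapping of item_name -> item data.
    Absolute URLs (with a scheme or host) are left to other mechanisms
    and return None, as do relative URLs that name no stored item.
    """
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return None
    return items_by_name.get(unquote(parts.path))
```

A sample referring to `logo.png` would then receive the bytes of the item named `logo.png`, while `http://example.com/logo.png` is fetched by other means.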
6 Timed Text Markup Language (TTML)
6.1 Introduction
This subclause describes how documents based on TTML, as defined by the W3C, and derived
specifications (for example SMPTE-TT [2]), are carried in files based on the ISO base media file format.
6.2 Layout
Subclause 5.1 defines the general layout behaviour for timed text and subtitle tracks. In particular, this
means for TTML tracks that the track width and height provide the spatial extent of the root container,
as defined in the TTML Recommendation. Any ‘extent’ attribute declared on the ‘tt’ element in the
contained TTML document shall match the track width and height.
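A conformance check for this rule could read the `tts:extent` attribute from the root `tt` element and compare it with the track size. The sketch uses Python's standard `xml.etree.ElementTree` and the TTML namespaces `http://www.w3.org/ns/ttml` and `http://www.w3.org/ns/ttml#styling`. It assumes that the extent, if present, is given in pixels as two values, and it treats a missing extent as consistent. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"


def extent_matches_track(ttml: str, track_width: float, track_height: float) -> bool:
    """Compare tts:extent on the root <tt> element with the track size."""
    root = ET.fromstring(ttml)
    if root.tag != f"{{{TTML_NS}}}tt":
        raise ValueError("root element is not a TTML <tt> element")
    extent = root.get(f"{{{TTS_NS}}}extent")
    if extent is None:
        return True  # nothing declared, so nothing to contradict
    parts = extent.split()
    if len(parts) != 2 or not all(p.endswith("px") for p in parts):
        raise ValueError(f"only 'Wpx Hpx' extents are handled: {extent!r}")
    width, height = (float(p[:-2]) for p in parts)
    return width == track_width and height == track_height
```

A document declaring `tts:extent="1920px 1080px"` matches a 1920x1080 track but not a 1280x720 one.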
6.3 Timing
The top-level internal timing values in timed text samples based on TTML express times on the track
presentation timeline, that is, the track media time as optionally modified by the edit list. For example,
the begin and end attributes of the element, if used, are relative to the start of the track, not
relative to the start of the sample. This is shown in the figure below, using W3C TTML syntax.

In Figure 1, the sample composition times of the samples are 0, 30 minutes, and 1 hour, which
correspond to the times at which the decoder will present the TTML content. As per the TTML
Recommendation, the first sample will not display any content in the first minute or after 2 minutes,
and, again per TTML, will remain as such until the next sample is processed. The s
...
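The timing rule of 6.3 (TTML begin and end values are on the track timeline, not relative to the sample) can be sketched as a mapping from a cue's track-timeline window to an offset window inside one sample. Only the `hh:mm:ss(.fraction)` clock-time form is parsed here. TTML also allows frame-based and offset-time expressions, which this hypothetical helper rejects. The values used below are illustrative, except for the 1 to 2 minute window of the first sample, which comes from the description of Figure 1.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

# Only "hh:mm:ss" with optional decimal fraction; other TTML forms are rejected.
_CLOCK = re.compile(r"^(\d{2,}):(\d{2}):(\d{2})(\.\d+)?$")


def clock_time(value: str) -> Fraction:
    """Parse a TTML clock-time value (subset) into seconds."""
    m = _CLOCK.match(value)
    if not m:
        raise ValueError(f"unsupported time expression: {value!r}")
    hours, minutes, seconds, frac = m.groups()
    total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
    if frac:
        total += Fraction("0" + frac)
    return total


def window_within_sample(begin: str, end: str, sample_ct: Fraction,
                         sample_duration: Fraction) -> Optional[Tuple[Fraction, Fraction]]:
    """Map track-timeline begin/end to offsets from the sample start,
    clipped to the sample; None if nothing is shown in this sample."""
    start = max(clock_time(begin), sample_ct) - sample_ct
    stop = min(clock_time(end), sample_ct + sample_duration) - sample_ct
    return (start, stop) if stop > start else None
```

For the first sample (composition time 0, 30 minutes long), begin="00:01:00" and end="00:02:00" give offsets (60 s, 120 s). The same attribute values in the sample at 30 minutes show nothing, because they refer to track time rather than to that sample's start.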
