ISO/IEC 23000-2:2006
Information technology — Multimedia application format (MPEG-A) — Part 2: MPEG music player application format
ISO/IEC 23000-2:2006 specifies a method to import MPEG-1 Layer III coded media files (i.e. MP3) that may contain ID3V1 meta-data into the framework of MPEG-4 as an MPEG-4 file. Optionally, the media file can have an associated JPEG image (e.g. album cover art). The major components of the specification are: a method to convert the Layer III bitstream into a series of MPEG-4 Access Units suitable for storage in an MPEG-4 file; a method to represent the ID3V1 meta-data using a concise set of MPEG-7 constructs, and place that MPEG-7 description into the MPEG-4 file; a method to encapsulate a JPEG compressed image into the MPEG-4 file. Additional components permit using the MPEG-21 file format, which supports a more precise association of coded audio, meta-data and image information, and also permits the creation of a music album via encapsulation of a set of MPEG-4 files (i.e. songs) within a single MPEG-21 file.
1 Scope
ISO/IEC 23000-2 specifies a method to import MPEG-1 Layer III coded media files (i.e. MP3) that may contain
ID3V1 meta-data into the framework of MPEG-4 as an MPEG-4 file. Optionally, the media file can have an
associated JPEG image (e.g. album cover art). The major components of this part of ISO/IEC 23000 are:
⎯ a method to convert the Layer III bitstream into a series of MPEG-4 Access Units suitable for storage in
an MPEG-4 file;
⎯ a method to represent the ID3V1 meta-data using a concise set of MPEG-7 constructs, and to
place that MPEG-7 description into the MPEG-4 file;
⎯ a method to encapsulate a JPEG compressed image into the MPEG-4 file.
Additional components permit using the MPEG-21 file format, which supports a more precise association of
coded audio, meta-data and image information, and also permits the creation of a “music album” via
encapsulation of a set of MPEG-4 files (i.e. songs) within a single MPEG-21 file.
2 Overview of MPEG Standards
2.1 MPEG-1 Layer III
ISO/IEC 11172-3:1993 specifies MPEG-1 Audio [1]. Of the layers in that specification, MPEG-1 Layer III (or MP3) is one
of the most widely deployed MPEG audio standards. Its wide appeal is due to both its good compression
performance and its simplicity of implementation. The vast majority of compressed music archives use MP3.
One aspect of the simplicity of Layer III is that it specifies a self-synchronizing transport, making it amenable
to both storage in a computer file and transmission over a channel without byte framing. In the context of
transmission channels, Layer III can operate over a constant-rate isochronous link, and has constant-rate
headers (as do Layers I and II). However, Layer III is an instantaneously-variable-rate coder, which adapts to
the constant-rate channel by using a “bit buffer” and “back pointers.” Each of the headers signals the start of
another block of audio signal; however, due to the Layer III syntax, the data associated with that next block of
audio signal may be in a prior segment of the bitstream, pointed to by the back pointer (see Figure 1,
specifically the curved arrow pointing to main_data_begin). Note that this is in contrast to the MPEG-4
view of data stream segmentation, in which one Access Unit contains all information necessary to decode one
segment of audio.
[Figure: four frames (frame 1 to frame 4), each starting with a header and a main_data_begin pointer (1 to 4) that refers to the corresponding block of main info (main info 1 to main info 4)]
Figure 1 — Layer III bitstream organization
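The self-synchronizing property described above can be illustrated with a minimal sketch that scans a byte buffer for candidate Layer III headers. It assumes the MPEG-1 header layout of a 12-bit syncword of all ones, followed by an ID bit of 1 and layer bits of 01 for Layer III. The function name find_layer3_sync_candidates is hypothetical, and a real decoder would validate further header fields, because the same bit pattern can occur by chance inside coded audio data.

```python
from typing import List


def find_layer3_sync_candidates(data: bytes) -> List[int]:
    """Return byte offsets that look like the start of an MPEG-1 Layer III header.

    Hypothetical helper for illustration only: it checks the syncword,
    ID and layer bits, and nothing else.
    """
    offsets = []
    for i in range(len(data) - 1):
        # First byte: upper 8 bits of the 12-bit syncword (all ones).
        # Second byte: remaining 4 sync bits, ID bit = 1 (MPEG-1),
        # layer bits = 01 (Layer III); the lowest bit is protection_bit
        # and may be 0 or 1, so it is masked out.
        if data[i] == 0xFF and (data[i + 1] & 0xFE) == 0xFA:
            offsets.append(i)
    return offsets
```

Each returned offset is only a candidate. Checking that another header follows at the expected frame distance is the usual way to reject false matches.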
2.2 MPEG-4 “MPEG-1/2 Audio in MPEG-4”
ISO/IEC 14496-3:2005 [3] specifies a method for segmenting and formatting Layer III bitstreams into MPEG-4
Access Units (MPEG-1/2 Audio in MPEG-4), and therefore is often referred to as “MP3onMP4”. This consists
primarily of re-arranging the compressed data associated with a given header such that it follows the header.
This typically results in new segments that are no longer of constant length, but that is perfectly in accordance
with the definition of MPEG-4 Access Units. See example in Figure 2.
Figure 2 — Converting an MPEG-1/2 Layer 3 bitstream into mp3_channel_elements
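The re-arrangement described in this subclause can be sketched with a simplified model. The sketch below is an illustration under stated assumptions, not the mp3_channel_element syntax of ISO/IEC 14496-3: the Frame class, its field names and to_access_units are hypothetical, the header field stands for the header together with its side information, and main_data_len is assumed to have been derived from the side information already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Simplified model of one Layer III frame (hypothetical field names)."""
    header: bytes          # header plus side information
    main_data_begin: int   # back pointer in bytes; 0 means data starts in this frame
    main_data_len: int     # number of main-data bytes that belong to this frame
    payload: bytes         # main-data bytes physically stored in this frame's slot


def to_access_units(frames: List[Frame]) -> List[bytes]:
    """Gather each frame's main data so that it directly follows its header."""
    # Concatenating all payload slots gives one continuous main-data stream.
    stream = b"".join(f.payload for f in frames)
    units = []
    offset = 0  # position of the current frame's payload slot within `stream`
    for f in frames:
        start = offset - f.main_data_begin   # follow the back pointer
        end = start + f.main_data_len
        if start < 0 or end > len(stream):
            raise ValueError("main data lies outside the available stream")
        units.append(f.header + stream[start:end])
        offset += len(f.payload)
    return units
```

In this model the input slots need not match the amount of data each frame owns, while every output unit carries exactly the data of one frame. This is why the resulting segments are generally of varying length.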
2.3 ISO Base Media File Format
The ISO Base Media File Format is designed to contain timed media information for a presentation in a
flexible, extensible format that facilitates interchange, management, editing, and presentation of the media.
The ISO Base Media File Format is a base format for media file formats. In particular, the MPEG-4 file format
derives from this base file format.
[Figure: an ISO file consisting of movie data, with trak (video), trak (audio) and other boxes, and media data holding interleaved, time-ordered video and audio frames]
Figure 3 — Example of a simple ISO file used for interchange, containing two streams
The file structure is object-oriented, as shown in Figure 3, which means that a file can be decomposed into
constituent objects very simply, and the structure of the objects inferred directly from their type. The file format
is designed to be independent of any particular network protocol while enabling efficient support for them in general.
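The decomposition into objects can be illustrated with a minimal sketch of a box walker. It assumes the box header layout of the ISO Base Media File Format: a 32-bit big-endian size, a four-character type, a 64-bit size following the type when the 32-bit size equals 1, and a size of 0 meaning that the box runs to the end of its container. The names iter_boxes and make_box are hypothetical helpers, not part of any standard API.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if pos + 16 > end:
                raise ValueError("truncated 64-bit size field")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (helper for experiments)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload
```

Because a container box simply holds further boxes in its payload, the same function can be applied again to the payload range of a container to list its children.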
2.4 The ISO Base Media and MPEG-4 File Formats
ISO/IEC 14496-12:2005 [4] and ISO/IEC 14496-14:2003 [5] together specify the MPEG-4 File Format. This
format supports storage of compressed audio data (e.g. MP3onMP4) in tracks. It also provides support for metadata
in the form of ‘meta’ boxes at the File, Movie and Track level, which allows support for static (un-timed)
meta-data. Figure 4 schematically illustrates the location of these un-timed MPEG-7 Metadata boxes. Subclause 3.3
provides details as to when the Metadata boxes at each level are used.
[Figure: file layout showing the meta and mdat boxes]
Figure 4 — Support of Static un-timed Metadata in ISO/MP4 Files
2.5 MPEG-7 Multimedia Description Scheme
ISO/IEC 15938-5:2003, the Multimedia Description Scheme (MDS) [6], specifies all metadata in the MPEG-7
standard that is not specific to Visual or Audio content (e.g. Artist, Title, Date). As such, it is able to
represent all of the information found in the popular ID3V1 [7] metadata specification.
3 A System for Archiving and Playing a Music Library
3.1 Overview
This specification presents a simple architecture for constructing an annotated music library. It defines a
process, based entirely on standardized MPEG-4 and MPEG-7 modules, for importing MP3-encoded
music files containing ID3 tags into this architecture. This is shown in Figure 5.
ISO/IEC 11172-3 Layer III (“MP3”) [1]-[2] specifies a music compression scheme that results in a sequence of
bits, or bitstream. In contrast, ISO/IEC 14496-3:2005 [3] specifies a music compression scheme that results in
a sequence of packets which can be stored directly into the MPEG-4 File Format, specified in
ISO/IEC 14496-14 [4] [5].
The first module required in this architecture is a specification to translate an MP3 bitstream into a series of
MP3 packets. This is accomplished by the MP3onMP4 formatter, specified in ISO/IEC 14496-3:2005 [3]. This
formatter reads a standard MP3 file (i.e. a bitstream) and converts it to a series of packets (called Access
Units in MPEG-4 terminology) that can be loaded into an MPEG-4 File.
[Figure blocks: MP3 file with ID3 tags; MP3onMP4 formatter; Extract ID3 tags and express in; MP4 File]
Figure 5 — Encoder System Architecture
The MPEG-4 File supports both compressed media (i.e. MP3) and associated metadata, typically ID3V1 tags
[6]. This tag information is easily representable using MPEG-7 nomenclature, as specified in [8]. The specific
mapping from ID3V1.1 tags to MPEG-7 metadata is shown in Table 1. Parenthetical comments under Artist
clarify that MPEG-7 is able to make a distinction between Artist as a person and Artist as a group name.
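As an illustration of the extraction step, the following minimal sketch reads an ID3V1.1 tag and pairs some of its fields with MPEG-7 paths from Table 1. It assumes the commonly documented ID3V1 layout (a 128-byte block at the end of the file that starts with "TAG", with a zero byte followed by the track number in the last two comment bytes for version 1.1) and assumes ISO 8859-1 text. The names parse_id3v1, MPEG7_PATHS and to_mpeg7_paths are hypothetical, and only the fields with a single simple path (Album, Song Title, Year, Comment) are mapped.

```python
from typing import Dict, Optional


def parse_id3v1(tag: bytes) -> Optional[Dict[str, object]]:
    """Parse a 128-byte ID3V1/ID3V1.1 tag; return None if it is not a tag."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed width, padded with zero bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    fields: Dict[str, object] = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],
    }
    comment = tag[97:127]
    if comment[28] == 0 and comment[29] != 0:
        # ID3V1.1: 28-byte comment, zero byte, then the track number.
        fields["comment"] = text(comment[:28])
        fields["track"] = comment[29]
    else:
        fields["comment"] = text(comment)
        fields["track"] = None
    return fields


# Paths copied from Table 1 for the fields that map to a single element.
MPEG7_PATHS = {
    "album": 'CreationInformation/Creation/Title[@type="albumTitle"]',
    "title": 'CreationInformation/Creation/Title[@type="songTitle"]',
    "year": "CreationInformation/CreationCoordinates/Date/TimePoint",
    "comment": "CreationInformation/Creation/Abstract/FreeTextAnnotation",
}


def to_mpeg7_paths(fields: Dict[str, object]) -> Dict[str, object]:
    """Map non-empty tag fields onto the MPEG-7 paths listed above."""
    return {MPEG7_PATHS[k]: v for k, v in fields.items() if k in MPEG7_PATHS and v}
```

Given the contents of an MP3 file as a bytes object, the tag can be taken from its last 128 bytes. Artist and Track are left out of the mapping here because their paths in Table 1 require further structure.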
Table 1 — Mapping from ID3 V1.1 Tags to MPEG-7
ID3 V1 | Description | MPEG-7 Path
Artist | Artist performing the song | CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (Artist Name)
 | | oupType"]/Name (Group Name)
Album | Title of the album | CreationInformation/Creation/Title[@type="albumTitle"]
Song Title | Title of the song | CreationInformation/Creation/Title[@type="songTitle"]
Year | Year of the recording | CreationInformation/CreationCoordinates/Date/TimePoint (Recording date.)
Comment | Any comment of any length | CreationInformation/Creation/Abstract/FreeTextAnnotation
Track | CD track number | Semantics/Seman
