Digital publishing -- EPUB3 preservation

The ISO/IEC TS 22424 series supports long-term preservation of EPUB publications via a dual strategy. This document considers EPUB features from a long-term preservation point of view. Some EPUB features are forbidden and others are required, depending on how they affect long-term preservation. EPUB publications constructed according to these guidelines are suitable for preservation. ISO/IEC TS 22424-2 makes EPUB compliant with the Open Archival Information System (OAIS) and the current practices of OAIS archives.

Publications numériques -- EPUB3 preservation

General Information

Status: Published
Publication Date: 28-Jan-2020
Current Stage: 6060 - International Standard published
Start Date: 13-Dec-2019
Completion Date: 29-Jan-2020

Technical specification
ISO/IEC TS 22424-1:2020 - Digital publishing -- EPUB3 preservation
English language
25 pages

Standards Content (sample)

TECHNICAL SPECIFICATION
ISO/IEC TS 22424-1
First edition
2020-01
Digital publishing -- EPUB3 preservation -- Part 1: Principles
Publications numériques -- EPUB3 preservation -- Partie 1: Principes
Reference number
ISO/IEC TS 22424-1:2020(E)
© ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Packaging standards
6 Construction of OAIS information packages
6.1 Overview
6.2 General principles
6.2.1 EPUB publications shall be sent to a repository system as well-formed and complete submission information packages (SIPs)
6.2.2 Regardless of its type or format, it shall be possible to include any data or metadata in SIPs
6.2.3 It should be possible to transfer SIPs by any means, methods, or tools from the submitting organization to the repository system
6.2.4 The archive shall have a way to verify the identity of the submitting organization/person, no matter how the information packages are transferred
6.2.5 There is no 1:1 relation between OAIS information packages
6.2.6 A SIP may contain 0-n EPUB 3 publications, and one EPUB 3 publication may be submitted to the repository system in 1-n SIPs
6.2.7 The information package type (in this case, SIP) shall be indicated
6.2.8 SIP packaging method shall not restrict the application of any preservation method
6.2.9 The packaging method shall not limit the size of the SIP
6.3 Identification of information packages and their content
6.3.1 It shall be possible to identify any SIP uniquely both during and after the ingest process
6.3.2 Information objects (EPUB publications, PREMIS preservation metadata record, etc.) within SIPs shall be identified uniquely and persistently
6.3.3 EPUB Fragment Identifiers should not be used in EPUB publications sent to a repository system, unless the submission agreement explicitly allows their use
6.4 Structure of information packages
6.5 Generic Information package metadata
6.5.1 Metadata in information packages shall be based on standards
6.5.2 Metadata should allow (automatic) validation of the structure and content of SIPs in terms of integrity, fixity, and syntax
6.5.3 It shall be possible to edit metadata in information packages
Annex A (informative) EPUB and digital preservation: issues and recommendations
Bibliography

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent declarations received (see http://patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.

This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 34, Document description and processing languages.

A list of all parts in the ISO/IEC TS 22424 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
0.1 General

This document facilitates the long-term preservation of EPUB publications by specifying, at a general level, the EPUB features that are mandatory for long-term preservation (such as font embedding) and the features that should be avoided if possible.

This document can be seen as a stepping stone towards a detailed specification which would be related to EPUB in the same way as PDF/A, specified in ISO 19005-1 to ISO 19005-3, is related to the Portable Document Format (PDF). If and when the EPUB community develops detailed guidelines for the production of archivable EPUB publications, this document could be used as one of the starting points.

Long-term preservation in general requires two things:

— making the object, such as an EPUB publication, fit for preservation, including the features to be used and the features to avoid;

— packaging the object (and any metadata related to it), together with any additional data such as other versions of the object and other documentation, into an Open Archival Information System (OAIS) submission information package (SIP).

Packaging is covered in ISO/IEC TS 22424-2.
0.2 EPUB
The EPUB standard

defines a distribution and interchange format for digital publications and documents. The EPUB® format provides a means of representing, packaging and encoding structured and semantically enhanced Web content, including HTML, CSS, SVG and other resources, for distribution in a single-file container [17].

The EPUB format was developed by the International Digital Publishing Forum (IDPF), which merged with the World Wide Web Consortium (W3C) in January 2017. Ongoing technical development of the standard, related extension specifications and ancillary deliverables is the responsibility of the W3C EPUB 3 Community Group (footnote 1), which published its charter in February 2017. According to the charter,

work on any future major revision of EPUB, e.g. an EPUB 4, is initially out of scope on the presumption that this will be taken up by a new W3C WG as a W3C Recommendation Track activity. The EPUB 3 CG will coordinate its work with such new WG, and meanwhile with the existing W3C Digital Publishing Interest Group (DPUB IG) [23].

The IDPF ceased operations as a membership organization in January 2017, and its website (footnote 2) is now an archive. The latest version of the standard and information about future EPUB developments are available at the Publishing@W3C webpage, https://www.w3.org/publishing/.

This document covers EPUB 3 versions up to EPUB 3.0.1 (footnote 3). EPUB 3.1 (footnote 4) was the first major revision of EPUB 3.0.1, but there are no implementations of version 3.1 and therefore it is not covered in this document. The most widely used version of the standard is still 3.0.1. EPUB 3.2 was published in May 2019 (footnote 5). Unlike 3.1, it is fully backwards compatible with 3.0.1. It will be covered in the next edition of this document.
1) https://www.w3.org/publishing/groups/epub3-cg/
2) http://idpf.org/
3) http://idpf.org/epub/301
4) https://www.w3.org/Submission/epub31/
5) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html

Differences between EPUB specifications 2.0.1 to 3.2 are well documented:
— EPUB 3 Changes from EPUB 2.0.1 (footnote 6)
— EPUB 3.0.1 Changes from EPUB 3.0 (footnote 7)
— EPUB 3.2 Changes from EPUB 3.0.1 (footnote 8)

All EPUB specifications are available on the Web: EPUB 2.0.1 at http://idpf.org/epub/201, EPUB 3.0.1 at http://idpf.org/epub/301 and EPUB 3.2 at https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html.

All EPUB publications, including ones using version 3.2, can be validated using EPUBCheck version 4.2.0, which was released in March 2019.

From a long-term preservation point of view, lack of backward compatibility between successive versions of a file format is a problem because it makes migration more challenging. In addition, EPUB 3.1 has at least one feature which would have been problematic: foreign resources do not require fallbacks if they are not in the spine and not embedded in EPUB Content Documents. In EPUB 3.0.1, a fallback guarantees that there is a version of the document that can be rendered; in 3.1 that guarantee no longer exists.
EPUB 3.0.1 was prepared by the IDPF. It consists of six interlinked documents:
— EPUB 3 Overview
— Publications 3.0.1
— Canonical fragment identifiers
— Content documents 3.0.1
— Media overlays 3.0.1
— Open Container Format 3.0.1

There are several extension specifications to these EPUB base standards. The list below is incomplete, as it contains mainly specifications that are relevant from the long-term preservation point of view. Some of them are still drafts:

— EPUB Accessibility specification 1.0 (footnote 9) addresses evaluation and certification of accessible EPUB publications, and discovery of the accessible qualities in such publications.

— EPUB Previews 1.0 (footnote 10) describes how content previews can be included in EPUB publications.

— EPUB Distributable Objects 1.0 (footnote 11) is a draft specification that defines a method for the encapsulation, transportation, and integration of distributable objects in EPUB publications.

— EPUB Scriptable Components 1.0 (footnote 12) provides an interoperable publish and subscribe (pubsub) pattern by which interactive content can be created and incorporated into EPUB publications. Like EPUB Distributable Objects, it was a draft as of 2019-05-13.
6) http://www.idpf.org/epub/30/spec/epub30-changes-20111011.html
7) http://www.idpf.org/epub/301/spec/epub-changes-20140626.html
8) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-changes.html
9) http://www.idpf.org/epub/a11y/accessibility.html
10) http://www.idpf.org/epub/previews/epub-previews-20150826.html
11) http://www.idpf.org/epub/do/
12) http://www.idpf.org/epub/sc/api/
— EPUB Scriptable Components Packaging and Integration 1.0 (footnote 13) is a draft that defines a method for the creation and inclusion of dynamic and interactive components in EPUB publications.

— EPUB Multiple-Rendition Publications 1.0 (footnote 14) defines the creation and rendering of EPUB publications consisting of more than one rendition of the same publication.

— EPUB Dictionaries and Glossaries 1.0 (footnote 15) provides a means for expressing dictionary and glossary semantics in EPUB publications.

These extensions are not widely used and they have not been explicitly taken into account in this document. As regards accessibility, all EPUB publications are supposed to be accessible. However, accessibility features as such do not have an impact on the long-term preservation of EPUB publications, and therefore this document does not set accessibility-related requirements.

EPUB 3 core media types are listed at https://www.w3.org/publishing/epub3/epub-spec.html#sec-core-media-types. As of 2019-05-13, the latest change was made on April 1, 2018. Starting from EPUB 3.2, core media types are part of the standard.

In 2014, the EPUB 3.0 specifications were republished as ISO/IEC TS 30135-1 to ISO/IEC TS 30135-6. Each of these six ISO specifications is identical to its IDPF equivalent; for example, ISO/IEC TS 30135-1 has exactly the same content as the EPUB 3.0 Overview.

ISO/IEC TS 30135-7, entitled "Part 7: EPUB3 Fixed-Layout Documents", is from EPUB 3.0.1 (EPUB 3.0 does not have a fixed-layout specification). ISO/IEC TS 30135 (all parts) is therefore a combination of EPUB 3.0 and the Fixed-Layout Documents specification from 3.0.1.

ISO/IEC JTC 1/SC 34 is currently updating the ISO standard to fully match version 3.0.1.

EPUB is a rich document format with a lot of features. From the digital preservation point of view this is a challenge, not least because long-term preservation has not been a priority in the development of the standard. Preserving all aspects and features of EPUB publications may be difficult, since some features are themselves hard to preserve. Moreover, EPUB reading systems usually do not support all features of the specification, and finding tools that support rare features can be difficult.

In spite of these challenges, EPUB is generally regarded as a suitable format for digital archiving. For instance, the Finnish National Digital Library initiative has selected just eight archivable file formats for text, EPUB being one of them. The selection criteria were openness/transparency, adoption as a preservation standard, degree of forward/backward compatibility, degree of protection against file corruption, frequency of version releases, dependencies/interoperability, and standardization. EPUB got an A, the best grade, on every criterion except the second and third. For those, the grade was the second best, a B (see Reference [19], p.40). Based on these generic criteria, EPUB seems to provide a good basis for long-term preservation, although additional guidelines on how to use the standard are needed to guarantee that EPUB files can be preserved efficiently.

The British Library’s Digital Preservation Team has published an assessment of EPUB as a preservation format [15]. It covers EPUB versions 3.0.1 and 2, and the overall view of EPUB is positive (Reference [15], p.2):

EPUB 3 is currently the closest thing available to an open standard for e-books. In 2013, Bläsi and Rothlauf concluded that EPUB 3 had the “highest expressive power” of all formats in the e-book ecosystem, and that it included the superset of all features used in proprietary formats like KF8, Fixed Layout EPUB, and iBooks.

EPUB long-term preservation issues uncovered in the British Library assessment are discussed in Annex A.

EPUB is enjoying reasonable support in the e-book market. Many suppliers, publishers, and application developers who have supported EPUB 2 have implemented version 3.0.1. According to the EPUBTest web site (footnote 16), EPUB 3 support in reading systems is far from exhaustive, but market coverage is good: in January 2018, there were 59 reading systems supporting at least some of the features specified in EPUB 3.0.

13) http://www.idpf.org/epub/sc/pkg/
14) http://www.idpf.org/epub/renditions/multiple/
15) http://www.idpf.org/epub/dict/

E-book suppliers have produced EPUB 3 based formats that incorporate digital rights management (DRM), and EPUB modifications that may restrict using the format on platforms other than the suppliers’ own. For example, the Kindle Fire eReader, released in 2015, uses a new format called Kindle Format 8 (KF8), which is partly based on EPUB 3, with Amazon’s DRM. See Reference [15], 3. Publisher/supplier specific DRM often restricts the use of e-books to that publisher’s/supplier’s rendering devices and/or applications, and is therefore a major obstacle to digital preservation (see Reference [15], p.7).

The EPUB specification does not enforce a particular digital rights management scheme, but DRM may be layered on top of the EPUB specifications. A producer can, for instance, use one of the three major rights management systems in the market (Amazon DRM, Apple FairPlay DRM for books bought from iBooks, and Adobe DRM), or some other DRM system along with some additional platform-targeting.

DRM protection should be removed from EPUB publications during pre-ingest by the producer or as a part of the ingest process by the OAIS archive. In practice, only national libraries may be able to do this, provided that a legal deposit act and/or copyright act grants them that privilege. If migration is the chosen preservation strategy, existing EPUB publications will be converted into more modern EPUB versions when rendering tools for old versions are no longer available, and (eventually) migrated into other formats.
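
Before protection can be removed, an ingest workflow first has to notice that a publication carries it. The sketch below is one possible pre-ingest check, not a method prescribed by this document. It relies on the fact that the EPUB Open Container Format reserves META-INF/encryption.xml and META-INF/rights.xml for encryption and rights information. The function names and the demo container are hypothetical. Note that encryption.xml is also used for font obfuscation, so its presence is a signal to inspect further, not proof of DRM.

```python
import io
import zipfile

# Container paths reserved by the EPUB Open Container Format (OCF).
ENCRYPTION_PATH = "META-INF/encryption.xml"
RIGHTS_PATH = "META-INF/rights.xml"


def protection_markers(epub_bytes: bytes) -> dict:
    """Report which OCF protection-related files are present in the container."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as container:
        names = set(container.namelist())
    return {
        "encryption_xml": ENCRYPTION_PATH in names,
        "rights_xml": RIGHTS_PATH in names,
    }


def make_demo_container(with_encryption: bool) -> bytes:
    """Build a tiny in-memory ZIP for illustration only (placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        container.writestr("mimetype", "application/epub+zip")
        container.writestr("META-INF/container.xml", "<container/>")
        if with_encryption:
            container.writestr(ENCRYPTION_PATH, "<encryption/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(protection_markers(make_demo_container(with_encryption=True)))
    print(protection_markers(make_demo_container(with_encryption=False)))
```

A publication flagged this way would still need manual or tool-assisted inspection to tell font obfuscation apart from a supplier-specific DRM scheme.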

If preserved EPUB publications are not directly accessible by the public, removing DRM, digital watermarking, and other protection mechanisms from the archived documents is not a risk. When publications are delivered to customers as dissemination information packages (DIPs), the archive shall use a combination of administrative and technical means to protect the documents as required in the submission agreement. These means may include adding a DRM protection mechanism to the DIP delivered to the user, according to the requirements of the submission agreement. The agreement may also specify the customers the archive is entitled to serve; for instance, it is possible to require that the preserved documents can only be disseminated to the producer, and that the producer will serve the end-users who do not have direct access to the OAIS archive.
0.3 Digital preservation

The information society is dependent on successful long-term digital preservation. When an increasing

percentage of information is produced and published only in a digital format, it is important to make

sure that this information remains available in the distant future.

Digital preservation is not about preserving just bits, but about preserving access. The “business logic” is as follows:

— we need software and hardware to render content for human users;

— software changes over time; there are new versions of old applications, and entirely new applications;

— new or updated applications may not be able to render outdated file formats or format versions correctly;

— digital preservation makes an effort to have all archived content in stable formats. Publications should also contain as few features as possible that are not commonly supported by the software used to render content in these formats, and should avoid links to external resources, because long-term access to the publication then also requires the persistence of those external resources;

— when necessary, data in old formats may be migrated into more modern formats or updated versions of the same format. For instance, an e-book in EPUB 3.0.1 format may be migrated to EPUB 3.2 when version 3.0.1 is no longer widely supported by reading systems;

— since the aim is to preserve the content, not the bits, the bits may change as a result of version updates and format migrations;

— many OAIS archives preserve successive versions of archived publications, because migration may change the look and feel of the original document, or even its intellectual content.

16) http://epubtest.org/results
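
The advice in the list above to avoid links to external resources can be checked mechanically. The following is a minimal sketch, assuming that content documents can be recognized by file extension and that the attributes listed in LINK_ATTRIBUTES cover the references of interest. Both are simplifications chosen for illustration, and all names and sample files are hypothetical. It reports hyperlinks and embedded resources alike, since both depend on something outside the container.

```python
import io
import zipfile
from html.parser import HTMLParser

# Assumed, non-exhaustive set of attributes that can reference another resource.
LINK_ATTRIBUTES = {"src", "href", "xlink:href", "poster", "data"}
REMOTE_PREFIXES = ("http://", "https://", "//")
CONTENT_SUFFIXES = (".xhtml", ".html", ".htm", ".svg")


class RemoteReferenceCollector(HTMLParser):
    """Collects (tag, url) pairs for attributes that point outside the container."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRIBUTES and value and value.lower().startswith(REMOTE_PREFIXES):
                self.remote.append((tag, value))


def find_remote_references(epub_bytes: bytes) -> dict:
    """Map each content document to the remote references found in it."""
    findings = {}
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as container:
        for name in container.namelist():
            if not name.lower().endswith(CONTENT_SUFFIXES):
                continue
            collector = RemoteReferenceCollector()
            collector.feed(container.read(name).decode("utf-8", errors="replace"))
            collector.close()
            if collector.remote:
                findings[name] = collector.remote
    return findings


def make_demo_container() -> bytes:
    """Build a small in-memory ZIP with one clean and one problematic chapter."""
    chapter1 = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<img src="images/cover.png"/>'
        '<a href="https://example.org/appendix">Appendix</a>'
        '<img src="http://example.org/figure.png"/>'
        "</body></html>"
    )
    chapter2 = '<html><body><a href="chapter1.xhtml">Back</a></body></html>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        container.writestr("OEBPS/chapter1.xhtml", chapter1)
        container.writestr("OEBPS/chapter2.xhtml", chapter2)
    return buffer.getvalue()


if __name__ == "__main__":
    for document, references in find_remote_references(make_demo_container()).items():
        print(document, references)
```

A fuller check would also look at the package document manifest and at CSS files, which this sketch deliberately leaves out.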

In many countries, national libraries are responsible for preserving the published cultural heritage for future generations, while national archives take care of governmental publications, irrespective of the format in which they are available. All of these resources have to be preserved for decades, even centuries. Publishers, in turn, may guarantee continuous access to the subscribers of electronic serials and other licensed content. If so, either the publisher or a third party should look after the publications and make sure they remain accessible or at least available.

Ordinary digital asset management systems are not suitable for long-term preservation; therefore it is a

normal practice to separate short-term and long-term information management into different systems.

However, this does not mean that digital archiving is independent of the routine life cycle of documents.

Digital preservation is a long process that begins when publications are created.

Preservation metadata, which allows the publication to be found, rendered and authenticated

correctly, is a prerequisite for digital preservation. Some preservation metadata elements can or

should be provided by the original creator of the publication. It is also important to keep preservation

requirements in mind when preparing a publication, if it is known that it has to be preserved for a long

time. Any feature in a file format can be either essential, useful, neutral, questionable, or even downright

counterproductive from a long-term preservation point of view. However, publishers are likely to use

the features that let them achieve their own goals, and preservation may not be among them.

There are archivable versions of some file formats. PDF/A (ISO 19005-1:2005) is probably the best known

example. It specifies how to use the PDF for long-term preservation. An example of a counterproductive

feature for preservation in PDF is font referencing; therefore in PDF/A all fonts shall be embedded in

order to guarantee that the document can be rendered correctly.

PDF/A also forbids the use of encryption, because encryption is generally regarded as a risk for long-term preservation. But storing unencrypted documents is a risk as well, because if they are stolen, unauthorized use is easy. Therefore, according to the Digital preservation handbook [25]:

Information security methods such as encryption add to the complexity of the preservation process and should be avoided if possible for archival copies. Other security approaches may therefore need to be more rigorously applied for sensitive unencrypted files; these might include restricting access to locked-down terminals in controlled locations (secure rooms), or strong user authentication requirements for remote access.

In order to guarantee the correct processing of PDF/A files, there are specific requirements for PDF/A reading systems, such as support for embedded fonts. There are three versions of the specification: PDF/A-1 is based on PDF 1.4, PDF/A-2 adds features from PDF 1.5, 1.6 and 1.7, and PDF/A-3 contains all the features of PDF/A-2 and additionally allows files in other formats to be embedded in PDF/A conforming documents [21].

The TI/A (Tagged Image for Archival) standard initiative intended to create an ISO recommendation that would optimize the TIFF format specification for archival purposes. Unfortunately the project was disbanded in 2016, and the TI/A draft the initiative completed in September 2016 is only available on the project intranet. However, the original TIFF/A (later TI/A) draft from February 2015 is a public document available on a PREFORMA project web site (footnote 17). Although this TIFF/A specification is only a draft, it is probably a good idea to use, in archival TIFF images, the features the specification marks as mandatory, and to avoid the ones it forbids.

17) http://www.preforma-project.eu/dpf-manager.html

The motivation behind the TI/A initiative can be applied to other image formats as well, and there are also points the EPUB community might agree with (Reference [22]):

The versatility of the TIFF format has made it very attractive for memory institutions for long-term archival

of their digital images. However, since the TIFF format offers such a great flexibility, it is not guaranteed

that in the future a standard TIFF reader will be able to read some TIFF images.

The limitations of the baseline TIFF are too severe for many applications in digital archiving. It is important

that, besides crucial technical metadata such as ICC color profiles (in case of color images) also important

descriptive metadata is stored within the image file. Having descriptive metadata available (such as content

description, iconography, copyright and ownership information etc.) is crucial for every archive. Having this

information in the same file as the image data guarantees that this information will always be associated

with the image.

TIFF is not an EPUB core media type, but four other image types are listed: GIF, JPEG, PNG,

and SVG. It is significant from a digital preservation point of view how these formats and other core

media types are used in the EPUB context. Image and audio files embedded in an EPUB publication may

require migration before the EPUB publication itself has to be migrated into a more modern file fo

...
