ISO/IEC TS 22424-1:2020
Digital publishing — EPUB3 preservation — Part 1: Principles
The ISO/IEC TS 22424 series supports long-term preservation of EPUB publications via a dual strategy. This document considers EPUB features from a long-term preservation point of view. Some EPUB features are forbidden and others required, depending on how they relate to long-term preservation. EPUB publications constructed according to these guidelines are suitable for preservation. ISO/IEC TS 22424-2 makes EPUB compliant with the Open Archival Information System (OAIS) and current practices of OAIS archives.
TECHNICAL SPECIFICATION ISO/IEC TS 22424-1
First edition 2020-01
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2020 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Packaging standards
6 Construction of OAIS information packages
6.1 Overview
6.2 General principles
6.2.1 EPUB publications shall be sent to a repository system as well-formed and complete submission information packages (SIPs)
6.2.2 Regardless of its type or format, it shall be possible to include any data or metadata in SIPs
6.2.3 It should be possible to transfer SIPs by any means, methods, or tools from the submitting organization to the repository system
6.2.4 The archive shall have a way to verify the identity of the submitting organization/person, no matter how the information packages are transferred
6.2.5 There is no 1:1 relation between OAIS information packages
6.2.6 A SIP may contain 0-n EPUB 3 publications, and one EPUB 3 publication may be submitted to the repository system in 1-n SIPs
6.2.7 The information package type (in this case, SIP) shall be indicated
6.2.8 SIP packaging method shall not restrict the application of any preservation method
6.2.9 The packaging method shall not limit the size of the SIP
6.3 Identification of information packages and their content
6.3.1 It shall be possible to identify any SIP uniquely both during and after the ingest process
6.3.2 Information objects (EPUB publications, PREMIS preservation metadata record, etc.) within SIPs shall be identified uniquely and persistently
6.3.3 EPUB Fragment Identifiers should not be used in EPUB publications sent to a repository system, unless the submission agreement explicitly allows their use
6.4 Structure of information packages
6.5 Generic Information package metadata
6.5.1 Metadata in information packages shall be based on standards
6.5.2 Metadata should allow (automatic) validation of the structure and content of SIPs in terms of integrity, fixity, and syntax
6.5.3 It shall be possible to edit metadata in information packages
Annex A (informative) EPUB and digital preservation: issues and recommendations
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 34, Document description and processing languages.
A list of all parts in the ISO/IEC TS 22424 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
0.1 General
This document facilitates the long-term preservation of EPUB publications by specifying, at a general level, the EPUB features that are mandatory for long-term preservation (such as font embedding) and the features that should be avoided if possible.
This document can be seen as a stepping stone towards a detailed specification which would be related
to EPUB in the same way as PDF/A, specified in ISO 19005-1 to ISO 19005-3, is related to the Portable
Document Format (PDF). If and when the EPUB community develops detailed guidelines for the
production of archivable EPUB publications, this document could be used as one of the starting points.
Long-term preservation in general requires two things:
— making the object (such as an EPUB publication) fit for preservation, including deciding which features to use and which to avoid;
— packaging the object (and any metadata related to it) together with any additional data such as
other versions of the object and other documentation into an Open Archival Information System
(OAIS) submission information package (SIP).
Packaging is covered in ISO/IEC TS 22424-2.
0.2 EPUB
The EPUB standard defines a distribution and interchange format for digital publications and documents. The EPUB® format provides a means of representing, packaging and encoding structured and semantically enhanced Web content (including HTML, CSS, SVG and other resources) for distribution in a single-file container[17].
The EPUB format was developed by the International Digital Publishing Forum (IDPF), which merged with the World Wide Web Consortium (W3C) in January 2017. Ongoing technical development of the standard, related extension specifications and ancillary deliverables are the responsibility of the W3C EPUB 3 Community Group (footnote 1), which published its charter in February 2017. According to the charter, work on any future major revision of EPUB, e.g. an EPUB 4, is initially out of scope, on the presumption that it will be taken up by a new W3C Working Group (WG) as a W3C Recommendation Track activity. The EPUB 3 Community Group (CG) will coordinate its work with such a new WG and, in the meantime, with the existing W3C Digital Publishing Interest Group (DPUB IG)[23].
The International Digital Publishing Forum (IDPF) ceased operations as a membership organization in January 2017, and its website (footnote 2) is now an archive. The latest version of the standard and information about future EPUB developments are available on the Publishing@W3C webpage, https://www.w3.org/publishing/.
This document covers EPUB 3 versions up to EPUB 3.0.1 (footnote 3). EPUB 3.1 (footnote 4) was the first major revision of EPUB 3.0.1, but there are no implementations of version 3.1, and therefore it is not covered in this document. The most widely used version of the standard is still 3.0.1. EPUB 3.2 was published in May 2019 (footnote 5). Unlike 3.1, it is fully backwards compatible with 3.0.1. It will be covered in the next edition of this document.
1) https://www.w3.org/publishing/groups/epub3-cg/
2) http://idpf.org/
3) http://idpf.org/epub/301
4) https://www.w3.org/Submission/epub31/
5) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html
Differences between EPUB specifications 2.0.1 to 3.2 are well documented:
— EPUB 3 Changes from EPUB 2.0.1 (footnote 6)
— EPUB 3.0.1 Changes from EPUB 3.0 (footnote 7)
— EPUB 3.2 Changes from EPUB 3.0.1 (footnote 8)
All EPUB specifications are available on the Web: EPUB 2.0.1 at http://idpf.org/epub/201, EPUB 3.0.1 at http://idpf.org/epub/301 and EPUB 3.2 at https://w3c.github.io/publ-epub-revision/epub32/spec/epub-spec.html.
All EPUB publications, including ones using version 3.2, can be validated using EPUBCheck version
4.2.0, which was released in March 2019.
From a long-term preservation point of view, a lack of backward compatibility between successive versions of a file format would be a problem, because it makes migration more challenging. In addition, EPUB 3.1 has at least one feature that would have been problematic: in EPUB 3.1, foreign resources do not require fallbacks if they are not in the spine and not embedded in EPUB Content Documents. In EPUB 3.0.1, fallbacks guarantee that there is a version of the document that can be rendered; in 3.1, such a guarantee no longer exists.
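Because EPUB 3.0.1 relies on fallbacks to guarantee a renderable version, a pre-ingest check can look for foreign resources without one. The Python sketch below parses a package document (OPF) and lists manifest items whose media type is not in an illustrative subset of the EPUB 3.0.1 core media types and whose `fallback` chain never reaches a core media type. It is a simplification under stated assumptions: the function name `items_without_core_fallback` and the `CORE_MEDIA_TYPES` subset are hypothetical, and the finer EPUB 3.0.1 rules (for example, spine items needing a fallback to an EPUB Content Document, or intrinsic fallbacks such as the content of HTML `object` elements) are not modelled. A full validator such as EPUBCheck remains the authoritative tool.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document (OPF).
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Illustrative subset of EPUB 3.0.1 core media types (an assumption; the
# authoritative list is maintained in the EPUB specification).
CORE_MEDIA_TYPES = {
    "application/xhtml+xml", "image/svg+xml", "image/gif", "image/jpeg",
    "image/png", "text/css", "audio/mpeg", "audio/mp4",
    "application/vnd.ms-opentype", "application/font-woff",
    "application/smil+xml", "application/pls+xml",
    "application/x-dtbncx+xml", "text/javascript",
}


def items_without_core_fallback(opf_xml: str) -> list[str]:
    """Hypothetical helper: return ids of manifest items whose media type is
    not a core media type and whose fallback chain never reaches one."""
    root = ET.fromstring(opf_xml)
    items = {item.get("id"): item
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    problems = []
    for item_id, item in items.items():
        seen = set()
        current = item
        # Follow the fallback chain; `seen` guards against circular chains.
        while current is not None and current.get("id") not in seen:
            if current.get("media-type") in CORE_MEDIA_TYPES:
                break  # a core media type was reached
            seen.add(current.get("id"))
            current = items.get(current.get("fallback"))
        else:
            # Chain ended (no fallback) or looped without a core media type.
            problems.append(item_id)
    return problems
```

In practice the OPF file would first be located through `META-INF/container.xml` inside the EPUB container.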
EPUB 3.0.1 was prepared by the IDPF. It consists of six interlinked documents:
— EPUB 3 Overview
— Publications 3.0.1
— Canonical fragment identifiers
— Content documents 3.0.1
— Media overlays 3.0.1
— Open Container Format 3.0.1
There are several extension specifications to these EPUB base standards. The list below is incomplete,
as it contains mainly specifications that are relevant from the long-term preservation point of view.
Some of them are still drafts:
— EPUB Accessibility specification 1.0 (footnote 9) addresses evaluation and certification of accessible EPUB publications, and discovery of the accessible qualities in such publications.
— EPUB Previews 1.0 (footnote 10) describes how content previews can be included in EPUB publications.
— EPUB Distributable Objects 1.0 (footnote 11) is a draft specification that defines a method for the encapsulation, transportation, and integration of distributable objects in EPUB publications.
— EPUB Scriptable Components 1.0 (footnote 12) provides an interoperable publish and subscribe (pubsub) pattern by which interactive content can be created and incorporated into EPUB publications. Like EPUB Distributable Objects, it was still a draft as of 2019-05-13.
6) http://www.idpf.org/epub/30/spec/epub30-changes-20111011.html
7) http://www.idpf.org/epub/301/spec/epub-changes-20140626.html
8) https://w3c.github.io/publ-epub-revision/epub32/spec/epub-changes.html
9) http://www.idpf.org/epub/a11y/accessibility.html
10) http://www.idpf.org/epub/previews/epub-previews-20150826.html
11) http://www.idpf.org/epub/do/
12) http://www.idpf.org/epub/sc/api/
— EPUB Scriptable Components Packaging and Integration 1.0 (footnote 13) is a draft that defines a method for the creation and inclusion of dynamic and interactive components in EPUB publications.
— EPUB Multiple-Rendition Publications 1.0 (footnote 14) defines the creation and rendering of EPUB publications consisting of more than one rendition of the same publication.
— EPUB Dictionaries and Glossaries 1.0 (footnote 15) provides a means for expressing dictionary and glossary semantics in EPUB publications.
These extensions are not widely used, and they have not been explicitly taken into account in this document. As regards accessibility, all EPUB publications are expected to be accessible. However, accessibility features as such have no impact on the long-term preservation of EPUB publications, and therefore this document does not impose accessibility-related requirements.
EPUB 3 core media types are listed at https://www.w3.org/publishing/epub3/epub-spec.html#sec-core-media-types. As of 2019-05-13, the list was last changed on 1 April 2018. Starting from EPUB 3.2, core media types are part of the standard.
In 2014, EPUB 3.0 specifications were republished as ISO/IEC TS 30135-1 to ISO/IEC TS 30135-6. Each
of these six ISO specifications is identical to its IDPF equivalent, for example ISO/IEC TS 30135-1 has
exactly the same content as the EPUB 3.0 Overview.
ISO/IEC TS 30135-7 entitled "Part 7: EPUB3 Fixed-Layout Documents" is from EPUB 3.0.1 (EPUB 3.0
does not have fixed layout specification). ISO/IEC TS 30135 (all parts) is therefore a combination of
EPUB 3.0 and Fixed-Layout Documents specification from 3.0.1.
ISO/IEC JTC 1/SC 34 is currently updating the ISO standard to fully match version 3.0.1.
EPUB is a rich document format with many features. From the digital preservation point of view this is a challenge, not least because long-term preservation has not been a priority in the development of the standard. Preserving all aspects of EPUB publications may be difficult, since some features are inherently hard to preserve. Moreover, EPUB reading systems usually do not support all features of the specification, and finding tools that support rare features can be difficult.
In spite of these challenges, EPUB is generally regarded as a suitable format for digital archiving. For instance, the Finnish National Digital Library initiative has selected just eight archivable file formats for text, EPUB being one of them. The selection criteria were openness/transparency, adoption as a preservation standard, degree of forward/backward compatibility, degree of protection against file corruption, frequency of version releases, dependencies/interoperability, and standardization. EPUB received the best grade, A, for every criterion except the second and third (adoption as a preservation standard, and degree of forward/backward compatibility), for which it received the second-best grade, B (see Reference [19], p. 40). Based on these generic criteria, EPUB seems to provide a good basis for long-term preservation, although additional guidelines on how to use the standard are needed to guarantee that EPUB files can be preserved efficiently.
The British Library’s Digital Preservation Team has published an assessment of EPUB as a preservation format[15]. It covers EPUB versions 3.0.1 and 2, and its overall view of EPUB is positive (Reference [15], p. 2):
EPUB 3 is currently the closest thing available to an open standard for e-books. In 2013, Bläsi and Rothlauf
concluded that EPUB 3 had the “highest expressive power” of all formats in the e-book ecosystem, and that
it included the superset of all features used in proprietary formats like KF8, Fixed Layout EPUB, and iBooks.
The EPUB long-term preservation issues identified in the British Library’s assessment are discussed in Annex A.
EPUB enjoys reasonable support in the e-book market. Many suppliers, publishers, and application developers who have supported EPUB 2 have implemented version 3.0.1. According to the EPUBTest website (footnote 16), EPUB 3 support in reading systems is far from exhaustive, but market coverage is good: in January 2018, there were 59 reading systems supporting at least some of the features specified in EPUB 3.0.
13) http://www.idpf.org/epub/sc/pkg/
14) http://www.idpf.org/epub/renditions/multiple/
15) http://www.idpf.org/epub/dict/
E-book suppliers have produced EPUB 3-based formats that incorporate digital rights management (DRM), and EPUB modifications that may restrict use of the format to the suppliers’ own platforms. For example, the Kindle Fire e-reader, released in 2011, uses a new format called Kindle Format 8 (KF8), which is partly based on EPUB 3, with Amazon’s DRM (see Reference [15], 3). Publisher- or supplier-specific DRM often restricts the use of e-books to that publisher’s or supplier’s rendering devices and/or applications, and is therefore a major obstacle to digital preservation (see Reference [15], p. 7).
The EPUB specification does not enforce a particular digital rights management scheme, but DRM may
be layered on top of the EPUB specifications. A producer can, for instance, use one of the three major
rights management systems in the market (Amazon DRM, Apple FairPlay DRM for books bought from
iBooks, and Adobe DRM), or some other DRM system along with some additional platform-targeting.
DRM protection should be removed from EPUB publications during pre-ingest by the producer or as part of the ingest process by the OAIS archive. In practice, only national libraries may be able to do this, provided that legal deposit and/or copyright legislation grants them this privilege. If migration is the
chosen preservation strategy, existing EPUB publications will be converted into more modern EPUB
versions when rendering tools for old versions are no longer available, and (eventually) migrated into
other formats.
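Whether an EPUB container carries encrypted resources can be detected from `META-INF/encryption.xml`, the OCF file in which encrypted resources are listed. The sketch below, using only the Python standard library, maps each listed resource to its encryption algorithm and treats the IDPF and Adobe font obfuscation algorithms as non-DRM. It is a heuristic under assumptions: the function names are hypothetical, the set of obfuscation algorithm URIs is assumed to cover the cases at hand, and a DRM scheme could also restrict use without listing resources in `encryption.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

XMLENC = "http://www.w3.org/2001/04/xmlenc#"
ENC_NS = {"enc": XMLENC}

# Font obfuscation algorithms (IDPF and Adobe); assumed here to protect
# embedded fonts rather than restrict access, so they are not treated as DRM.
FONT_OBFUSCATION = {
    "http://www.idpf.org/2008/embedding",
    "http://ns.adobe.com/pdf/enc#RC",
}


def encrypted_resources(epub) -> dict[str, str]:
    """Hypothetical helper: map each resource listed in
    META-INF/encryption.xml to its algorithm URI.
    `epub` may be a file path or a binary file-like object."""
    with zipfile.ZipFile(epub) as zf:
        if "META-INF/encryption.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    result = {}
    for data in root.iter(f"{{{XMLENC}}}EncryptedData"):
        method = data.find("enc:EncryptionMethod", ENC_NS)
        ref = data.find("enc:CipherData/enc:CipherReference", ENC_NS)
        if method is not None and ref is not None:
            result[ref.get("URI")] = method.get("Algorithm")
    return result


def has_non_font_encryption(epub) -> bool:
    """Heuristic: True if any resource uses an algorithm other than font
    obfuscation, suggesting protection to be removed before ingest."""
    return any(alg not in FONT_OBFUSCATION
               for alg in encrypted_resources(epub).values())
```

A True result would flag the publication for DRM removal during pre-ingest rather than prove that DRM is present.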
If preserved EPUB publications are not directly accessible by the public, removing DRM, digital
watermarking, and other protection mechanisms from the archived documents is not a risk. When
publications are delivered to the customers as dissemination information packages (DIPs), the archive
shall use a combination of administrative and technical means to protect the documents as required in
the submission agreement. These means may include adding a DRM protection mechanism to the DIP delivered to the user, according to the requirements of the submission agreement. The agreement may also specify the customers the archive is entitled to serve; for instance, it is possible to require that the preserved documents can only be disseminated to the producer, and the producer will then serve the end users, who do not have direct access to the OAIS archive.
0.3 Digital preservation
The information society is dependent on successful long-term digital preservation. When an increasing
percentage of information is produced and published only in a digital format, it is important to make
sure that this information remains available in the distant future.
Digital preservation is not about preserving just bits, but about preserving access. The “business logic”
is as follows:
— we need software and hardware to render content for human users;
— software changes over time; there are new versions from old applications, and entirely new
applications;
— new or updated applications may not be able to render outdated file formats or format versions correctly;
— digital preservation makes an effort to keep all archived content in stable formats. Publications should also use as few features as possible that are not commonly supported by the software used to render content in these formats, and should avoid links to external resources, because long-term access to the publication would then also depend on the persistence of those external resources;
— when necessary, data in old formats may be migrated into more modern formats or updated versions of the same format. For instance, an e-book in EPUB 3.0.1 format may be migrated to EPUB 3.2 when version 3.0.1 is no longer widely supported by reading systems;
— since the aim is to preserve the content, not the bits, the bits may change as a result of version updates and format migrations;
— many OAIS archives preserve successive versions of archived publications, because migration may change the look and feel of the original document, or even its intellectual content.
16) http://epubtest.org/results
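The recommendation to avoid links to external resources can be supported by a simple pre-ingest scan. The sketch below lists absolute `http`/`https` references in `href`/`src` attributes and CSS `url()` values of XHTML, HTML, SVG and CSS files inside an EPUB container, which also catches remotely referenced fonts. The regular expression is an assumed simplification (it does not parse markup, and it ignores other URL schemes), and `external_references` is a hypothetical name.

```python
import re
import zipfile

# Absolute http(s) URLs in href/src attributes (also matches xlink:href) and
# in CSS url() values. A regular expression is a deliberate simplification;
# a production checker would parse XHTML, SVG and CSS properly.
EXTERNAL_REF = re.compile(
    r"""(?:href|src)\s*=\s*["'](https?://[^"']+)["']"""
    r"""|url\(\s*["']?(https?://[^"')\s]+)""",
    re.IGNORECASE,
)
SCANNED_SUFFIXES = (".xhtml", ".html", ".svg", ".css")


def external_references(epub) -> list[tuple[str, str]]:
    """Hypothetical helper: return (member name, URL) pairs for absolute
    http(s) references in the markup and style sheets of an EPUB container."""
    found = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(SCANNED_SUFFIXES):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for match in EXTERNAL_REF.finditer(text):
                # Exactly one of the two alternatives' groups is set.
                found.append((name, match.group(1) or match.group(2)))
    return found
```

Each hit is a candidate for embedding the resource in the container or for documenting the dependency in the SIP.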
In many countries, national libraries are responsible for preserving the published cultural heritage for future generations, while national archives take care of governmental publications, irrespective of the format in which they are available. All of these resources have to be preserved for decades, even centuries. Publishers, for their part, may guarantee continuous access to subscribers of electronic serials and other licensed content. If so, either the publisher or a third party should look after the publications and make sure they remain accessible, or at least available.
Ordinary digital asset management systems are not suitable for long-term preservation; therefore it is a
normal practice to separate short-term and long-term information management into different systems.
However, this does not mean that digital archiving is independent of the routine life cycle of documents.
Digital preservation is a long process that begins when publications are created.
Preservation metadata, which allows the publication to be found, rendered and authenticated
correctly, is a prerequisite for digital preservation. Some preservation metadata elements can or
should be provided by the original creator of the publication. It is also important to keep preservation
requirements in mind when preparing a publication, if it is known that it has to be preserved for a long
time. Any feature in a file format can be either essential, useful, neutral, questionable, or even downright
counterproductive from a long-term preservation point of view. However, publishers are likely to use
the features that let them achieve their own goals, and preservation may not be among them.
There are archivable versions of some file formats. PDF/A (ISO 19005-1:2005) is probably the best-known example. It specifies how to use PDF for long-term preservation. An example of a PDF feature that is counterproductive for preservation is font referencing; therefore, in PDF/A all fonts shall be embedded in order to guarantee that the document can be rendered correctly.
PDF/A also forbids the use of encryption, because encryption is generally regarded as a risk for long-term preservation. But storing unencrypted documents is a risk as well, because if they are stolen, unauthorized use is easy. Therefore, according to the Digital preservation handbook[25]:
Information security methods such as encryption add to the complexity of the preservation process and should
be avoided if possible for archival copies. Other security approaches may therefore need to be more rigorously
applied for sensitive unencrypted files; these might include restricting access to locked-down terminals in
controlled locations (secure rooms), or strong user authentication requirements for remote access.
In order to guarantee the correct processing of PDF/A files, there are specific requirements for PDF/A reading systems, such as support for embedded fonts. There are three versions of the specification: PDF/A-1 is based on PDF 1.4, PDF/A-2 adds features from PDF 1.5, 1.6 and 1.7, and PDF/A-3 contains all the features of PDF/A-2 and additionally allows other file formats to be embedded in PDF/A-conforming documents[21].
The TI/A (Tagged Image for Archival) standard initiative aimed to create an ISO recommendation that optimizes the TIFF format specification for archival purposes. Unfortunately, the project was disbanded in 2016, and the TI/A draft that the initiative completed in September 2016 is only available on the project intranet. However, the original TIFF/A (later TI/A) draft from February 2015 is a public document available on a PREFORMA project website (footnote 17). Although this TIFF/A specification is only a draft, it is probably advisable for archival TIFF images to use the features the draft specifies as mandatory and to avoid those it forbids.
17) http://www.preforma-project.eu/dpf-manager.html
The motivation behind the TI/A initiative can be applied to other image formats as well, and Reference [22] also makes points with which the EPUB community might agree:
The versatility of the TIFF format has made it very attractive for memory institutions for long-term archival
of their digital images. However, since the TIFF format offers such a great flexibility, it is not guaranteed
that in the future a standard TIFF reader will be able to read some TIFF images.
The limitations of the baseline TIFF are too severe for many applications in digital archiving. It is important
that, besides crucial technical metadata such as ICC color profiles (in case of color images) also important
descriptive metadata is stored within the image file. Having descriptive metadata available (such as content
description, iconography, copyright and ownership information etc.) is crucial for every archive. Having this
information in the same file as the image data guarantees that this information will always be associated
with the image.
TIFF is not an EPUB core media type, but four other image formats are: GIF, JPEG, PNG and SVG. From a digital preservation point of view, it matters how these formats and the other core media types are used in the EPUB context. Image and audio files embedded in an EPUB publication may require migration before the EPUB publication itself has to be migrated into a more modern file format, if commonly available EPUB reading systems no longer support these file formats. Due to the magnitude of such a task, this document does not provide guidelines for creating archivable files in EPUB 3 core media types. However, the EPUB community should follow the archival file format lists of national archives or libraries, for example the Library of Congress file format list (footnote 18) and the U.S. National Archives list (footnote 19), when the core media type list is updated. Publishers should also consider the persistence of the file formats used when creating EPUB publications for which the need for long-term preservation is foreseen.
This document does not require any changes to be made to the EPUB versions in production now or
to any future versions of it. However, with each new EPUB standard version it is necessary to check
whether ISO/IEC TS 22424 (all parts) needs to be revised, since any new EPUB feature can be useful,
counterproductive, or irrelevant from a long-term digital preservation point of view. A similar approach
is already in place for PDF/A: ISO 19005-1 applies to PDF 1.4, and ISO 19005-2 covers the subsequent
PDF versions up to 1.7.
0.4 OAIS and related standards
ISO/IEC TS 22424 (all parts) provides guidance on how to utilize the OAIS and current practices of OAIS archives in the preservation of EPUB publications. The OAIS (ISO 14721) is equally relevant to both parts of the ISO/IEC TS 22424 series.
OAIS is a reference model for long-term data storage systems. It is used by memory institutions (libraries, archives, and museums) and many other organizations that need to preserve digital resources in the long term. Although an ISO standard, the OAIS was originally developed by the Consultative Committee for Space Data Systems (CCSDS) (footnote 20), which still maintains the specification.
The model has six functional entities (Ingest, Archival Storage, Data Management, Administration, Preservation Planning and Access), as shown in Figure 1.
18) http://www.loc.gov/preservation/digital/formats/
19) https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html
20) https://public.ccsds.org/default.aspx
Figure 1 — OAIS model[20]
In the model, the ingest function is responsible for receiving information from producers and preparing
it for storage and management within the OAIS archive. The ingest accepts information – in this case,
EPUB publications – from producers in the form of SIPs, performs quality assurance checks on the SIP,
and generates an archival information package (AIP) from one or more SIPs (or multiple AIPs from a
single SIP). Finally, the ingest function transfers the new AIPs to Archival Storage and the associated
descriptive information (metadata) to Data Management.
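The quality assurance checks performed on a SIP during ingest can include a fixity check. The sketch below assumes a hypothetical manifest that maps POSIX-style relative paths in the SIP to expected SHA-256 digests (in practice such values might come from METS or PREMIS metadata, or from a BagIt manifest) and reports missing files, checksum mismatches and files not listed in the manifest. The function names and the manifest shape are illustrative assumptions, not part of OAIS.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(sip_root: Path, manifest: dict[str, str]) -> list[str]:
    """Hypothetical helper: check files under sip_root against expected
    SHA-256 digests keyed by relative path. Returns readable problems."""
    problems = []
    for rel, expected in manifest.items():
        path = sip_root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(path) != expected.lower():
            problems.append(f"checksum mismatch: {rel}")
    # Files present in the SIP but absent from the manifest are also reported.
    for path in sorted(sip_root.rglob("*")):
        rel = path.relative_to(sip_root).as_posix()
        if path.is_file() and rel not in manifest:
            problems.append(f"not in manifest: {rel}")
    return problems
```

An empty list means the SIP content matches the manifest; any entry would typically stop the SIP from becoming an AIP until resolved with the producer.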
Modifying an EPUB publication so that it is suitable for digital archiving is, from the OAIS point of view, part of pre-ingest and as such not part of the OAIS model. The importance of the OAIS to ISO/IEC TS 22424 (all parts) is that the model provides a terminology, an information package data model and an overall framework within which digital preservation can be performed.
Neither OAIS nor this document describes the interface between a repository system used by the
archive and systems used by producers. The Producer-Archive Interface Methodology Abstract
Standard, also known as PAIMAS (ISO 20652), covers the first stages of the ingest process defined by
the OAIS. It provides a basis for detailed specifications on how production systems communicate with
OAIS archives. One such specification is DEPIP, the Data Exchange Protocol for Interoperability and
Preservation (ISO 20614). The DEPIP is intended for systems used by libraries, archives, and museums.
Other domains are likely to create their own API specifications.
Of all the functional units of the OAIS model, this document covers only Ingest. In addition, it covers tasks that belong to pre-ingest, which is not an OAIS functional unit, as well as things a producer shall take care of when preparing a SIP. Other OAIS units are beyond the scope of this document, and therefore archival or dissemination-related functions such as migration or the creation of dissemination information packages are discussed only in passing. It is assumed that ingest does not require any major changes; however, if EPUB for some reason were no longer approved as a preservation format, the archive would be obliged to migrate the EPUB publications into an eligible file format. Even then, the submission agreement might require the archive to disseminate the publication back to consumers in the original EPUB format.
OAIS submission agreements specify the principles of how documents should be prepared and submitted to the repository system. If the archive uses migration as the preservation method (footnote 21), submission
agreements should specify file formats (and metadata formats) suitable for submission and/or archival,
or refer to external documents listing these formats. File formats suitable for submission but not for
archival are migrated during the ingest process, although the original files may be included in the AIP.
21) In this document, the preservation method is assumed to be migration. In practice, emulation can also be applied if it is important to preserve the original look and feel of the publication. In an ideal world, such migrations between file formats would be lossless; in practice that is not the case. A migrated document could look different even if the content is the same, and in the worst case its semantics change as well. Therefore, archives often also preserve the original version of the archived resource, alongside more modern versions.
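Where a submission agreement lists formats suitable for archival and formats suitable only for submission, the ingest function can sort incoming files accordingly. The sketch below is a hypothetical illustration: `FormatPolicy`, `classify` and the example media types are assumptions, and files in neither list are flagged for manual review, which is a design choice of the sketch rather than a rule stated in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatPolicy:
    """Hypothetical extract of a submission agreement's format lists."""
    archival: frozenset[str]         # accepted for archival as-is
    submission_only: frozenset[str]  # accepted, but migrated during ingest


def classify(files: dict[str, str], policy: FormatPolicy) -> dict[str, list[str]]:
    """Sort files (path -> media type) into 'keep', 'migrate' or 'review'."""
    result: dict[str, list[str]] = {"keep": [], "migrate": [], "review": []}
    for path, media_type in sorted(files.items()):
        if media_type in policy.archival:
            result["keep"].append(path)
        elif media_type in policy.submission_only:
            # The original may still be kept in the AIP next to the migrated copy.
            result["migrate"].append(path)
        else:
            result["review"].append(path)  # not covered by the agreement
    return result
```

For example, with `application/epub+zip` listed as archival, an EPUB file would land in "keep", while a format listed only for submission would land in "migrate".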
The submission agreements may also refer to SIP schema specifications, which provide more guidelines
for document producers. Schemas may utilize long-term preservation standards such as METS (Metadata
Encoding and Transmission Standard). Together, the submission agreement and related documents
should give a producer a clear idea of when and which publications should be sent to the repository
system, which file formats and metadata specifications should be used, and which means of data transfer
are available. These requirements should cover both ingest and dissemination; that is, submission of documents
to the repository system by the producer, and retrieval of the archived documents by consumers.
This document outlines the general principles for the submission of EPUB publications from digital
asset management systems to repository systems. The principles of archival storage or dissemination
of archived documents are not covered here, because OAIS archives may apply various methods and
processes to meet the requirements of submission agreements. Bit-level preservation is also out of
scope; the purpose of this document is to make it easier for producers and OAIS archives to preserve
access to EPUB documents.
ISO/IEC TS 22424-2 provides a technical basis to meet the principles listed in this document by
specifying metadata required for long-term preservation, and a method for packaging this metadata
with the original EPUB container.
This document is applicable to EPUB versions 3.0 and 3.0.1; as such, it should be used with caution for
other (earlier or later) versions of the standard. Documents in earlier EPUB versions do not need to be
migrated for preservation, provided that a) the submission agreement specifies those EPUB versions as
archivable formats, and b) reading systems exist for these EPUB versions.
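Checking whether a submitted EPUB is in a version the submission agreement lists as archivable can start from the package document. The sketch below follows the OCF layout (`META-INF/container.xml` points to the package document via the `full-path` attribute of `rootfile`) and reads the `version` attribute of the `package` element. The set `ARCHIVABLE_VERSIONS` is a hypothetical stand-in for the agreement's list. Note that, to our understanding, EPUB 3.0.1 package documents still declare `version="3.0"`, so this attribute alone cannot distinguish 3.0 from 3.0.1.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical: the EPUB package versions a submission agreement lists as archivable.
ARCHIVABLE_VERSIONS = {"3.0"}


def package_version(epub_path: str) -> str:
    """Return the version attribute of the EPUB package document."""
    with zipfile.ZipFile(epub_path) as zf:
        # The OCF container file names the package document (rootfile).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml lists no rootfile")
        full_path = rootfile.get("full-path")
        if not full_path:
            raise ValueError("rootfile has no full-path attribute")
        # The package element is the root of the package document.
        package = ET.fromstring(zf.read(full_path))
        return package.get("version", "")


def is_archivable(epub_path: str) -> bool:
    """True if the declared package version is on the (hypothetical) list."""
    return package_version(epub_path) in ARCHIVABLE_VERSIONS
```

An archive applying condition b) above would still need to confirm separately that reading systems exist for any accepted version; that cannot be checked from the file itself.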
Additional features in future EPUB versions should be analyzed from a long-term preservation point of
view. If such an analysis reveals that they may constitute a risk, they should be avoided in submitted
EPUB publications, or removed during ingest.
Annex A provides a summary of issues and recommendations related to the EPUB standard and its use
from a long-term preservation point of view.
TECHNICAL SPECIFICATION ISO/IEC TS 22424-1:2020(E)
Digital publishing — EPUB3 preservation —
Part 1:
Principles
1 Scope
The ISO/IEC TS 22424 series supports long-term preservation of EPUB publications via a dual
strategy. This document considers EPUB features from a long-term preservation point of view. Some
EPUB features are forbidden and some others required, depending on how they relate to long-term
preservation. EPUB publications constructed according to these guidelines are suitable for preservation.
ISO/IEC TS 22424-2 makes EPUB compliant with Open Archival Information System (OAIS) and current
practices of OAIS archives.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 14721, Space data and information transfer systems — Open archival information system (OAIS) —
Reference model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 14721 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
NOTE Unless stated otherwise, the terms have been adopted from ISO 14721:2012.
3.1
administrative metadata
metadata (3.33) that provides information (3.28) to help manage a resource, such as when and how it
was created, file type and other technical information, and access rights
Note 1 to entry: The definition is adapted from Reference [24].
3.2
archival information package
AIP
information package (3.29) consisting of content information (3.6) and associated preservation
description information (PDI) (3.41), which is preserved within an OAIS (3.36)
3.3
archive
OAIS archive
organization that intends to preserve information (3.28) for access and use by a designated
community (3.11)
3.4
authenticity
property that an entity is what it claims to be
Note 1 to entry: Authenticity is judged on the basis of evidence.
[SOURCE: ISO/IEC 27000:2018, 3.6, modified — Note 1 to entry has been added.]
3.5
consumer
role played by those persons, or client systems, who interact with OAIS (3.36) services to find preserved
information (3.28) of interest and to access that information in detail
Note 1 to entry: This can include other OAISs, as well as internal OAIS persons or systems.
3.6
content information
set of information (3.28) that is the original target of preservation or that includes part or all of that
information
Note 1 to entry: It is an Information Object composed of its Content Data Object and its Representation
Information.
3.7
context information
information (3.28) that documents the relationships of the content information (3.6) to its environment
Note 1 to entry: This includes reasons why the content information was created and how it relates to other
content information objects.
3.8
core media type
set of publication resource (3.45) types for which no fallback (3.23) is required
Note 1 to entry: The definition is adapted from Reference [18]. Core media types are specified in Section
5.1 of Reference [18].
EXAMPLE Core media types for still images are image/gif, image/jpeg, image/png and image/svg+xml. Any
other still image file format is foreign and requires a fallback, meaning the same resource expressed in another
foreign format or core media type.
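A producer or archive can screen a manifest for foreign still images that lack a fallback to a core media type. The sketch below models each manifest item as a hypothetical `(media_type, fallback_id)` pair keyed by item id (a simplification, not the EPUB data model) and follows each image's fallback chain until it reaches one of the EPUB 3 still-image core media types (image/gif, image/jpeg, image/png, image/svg+xml). It only considers manifest fallback chains; intrinsic fallbacks expressed inside content documents are not examined.

```python
from typing import Dict, List, Optional, Tuple

# EPUB 3 still-image core media types.
IMAGE_CORE_TYPES = {"image/gif", "image/jpeg", "image/png", "image/svg+xml"}

# Simplified manifest: item id -> (media type, id of fallback item or None).
Manifest = Dict[str, Tuple[str, Optional[str]]]


def has_core_fallback(manifest: Manifest, item_id: str) -> bool:
    """Follow the fallback chain from item_id until a core media type is found."""
    seen = set()
    current: Optional[str] = item_id
    while current is not None and current not in seen:
        seen.add(current)
        if current not in manifest:
            return False  # dangling fallback reference
        media_type, fallback = manifest[current]
        if media_type in IMAGE_CORE_TYPES:
            return True
        current = fallback
    return False  # chain ended, or looped, without reaching a core type


def foreign_images_without_fallback(manifest: Manifest) -> List[str]:
    """List image items whose fallback chain never reaches a core media type."""
    return sorted(
        item_id
        for item_id, (media_type, _) in manifest.items()
        if media_type.startswith("image/")
        and not has_core_fallback(manifest, item_id)
    )
```

For instance, a TIFF image whose fallback is a PNG passes, while a WebP image with no fallback, or an item whose fallback chain loops back on itself, is reported.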
3.9
data, pl
reinterpretable representation of information (3.28) in a formalized manner suitable for communication,
interpretation, or processing
Note 1 to entry: Data are often understood as taking the form of a set of values of qualitative or quantitative
variables.
[SOURCE: ISO 5127:2017, 3.1.1.15]
3.10
descriptive metadata
descriptive information
metadata (3.33) about a resource, used for example for discovery and identification
Note 1 to entry: These can include elements such as title, abstract, author, and keywords.
Note 2 to entry: The definition is adapted from Reference [24].
3.11
designated community
identified group of potential consumers (3.5) who should be able to understand a particular set of
information (3.28)
Note 1 to entry: A designated community may be composed of multiple user communities. The community is
defined by an archive (3.3), and this definition may change over time.
3.12
digital preservation
series of managed activities necessary to ensure continued access to digital materials for as long as
necessary
Note 1 to entry: Digital preservation refers to all of the actions required to maintain access to digital materials
beyond the limits of media failure or technological and organizational change.
Note 2 to entry: Those materials may be records created during the day-to-day business of an organization;
"born-digital" materials created for a specific purpose (e.g. teaching resources); or the products of digitisation
projects.
Note 3 to entry: The definition is adapted from Reference [25].
EXAMPLE 1 Short-term preservation - Access to digital materials either for a defined period of time while
use is predicted, but which does not extend beyond the foreseeable future, and/or until they become
inaccessible because of changes in technology.
EXAMPLE 2 Medium-term preservation - Access to digital materials beyond changes in technology for a
defined period of time but not indefinitely.
EXAMPLE 3 Long-term preservation (3.31) - Access to digital materials, or at least to the information (3.28)
contained in them, indefinitely.
3.13
digital rights management
DRM
packaging, distributing, controlling, and tracking content based on rights and licensing information (3.28)
3.14
digital signature
signature
data (3.9) appended to, or a cryptographic transformation of, a data unit that allows the recipient of
the data unit to prove the source and integrity of the data unit and protect against forgery, e.g. by the
recipient
[SOURCE: ISO/IEC 19784-1:2018, 4.34, modified — Note 1 to entry has been removed.]
3.15
dissemination information package
DIP
information package (3.29), derived from one or more AIPs (3.2), sent by an archive (3.3) to a cons
...







