Document management — Digital file format recommendations for long-term storage

This document gives guidelines for selecting the most appropriate file format(s) for the storage, usability, and exchange of data with a long-term management objective. It is applicable to the selection of file formats to be used to store electronic documents. It provides guidance that takes into account:
— the durability of documents in a readable form;
— fidelity to the original and data integrity;
— interoperability, i.e. independence from creation applications, information systems and rendition platforms;
— compliance with relevant laws and regulations;
— compliance with format specifications;
— reducing costs by reducing the number of conversions/migrations over time.
This document is applicable to all office activities (e.g. text processing, spreadsheets, presentations), email and static web pages, as well as all types of electronic components, including images, video and sound. It does not apply to database formats.

Gestion électronique — Recommandations de format de fichier numérique pour le stockage à long terme

General Information

Status: Published
Publication Date: 04-Nov-2018
Current Stage: 6060 - International Standard published
Completion Date: 05-Nov-2018

Technical report
ISO/TR 22299:2018 - Document management -- Digital file format recommendations for long-term storage
English language
12 pages

Standards Content (Sample)

TECHNICAL REPORT ISO/TR 22299
First edition
2018-11
Document management — Digital file
format recommendations for long-
term storage
Gestion électronique — Recommandations de format de fichier
numérique pour le stockage à long terme
Reference number
ISO/TR 22299:2018(E)
© ISO 2018

COPYRIGHT PROTECTED DOCUMENT
© ISO 2018
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic selection criteria according to the content type description
4.1 General
4.2 Selection methodology
4.2.1 File format description
4.2.2 Long-term availability of readers or players
4.2.3 File presentation stability
4.2.4 Software and/or operating system migrations
4.2.5 File format selection
5 File formats
5.1 General
5.2 Coded text
5.3 Vector graphics
5.3.1 2D graphics
5.3.2 3D graphics
5.3.3 Technical drawings
5.4 Images
5.5 Sound
5.5.1 Linear formats for sound files
5.5.2 Lossless compression formats
5.5.3 Lossy compression formats
5.5.4 Container formats
5.6 Video
5.6.1 General
5.6.2 Coding
5.6.3 Digitalization
5.6.4 Compression
5.6.5 Video container formats
5.7 Office automation
5.8 Formats suitable for preservation
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Document file formats, EDMS systems and authenticity of information.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
The document management industry is heavily reliant on standardized file formats for both long-term
storage and interoperability purposes.
Effective document management often requires the selection of an appropriate storage file format and,
where needed, conversion between the native digital document format and the selected storage file format.
This document provides information and guidelines to assist in the selection of file formats.

TECHNICAL REPORT ISO/TR 22299:2018(E)
Document management — Digital file format
recommendations for long-term storage
1 Scope
This document gives guidelines for selecting the most appropriate file format(s) for the storage,
usability, and exchange of data with a long-term management objective.
It is applicable to the selection of file formats to be used to store electronic documents. It provides
guidance that takes into account:
— the durability of documents in a readable form;
— fidelity to the original and data integrity;
— interoperability, i.e. independence from creation applications, information systems and rendition
platforms;
— compliance with relevant laws and regulations;
— compliance with format specifications;
— reducing costs by reducing the number of conversions/migrations over time.
This document is applicable to all office activities (e.g. text processing, spreadsheets, presentations), email
and static web pages, as well as all types of electronic components, including images, video and sound.
It does not apply to database formats.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 12651-1, Electronic document management — Vocabulary — Part 1: Electronic document imaging
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 12651-1 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
4 Basic selection criteria according to the content type description
4.1 General
The following criteria can be considered when selecting a file format:
— the file format functionality, i.e. the type of content it is able to support (e.g. text only, enhanced text
with images or style sheets, images, video, sound);
— whether the file format specification is available as an open standard;
— whether the file format can be used in the intended application;
— the metadata that can be incorporated into the file;
— the likelihood that a reader or a player will still be available on a long-term basis;
— whether the file format has widespread support by the industry and vendors.
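
As an illustration only, the criteria above could be recorded as a simple checklist for each candidate format. The following Python sketch is not part of this document's guidance: the class name FormatProfile, its field names and the example values are hypothetical, and the candidate shown is an invented format rather than an assessment of any real one.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FormatProfile:
    """Hypothetical checklist mirroring the criteria listed in 4.1."""
    name: str
    supports_required_content: bool  # functionality matches the content type
    open_specification: bool         # specification available as an open standard
    usable_in_application: bool      # usable in the intended application
    supports_metadata: bool          # metadata can be incorporated into the file
    reader_likely_long_term: bool    # reader/player expected to stay available
    widespread_support: bool         # supported by industry and vendors

    def unmet_criteria(self) -> List[str]:
        """Return the names of the criteria this format does not satisfy."""
        return [f.name for f in fields(self) if getattr(self, f.name) is False]


# Invented candidate; the values are illustrative, not an evaluation.
candidate = FormatProfile(
    name="example-format",
    supports_required_content=True,
    open_specification=False,
    usable_in_application=True,
    supports_metadata=True,
    reader_likely_long_term=True,
    widespread_support=False,
)

print(candidate.unmet_criteria())
# ['open_specification', 'widespread_support']
```

A list of unmet criteria, rather than a single score, keeps the reason for any reservation visible to whoever performs the subsequent risk analysis.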
4.2 Selection methodology
4.2.1 File format description
The unconstrained availability of a file format specification is essential for the development of software
products, now and in the future, that are capable of correctly representing the content of files of this type.
End users should seek assurance of the openness (free availability) of a file format specification before
using this format for long-term storage. If file format specifications are not freely available, the file
format is not recommended for long-term retention and could be used only after a comprehensive risk
analysis.
The file format should be available as an open standard, which has been developed and is maintained
by an authoritative, neutral standardization body with no copyright restrictions or fees for use.
Electronic content can be stored in a document management environment so that software and/
or users can use the content. There are standardized and non-standardized file formats that can be
considered. Non-standardized formats should only be used with caution and only if the file format is
fully documented. Examples of standardized file formats include JPG and PDF (and the PDF sub-sets).
Non-standardized, but widespread (and commonly used) formats include TIFF, which is a proprietary
format. The decision to use (select) standardized formats versus non-standardized formats should
be considered by the end-user organization and is dependent on other aspects of the document
management system. For example, a document may be received in PDF format, but then its pages may
be extracted into TIFF or JPG for further processing, such as data extraction, etc.
4.2.2 Long-term availability of readers or players
From a long-term storage/archival perspective, the organization should always take into account
the potential need to migrate and/or convert existing formats. As technology continues to mature
and expand, file formats are updated as required; the PDF subsets that are now available are one
example. As a result, formats that are in use today may need to be updated to ensure that the
usability of the information they contain is retained in the future.
An organization may need to maintain the originals of documents that contain essential information for
authenticity and integrity, such as digital signatures, seals or timestamps, recognizing that migration/
conversion to another format could invalidate those elements. The organization should recognize the
existence of different use cases for file formats and take this into account when selecting long-term file
formats.
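
The effect of conversion on integrity elements can be illustrated with a plain checksum. This is a minimal Python sketch, assuming that a SHA-256 digest stands in for the value protected by a real digital signature, seal or timestamp; the sample text and the convert function are hypothetical placeholders for an actual format conversion.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def convert(data: bytes) -> bytes:
    """Hypothetical conversion: same visible text, different byte encoding."""
    return data.decode("utf-8").encode("utf-16")


original = "Contract text, signed 2018".encode("utf-8")
recorded = digest(original)  # value fixed when the document was signed

migrated = convert(original)

# The original still verifies. The migrated copy does not, even though
# a reader would display the same text for both.
print(digest(original) == recorded)                           # True
print(digest(migrated) == recorded)                           # False
print(migrated.decode("utf-16") == original.decode("utf-8"))  # True
```

This is why the original may need to be retained alongside any converted copy: the content survives the conversion, but the evidence bound to the original bytes does not.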
It is also important to take into consideration that a tool or application has to be available that can
properly open and display the contents of the file. These “readers” should be kept up-to-date so that
they are able to function in the current operating environment. In cases where non-standardized
formats are used, it is important that the organization is able to maintain a reader to open/read the
files. As technologies change and expand (e.g. a new sub-set of PDF), the organization should verify that
the reader is not only able to open/display new files, but also legacy files.
There are three strategies for managing reader applications:
— porting the existing software to new operating systems;
— developing new software for new operating systems;
— emulation supporting the continuous usage of old software in new computing environments.
The first two options are suitable for widely adopted file formats. Porting is considered when the
corresponding cost is relatively low. Developing new software allows the user to add new functionalities
and to improve usability.
4.2.3 File presentation stability
Content retained for legal or records management purposes should be stored using tamper-resistant
file formats.
Files should not depend on external resources that could be modified or become unavailable in the future.
Files should not contain embedded code (e.g. macros) or other features that could change the
representation of the file content.
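
A check for such features can sometimes be automated. The Python sketch below assumes the file is a ZIP-based office package in which macro code is stored in a part named vbaProject.bin and external links are declared in .rels parts with TargetMode="External", as is the case for Office Open XML files. The function name presentation_risks and the sample package are hypothetical, and the byte-string search is a simplification; a thorough check would parse the XML.

```python
import io
import zipfile
from typing import List


def presentation_risks(package: bytes) -> List[str]:
    """List features of a ZIP-based office file that may alter its rendering."""
    risks = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            lowered = name.lower()
            if lowered.endswith("vbaproject.bin"):
                risks.append("embedded macro code: " + name)
            elif lowered.endswith(".rels"):
                # Simplified test; a real check would parse the XML.
                if b'TargetMode="External"' in zf.read(name):
                    risks.append("external resource referenced in: " + name)
    return risks


def build_sample() -> bytes:
    """Build a tiny in-memory package for demonstration purposes only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/vbaProject.bin", b"\x00")
        zf.writestr(
            "word/_rels/document.xml.rels",
            '<Relationship Target="http://example.com/logo.png" '
            'TargetMode="External"/>',
        )
    return buf.getvalue()


for risk in presentation_risks(build_sample()):
    print(risk)
# embedded macro code: word/vbaProject.bin
# external resource referenced in: word/_rels/document.xml.rels
```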
Enhanced text is characterized by the fact that:
— letters can be presented using different fonts;
— images can be represented using different file formats.
A reader may only support a reduced set of fonts. If there is a need to use one or more fonts in addition
to those of that reduced set, the additional fonts should be embedded inside the file. Since this can
increase the file size, it can be preferable to only use the fonts that are supported in the reduced set.
Where different fonts are supported by the reader, it is preferable to allow only embedded fonts, in
order to avoid external dependencies.
A reader may only support a reduced set of image formats. It may support additional formats using
external readers. However, the availability of these external readers should be demonstrated in the
same way as those of the text readers.
4.2.4 Software and/or operating system migrations
Tests should be performed to provide assurance of the fidelity of the rendering when:
— porting the existing software to new operating systems;
— developing new software for new operating systems;
— using emulation to support continued use of old software in new computing environments.
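
One way to frame such a test is to render the same file in the old and the new environment and compare the results. The Python sketch below assumes that each rendering is already available as a grid of grayscale pixel values; the function name pixel_difference_ratio, the sample grids and the 1 % tolerance are hypothetical choices, not values taken from this document.

```python
from typing import List

Bitmap = List[List[int]]  # rows of grayscale values, 0-255


def pixel_difference_ratio(reference: Bitmap, candidate: Bitmap) -> float:
    """Return the fraction of pixels that differ between two renderings."""
    if len(reference) != len(candidate) or any(
        len(r) != len(c) for r, c in zip(reference, candidate)
    ):
        raise ValueError("renderings have different dimensions")
    total = sum(len(row) for row in reference)
    differing = sum(
        1
        for r_row, c_row in zip(reference, candidate)
        for r, c in zip(r_row, c_row)
        if r != c
    )
    return differing / total if total else 0.0


# Invented renderings of the same page before and after a migration.
old_render = [[255, 255, 0, 0], [255, 0, 0, 255]]
new_render = [[255, 255, 0, 0], [255, 0, 255, 255]]

TOLERANCE = 0.01  # hypothetical acceptance threshold
ratio = pixel_difference_ratio(old_render, new_render)
print(f"{ratio:.3f}", "PASS" if ratio <= TOLERANCE else "FAIL")
# 0.125 FAIL
```

A size mismatch is treated as an error rather than a difference, since a change in page dimensions is itself a fidelity failure that should be investigated separately.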
4.2.5 File format selection
Different file formats may be considered where the content to be stored is coded text, enhanced text,
2D graphics, 3D graphics, images, sound or video. These formats are addressed in Clause 5.
Consideration should be given to the following criteria when selecting a file format:
— any intellectual property associated with the use of the format;
— available software tools for reading and writing the format;
— long-term access to the technical specification(s) defining the format;
— certification and/or compliance related to the format.
5 File formats
5.1 General
To reduce the volume of information processing, it is important to consider compressing the data (e.g.
images, sound and video) while preserving the required quality and usability (e.g. evaluating the sound
quality for the listener). When analogue materials are digitized, or digital recordings are made, for the
purposes of long-term preservation, any lossy compression process should be avoided. Only a few of the numerous
compression methods are identified below. It is important to understand that the same format name
may be shared by a family of sub-formats with different compression characteristics.
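
The difference between lossless and lossy processing can be shown on a small scale. In the Python sketch below, zlib stands in for a lossless compression method and a crude bit-dropping step stands in for a lossy one; neither is a format discussed in this document, and the sample data is invented.

```python
import zlib

# Invented 8-bit sample data with a repeating pattern.
samples = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 64)

# Lossless: the decompressed bytes are identical to the input,
# and the compressed form is smaller.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)
print(restored == samples, len(packed) < len(samples))  # True True

# Lossy (illustrative only): dropping the 4 low bits of each sample
# discards information that cannot be recovered afterwards.
quantized = bytes(b & 0xF0 for b in samples)
print(quantized == samples)  # False
```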
5.2 Coded text
A plain text file contains only characters and special symbols. Different encodings can be used. See ISO/
IEC 646, ISO 1073 (all pa
...
