Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals

This document specifies fundamental concepts of the reference model for textual documents and provides guidance to support long-term preservation from the perspectives of its five layers. It defines: the layers that constitute the reference model for textual documents; the types of elements incorporated within textual documents; property types associated with textual documents; classifications of properties by type; and properties inherent to textual documents relevant to long-term preservation. This document does not cover: specific technical methods for checking whether the properties exist within a specific textual document; specific technical methods for analysing particular textual document formats (e.g. DOC, DOCX, ODT, TXT, PDF); specific metadata items for the long-term preservation of textual documents; processes, procedures, or management practices related to long-term preservation or records management.

Gestion des documents — Modèle de référence pour une conservation à long terme des documents textuels — Partie 2: Principes essentiels

General Information

Status
Published
Publication Date
21-May-2026
Current Stage
6060 - International Standard published
Start Date
22-May-2026
Due Date
03-Mar-2026
Completion Date
22-May-2026

Buy Documents

Standard

ISO 20271-2:2026 - Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals

Release Date:22-May-2026
English language (22 pages)

Overview

ISO 20271-2: Document management - Reference model for long-term preservation of textual documents - Part 2: Fundamentals is an international standard developed by ISO. This document defines the fundamental concepts related to the long-term preservation of textual documents, focusing on key aspects such as document structure, critical elements, and intrinsic properties. The standard provides a multi-layered reference model that serves as a foundation for analyzing, evaluating, and improving the technical viability of text documents for long-term digital preservation.

As organizations, institutions, and governments increasingly rely on digital documentation, the risk of format obsolescence and information loss grows. ISO 20271-2 aims to address these challenges by supporting consistent document structure, interoperability, and continued accessibility across changing technologies and file formats.

Key Topics

  • Reference Model Layers: ISO 20271-2 specifies multiple abstraction layers for textual documents, typically including content, structure, presentation, interaction, and metadata. These layers support detailed technical analysis for preservation planning.
  • Elements and Components: The standard outlines various document elements such as text, images, tables, domain-specific notations, and embedded objects. Understanding these components is essential for effective long-term preservation.
  • Property Types and Classifications: ISO 20271-2 defines property types associated with textual documents, classifying them to facilitate systematic preservation evaluation.
  • Preservation Approaches: The document discusses major preservation strategies, such as:
    • Retaining original formats
    • Virtualization of usage environments
    • Conversion to standardized formats (e.g., PDF/A)
  • Risks of Obsolescence: The standard highlights the need for dependable methods that keep digital documents accessible and analyzable over time, regardless of changes in technology or software.

Applications

By adopting ISO 20271-2, organizations can strengthen their digital preservation policies and ensure that valuable textual content remains accessible in the long run. Typical areas of application include:

  • Archives and Libraries: Ensuring institutional memory by preserving records and manuscripts in formats that withstand technological changes.
  • Records Management: Supporting compliance with legal and regulatory requirements by establishing standardized preservation practices for documents.
  • Technical and Regulatory Documentation: Maintaining long-term accessibility of manuals, specifications, and legal documents that are critical for future reference.
  • Standard Development: Providing guidelines for creating or selecting document formats with robust long-term preservation features.
  • Comparative Analysis: Facilitating comparative evaluation of different file formats to inform decisions on archival strategies or digital migration.

This standard serves as a practical reference for archivists, records managers, software developers, and policy makers working on long-term document management strategies, directly influencing the usability and authenticity of digital records over decades.

Related Standards

Organizations engaged in document management and archiving will benefit from familiarity with related standards, such as:

  • ISO 20271-1: Overview and contextual background for the reference model (companion to Part 2).
  • ISO 20271-3 (in development): Taxonomy and XML-based reference markup for digital preservation.
  • ISO 32000 series: Specifications for the Portable Document Format (PDF). The archival profile PDF/A is specified separately in the ISO 19005 series.
  • ISO/IEC 26300: Open Document Format (ODF) for Office Applications.
  • ISO/IEC 29500: Office Open XML (OOXML).
  • ISO 15489: Information and documentation – Records management.

Using ISO 20271-2 alongside these related standards supports a more comprehensive approach to document management and long-term digital preservation.


Keywords: ISO 20271-2, document management, long-term preservation, textual documents, reference model, digital archiving, document structure, file format obsolescence, preservation strategies, metadata, interoperability, digital records, archival standards.

Frequently Asked Questions

ISO 20271-2:2026 is a standard published by the International Organization for Standardization (ISO). Its full title is "Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals". It specifies fundamental concepts of the reference model for textual documents and provides guidance to support long-term preservation from the perspectives of its five layers.

ISO 20271-2:2026 is classified under the following ICS (International Classification for Standards) categories: 35.240.30 - IT applications in information, documentation and publishing; 37.080 - Document imaging applications. The ICS classification helps identify the subject area and facilitates finding related standards.

Standards Content (Sample)


International Standard
ISO 20271-2
First edition
2026-05
Document management — Reference model for long-term preservation of textual documents — Part 2: Fundamentals
Reference number
© ISO 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Textual documents
5 Reference model for textual documents
5.1 Purpose
5.2 Approaches and rationale for long-term preservation
5.3 Multi-layered reference model
5.3.1 General
5.3.2 Layers of the reference model
5.3.3 Property types of each layer
5.3.4 Recommendations for assessing each layer
6 Target documents for applying reference model
6.1 Type of document for the reference model
6.2 Content
6.2.1 General
6.2.2 Text
6.2.3 Image
6.2.4 Table
6.2.5 Domain-specific notations
6.2.6 Reviewing and commenting
6.2.7 Other content elements of textual documents
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Document file formats, EDMS systems and authenticity of information.
A list of all parts in the ISO 20271 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Over time, numerous file formats have been created and subsequently become obsolete, resulting in digital
files that are no longer accessible.
This situation typically arises when the technologies, software environments, or underlying specifications
and standards – whether international, industry-based, or proprietary – are no longer maintained, and
when insufficient information is available to interpret the file structure. Consequently, digital documents
created decades ago can become unreadable or unanalyzable, even though the data itself still exists. This
challenge has prompted sustained discussion among governments and organizations regarding the long-
term preservation of digital documents and has established digital preservation as a critical issue in
electronic document management.
The primary objective of this document is to support the long-term preservation of textual documents
by ensuring that they remain technically interpretable and understandable despite potential format
obsolescence. To achieve this, this document defines a reference model that enables systematic technical
analysis and quantitative evaluation of textual document formats, while accommodating different
preservation requirements and levels of available information.
This document defines multiple abstraction layers for textual documents and specifies the categories of
properties associated with each layer. It establishes technical criteria for recording and assessing these
properties within specific file formats, in order to identify risks related to long-term accessibility and
interpretability.
The reference model defined in this document serves as a practical resource for professionals involved in
document management, including institutional archivists and records managers, by providing a common
basis for evaluating the long-term preservation readiness of textual document formats, in order to support
consistent structure, interoperability, and long-term interpretability across different technologies and
systems. In addition, the reference markup presented in this document can be used as a reference when
developing new textual document formats or when enhancing the long-term preservation capabilities of
existing formats.
Accordingly, this document supports the following activities:
— format analysis for selection and evaluation of textual document formats for long-term preservation;
— technical design activities related to the development of new textual document format specifications;
— activities aimed at improving existing textual document standards through the addition of properties or
structural refinements;
— classification and comparative analysis of textual document formats.
The ISO 20271 series currently consists of the following parts:
— Part 1 (ISO 20271-1, see footnote 1) provides an overview and contextual background for this document;
— Part 2 (this document) defines the fundamental concepts of the reference model;
— Part 3 (ISO 20271-3, see footnote 2) defines a taxonomy and XML-based reference markup for digital preservation.
1) Under preparation. Stage at the time of publication: ISO/DIS 20271-1:2026.
2) Under preparation. Stage at the time of publication: ISO/WD 20271-3:2026.

International Standard ISO 20271-2:2026(en)
Document management — Reference model for long-term
preservation of textual documents —
Part 2:
Fundamentals
1 Scope
This document specifies fundamental concepts of the reference model for textual documents and provides
guidance to support long-term preservation from the perspectives of its five layers.
It defines:
— the layers that constitute the reference model for textual documents;
— the types of elements incorporated within textual documents;
— property types associated with textual documents;
— classifications of properties by type; and
— properties inherent to textual documents relevant to long-term preservation.
This document does not cover:
— specific technical methods for checking whether the properties exist within a specific textual document;
— specific technical methods for analysing particular textual document formats (e.g. DOC, DOCX, ODT, TXT,
PDF);
— specific metadata items for the long-term preservation of textual documents;
— processes, procedures, or management practices related to long-term preservation or records
management.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/

3.1
common file formats
document formats including plain-text format (TXT), Office Open XML (DOCX), OpenDocument Text (ODT),
Portable Document Format (PDF), Open Word Processor Markup Language (OWPML), TeX and Hypertext
Markup Language (HTML)
3.2
element
component included in a textual document (3.6)
3.3
property
attribute, element (3.2), and other components found in textual documents (3.6), which are subject to long-
term preservation
3.4
rendering engine
software component responsible for converting document content (such as text, images, and formatting
instructions) into a visual or printable output on various devices, like screens or printers
Note 1 to entry: It interprets the document's code or format and displays it in a way that users can view or interact
with.
3.5
semantic information
properties of textual documents (3.6) that encompass all semantic content
Note 1 to entry: This includes both the substantive information and the structural aspects that convey meaning within
the document.
3.6
textual document
document that conveys its core message primarily through the use of human language characters, regardless
of the encoding or rendering method used
Note 1 to entry: A textual document may also include structured layouts, stylesheets, images, audio and other
embedded content elements.
Note 2 to entry: Common file formats for textual documents include plain text (TXT), Office Open XML (DOCX),
OpenDocument Text (ODT), Portable Document Format (PDF), Hangul Word Processor XML (HWPX), Hypertext
Markup Language (HTML) and TeX.
3.7
Unicode
character encoding standard maintained by the Unicode Consortium designed to support the use of text
written in all of the world’s major writing systems
3.8
vector image
form of computer graphics in which visual images are created directly from geometric shapes defined on a
Cartesian plane, such as points, lines, curves, and polygons
4 Textual documents
Textual documents can be represented in a variety of file formats.
Common file formats for textual documents include TXT, DOCX, ODT, PDF, HWPX, HTML and TeX.
Textual documents can encompass a wide range of content, from simple text to multimedia elements like
images, videos and audio. They additionally support rich expressions through various styles, complex
layouts and integration with external elements such as fonts.

The structural content types in textual documents range from simple text-only formats to those
incorporating multimedia elements like images and videos. Furthermore, these documents contain various
properties enabling rich expression – such as diverse styles and complex layouts – through integration with
external elements like fonts.
The reference model defined in this document aims to provide a layered abstraction of technical
information, which helps break down the structure of textual documents. This breakdown facilitates
the establishment of evaluation criteria for long-term preservation and the categorization of documents.
In practical applications, this reference model can require additional layers beyond the five foundational
abstract layers initially identified for textual documents. These foundational layers typically include aspects
such as content, structure, presentation, interaction and metadata. Additional layers can be necessary for
documents with non-textual primary content, such as spreadsheets (for numerical data) and presentation
files (with dynamic features like animations). This document primarily focuses on text-centric documents
that store, preserve and deliver information conveyed through text content, ranging from simple structured
formats to those with complex layouts.
Figure 1 — Examples of textual documents
Figure 1 illustrates various types of textual documents. These documents can have the following layout
characteristics:
— a header, footer, and body aligned to fit a specific paper size;
— a body composed of several paragraphs, each of which can be represented by one or more sections;
— paragraphs constructed from characters encoded in various standards [such as Unicode, American
Standard Code for Information Interchange (ASCII), Shift-JIS, Extended Unix Code for Korean (EUC-KR),
Big5];
— images and tables, including various styles to decorate them.
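The dependence of paragraph text on its character encoding can be illustrated with a short Python sketch. It is an illustration only and not part of ISO 20271-2. The sample string and the function name `encode_in_all` are assumptions made for this example, and the codec names are those used by the Python standard library.

```python
# Illustrative sketch only; not part of ISO 20271-2.
# The same characters produce different byte sequences under different
# encodings, so the encoding must be known in order to read them back.

SAMPLE = "한글"  # hypothetical sample text (Korean)

# Python standard-library codec names for some of the encodings listed above.
CODECS = ["utf-8", "euc_kr", "shift_jis", "big5", "ascii"]


def encode_in_all(text, codecs=CODECS):
    """Return {codec: bytes or None}; None means the codec cannot represent the text."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None
    return result


encoded = encode_in_all(SAMPLE)
```

Decoding the EUC-KR bytes with a different codec would typically either fail or yield unrelated characters, which is one reason the encoding is worth recording as a property of the document.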

Key
1 vertical writing mode
2 horizontal writing mode
3 character size
4 line gap
5 inter-line space
6 left-to-right base direction
7 right-to-left base direction
NOTE Specific language examples are preserved where they demonstrate different layout structures and writing
directions, as these are essential to illustrate the concept rather than requiring translation.
Figure 2 — Different types of text flow in a paragraph
These textual documents are not just digitized forms but also reflect the cultural characteristics of the
countries and regions where they are used. For example, different regional documentation practices vary
in their approach: some frequently use lists, others commonly employ tables for layout, and certain writing
traditions utilize vertical paragraph orientation. As shown in Figure 2, in the Arabic script, the character
flow of paragraphs combines right-to-left (for Arabic text) and left-to-right (for embedded Chinese, Japanese,
Korean, Latin characters, or numerals), though the overall paragraph direction remains right-to-left.
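The mixed character flow described above can be detected programmatically. The following Python sketch is an illustration only and not part of ISO 20271-2. It uses the standard `unicodedata` module to list the Unicode bidirectional classes that occur in a string; the function name `bidi_classes` and the sample string are assumptions made for this example.

```python
# Illustrative sketch only; not part of ISO 20271-2.
import unicodedata


def bidi_classes(text):
    """Return the set of Unicode bidirectional classes present in the text."""
    return {unicodedata.bidirectional(ch) for ch in text}


# Hypothetical sample: an Arabic word followed by Latin letters and digits.
sample = "\u0645\u0631\u062d\u0628\u0627 ABC 123"
classes = bidi_classes(sample)

# "AL" marks Arabic letters (right-to-left); "L" marks left-to-right letters.
is_mixed = "AL" in classes and "L" in classes
```

A paragraph for which `is_mixed` is true needs a rendering engine that handles both directions, so its presence is a useful signal when assessing a document.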

Key
1 text progression
2 size of the illustration < left-right size of the column
3 size of the illustration = left-right size of the column
4 size of the illustration = top-bottom size of the two columns
5 size of the illustration = top-bottom size of basic layout
6 left-right size of the column
7 size of the illustration left-right size of basic layout
8 top-bottom size of the column
9 size of the illustration = top-bottom size of the column
Figure 3 — Diverse logical structures and layouts of textual documents
As illustrated in Figure 3, a textual document can be a digital format that simply contains text content, but it
can also incorporate logical structures such as reading sequence and presentation order.
5 Reference model for textual documents
5.1 Purpose
This reference model is a fundamental framework that outlines the properties and recommendations for
each layer of textual documents that can be relevant to long-term preservation. It serves as a common basis
for understanding, analysing and establishing criteria for assessing the technical considerations required
for long-term preservation of textual documents. The model enables the examination and determination
of long-term preservation viability for various formats with distinct technical foundations, promoting

consistent integration, interoperability, scalability, maintainability and functionality among technical tools
and programs. However, this reference model does not address specific technical or implementation details
regarding the analysis of individual file formats.
5.2 Approaches and rationale for long-term preservation
Textual documents can be encoded in a variety of formats, including plain-text documents and those
specified in standards such as the ISO 32000 series (PDF), the ISO/IEC 26300 series (Open Document Format),
and the ISO/IEC 29500 series (Office Open XML). At the time of publication of this document, the fields of
archiving, records management and document management continue to examine methods for the long-term
preservation of digital documents. Various approaches have been suggested and implemented for
different types of digital documents. These approaches include the following.
a) Preserving original formats
Even if the original document is kept intact for a long time, there is a possibility that it is not compatible
with the latest technology or that the file format can become outdated, leading to the inability to access
the document. It can also prove difficult to locate software capable of faithfully viewing the content.
b) Virtualization to preserve the original file format’s usage environment
Virtualization refers to creating a virtual copy of the original computing environment (e.g. hardware,
operating system, software) needed to access the document. This method allows future users to access
the document as if they were still using the original system, even if the technology has become obsolete.
The usage environment consists of the specific software and configurations required to render or
interact with the document. This method preserves access by emulating the original system. However,
it carries risks, including copyright infringement, high costs and complex maintenance, as the required
software and operating systems must be preserved.
c) Storing textual documents in a standardized long-term preservation format (e.g. PDF/A)
This involves converting documents into formats specifically designed for long-term preservation,
such as PDF/A. While this is a widely accepted method, it does not guarantee that all original document
properties will be fully preserved, particularly in cases where documents contain unique elements (e.g.
embedded media, dynamic elements) that do not necessarily convert well into the new format.
d) Storing textual documents in widely supported formats (which can be proprietary or non-standardized)
In this approach, documents are stored in formats that are widely supported, such as proprietary
formats (e.g. DOC). This option carries the risk that these formats can become obsolete in the future, but
it provides the advantage of using formats that are accessible and supported by various tools.
e) Converting textual documents to standardized or updated formats
Conversion refers to transforming textual documents from their original file formats into newer or
standardized formats to ensure continued accessibility. This process helps prevent obsolescence by
allowing documents to be opened and used in up-to-date environments. However, it can lead to loss
of information, metadata, or structural elements if the conversion does not fully support the source
format. For this reason, conversion procedures should be clearly documented, and the resulting files
should be verified for fidelity and completeness.
f) Encapsulating textual documents with related resources
Encapsulation involves packaging a textual document together with all its related information, such as
metadata, fonts, schemas, and usage context, into a single container file. This approach ensures that all
components necessary for rendering and interpretation are preserved together. Although encapsulation
can improve integrity and portability, it also increases storage requirements and can depend on
proprietary container structures. Standardized encapsulation formats should be used whenever
possible to maintain interoperability.

However, when converting documents into a dedicated visualization format (e.g. XPS, PDF/A) for long-term
preservation, there is no guarantee that all the information from the original document will be preserved
in the long run. This is because different document formats can have different properties and conversion
software can be limited. Therefore, solutions like virtualization or migration can face potential issues
related to technical obsolescence, legal problems, or the loss of document fidelity.
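The encapsulation approach in item f), together with the verification recommended in item e), can be sketched in Python. This is an illustration only and not part of ISO 20271-2. The function names `encapsulate` and `verify`, the file names, and the use of a ZIP container with a `manifest.json` of SHA-256 checksums are all assumptions made for this example.

```python
# Illustrative sketch only; not part of ISO 20271-2.
import hashlib
import io
import json
import zipfile


def encapsulate(files):
    """Pack {name: bytes} into an in-memory ZIP with a checksum manifest.

    The manifest name 'manifest.json' and its layout are assumptions.
    """
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify(container):
    """Return True if every packaged file still matches its recorded checksum."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(name)).hexdigest() == digest
            for name, digest in manifest.items()
        )


# Hypothetical package: a document plus metadata describing it.
package = encapsulate({
    "document.txt": "Sample text".encode("utf-8"),
    "metadata.json": json.dumps({"encoding": "utf-8"}).encode("utf-8"),
})
```

Calling `verify(package)` checks integrity only. It does not show that the packaged resources are sufficient to render or interpret the document.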
Data collection is crucial for the development of many technologies related to data analysis, generative
artificial intelligence (AI), big data, etc. Most of this data is either numerical or text-based, and the
text can be drawn from textual documents. Therefore, it is essential to preserve documents for an
extended period to facilitate the training of AI models. When textual documents (e.g. DOCX, ODT, HWPX,
HTML) are converted into a dedicated format for long-term preservation, the characteristics and semantic
information of the original documents can be retained.
At present, there is no standard definition or reference model that identifies the types of content or
structural information of textual documents that can be required for long-term preservation. This makes it
difficult to conduct technical analysis of individual documents. In the field of document management, where
technical analysis and information on textual documents are limited, it can be difficult to define long-term
preservation strategies or evaluation conditions for such documents.
It is important to establish preservation strategies for various existing types of digital content that can
include:
— text within the document;
— graphics, such as graphs and charts;
— audio and video clips;
— hyperlinks and metadata information;
— semantics of the original document;
— digital signature authentication information;
— binary data (closed stream format).
This reference model and its recommendations (see 5.3.4) provide a clear set of criteria and technical
factors – such as contextual information support, complexity, interoperability, viability, and reusability – for
evaluating the long-term preservation of textual documents.
These criteria can be applied in both quantitative and qualitative assessments, and even individuals without
technical training can conduct technical analyses. The model also serves as guidance for identifying a more
detailed and measurable set of criteria applicable to textual documents commonly defined in archiving or
document management.
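A quantitative assessment against the five factors named above could be recorded as follows. This Python sketch is an illustration only. The 0 to 5 scale, the equal weighting, and the class name `CriteriaScores` are assumptions made for this example and are not defined by ISO 20271-2.

```python
# Illustrative sketch only; the scale and equal weighting are assumptions,
# not values defined by ISO 20271-2.
from dataclasses import dataclass, fields


@dataclass
class CriteriaScores:
    """Hypothetical 0-5 scores; a higher score means more favourable for preservation."""
    contextual_information_support: int
    complexity: int
    interoperability: int
    viability: int
    reusability: int

    def mean(self):
        """Return the unweighted mean, rejecting scores outside the assumed scale."""
        values = [getattr(self, f.name) for f in fields(self)]
        if any(not 0 <= v <= 5 for v in values):
            raise ValueError("scores must be between 0 and 5")
        return sum(values) / len(values)


example = CriteriaScores(4, 3, 5, 4, 4)
overall = example.mean()
```

A real assessment would probably weight the factors differently depending on the preservation requirements; equal weights are used here only to keep the sketch short.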
5.3 Multi-layered reference model
5.3.1 General
Textual documents can vary in structure from being very simple to highly complex. In order to classify
textual documents, an abstract reference model is used that categorizes them based on their visual,
descriptive, logical and physical characteristics, and on the content itself. For each of these characteristics,
layers are constructed, and properties are mapped to the corresponding reference model. The reference
model for the textual document is defined as a multi-layered reference model consisting of five layers. The
complete structure of this reference model is illustrated in Figure 4.

Figure 4 — Reference model of textual documents
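As an informal illustration of the five layers, the model can be sketched as a small data structure. This is a
minimal sketch, not part of the reference model: the names Layer, Property, properties_in and SAMPLE are
hypothetical, and the sample mapping only shows that a single property can be mapped to more than one layer.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The five layers of the reference model."""
    VISUALIZATION = "visualization"
    CONTENT = "content"
    METADATA = "metadata"
    SEMANTICS = "semantics"
    PACKAGE = "package"


@dataclass(frozen=True)
class Property:
    """A unit of information in a document, mapped to one or more layers."""
    name: str
    layers: frozenset


def properties_in(layer, properties):
    """Return the names of the properties mapped to the given layer."""
    return [p.name for p in properties if layer in p.layers]


# Hypothetical sample mapping, for illustration only.
SAMPLE = [
    Property("text", frozenset({Layer.CONTENT, Layer.VISUALIZATION})),
    Property("font", frozenset({Layer.VISUALIZATION})),
    Property("document summary", frozenset({Layer.METADATA})),
    Property("caption", frozenset({Layer.SEMANTICS})),
    Property("digital signature", frozenset({Layer.PACKAGE})),
]
```

Listing the properties of one layer, for example properties_in(Layer.VISUALIZATION, SAMPLE), shows which
properties would need to be considered when that layer is assessed.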
5.3.2 Layers of the reference model
5.3.2.1 Visualization layer
The visualization layer is defined as a layer that includes the properties necessary for visually representing
the document.
Textual documents contain text and related information, ranging from simple to complex forms. Among the
layers that make up the reference model, the visualization layer covers how this content is displayed, for
example on a screen. Textual documents express a variety of content through rendering processes, and the
properties related to rendering are classified into the visualization layer.
Textual documents can be visualized through the visualization layer, targeting static content elements such
as text and images, style information including fonts and layout, as well as dynamic elements such as video.
In the case of plain-text documents with no additional visualization properties such as font face information
and layout style, the visualization information can depend on the platform or program used to visualize the
document.
This dependence should be recognised as a potential risk for long-term preservation, particularly for
documents requiring specific rendering (e.g. ASCII art, complex scripts, bidirectional text). Possible
approaches to mitigate this risk include providing coordinate-based or image-based representations,
embedding rendering metadata, or using standardized formats such as PDF/A. These approaches are
provided as informative examples, not as requirements.
Within the reference model, each unit of information is called a property. A property can have characteristics
from multiple layers of the reference model. The properties belonging to the visualization layer are called
visualization properties. Depending on their complexity or implementation method, these visualization
properties can be displayed differently. In some cases, the characteristics of other layers are maintained,
while in other cases, they are not. Whether the original characteristics are preserved affects not only the
long-term preservation of the document but also its compatibility.
Based on the textual documents reference model, the visualization layer properties can be used to assess
the accuracy and reliability of visualization, such as whether visualization varies depending on the system
or application used. For example, using an image format can preserve the visual appearance accurately, but
to the detriment of other layers, while other formats can be less visually precise if they support dynamic
reflow or re-layout.
5.3.2.2 Content layer
The second layer of the reference model, the content layer, is defined as the layer responsible for representing
the intrinsic content elements contained within the document.
Textual documents can include various types of information, ranging from basic text to images, videos,
sounds, charts, and more. The content layer includes properties that represent the crucial information for
all types of textual documents. The basic information included in the content layer can be in a standardized
format for each type or in a non-standardized format.
The content layer plays a critical role in representing the core elements of a document. Among these, the
text property constitutes the core element of textual documents and therefore represents the most essential
aspect for their preservation.
Other content properties, depending on their relevance, can also be prioritized based on the specific use
case or requirements. Regardless of the format, it is essential that the properties representing the content
itself are preserved without any loss, ensuring the integrity and meaning of the document remain intact.
The properties in the content layer can be visually represented, or not.
5.3.2.3 Metadata layer
The third layer in the reference model is the metadata layer, which encompasses a range of metadata within
textual documents. The metadata layer is defined to represent information classified as metadata included
within the document. This layer plays a crucial role in providing context, structure, and additional details
that enhance the understanding and management of the document's content.
This layer includes properties that do not directly influence the visualization or the content of the document
and therefore are not included in either the visualization layer or the content layer.
Conversely, visualization properties or content properties do not belong to the metadata layer. The metadata
layer is associated with additional information contained within the document, which is essential for
providing context and enhancing the usability of the document. Metadata properties can include, but are
not limited to, document summary information, fields within the document, alternative text for accessibility
support, document change tracking information, and notes, depending on the specific needs of the document.
5.3.2.4 Semantics layer
The fourth layer of the reference model is the semantics layer, which is defined to contain structured content
that expresses semantic information and conveys meaning within the document.
It is possible that the semantics layer is not present in simple textual documents that contain only basic
content information, such as plain text or images. However, when textual documents include properties
that encompass various structural information, such as paragraphs, lists, headers, table titles, and figure
titles, they are included in the semantics layer. Additionally, the properties within the semantics layer can be
applied to the visualization layer.
If properties within the semantics layer are omitted, it can result in discrepancies between the visualized
part of the document and the original document. This can lead to alterations or omissions in the logical
structure and contextual representation of the original document.
The semantics layer can include properties such as information distinguishing paragraphs, headers, footers,
the flow order of document content, captions for images or tables, automatically assigned paragraph
numbers, table of contents information, footnotes, endnotes and other related properties.
5.3.2.5 Package layer
The fifth layer of the reference model is the package layer, which is defined to represent the method and all
related properties for converting textual documents into data streams and storing them in physical storage,
along with any associated information. This layer treats document data as data streams and ensures that
the necessary information for storage and retrieval is included.

However, for embedded files within textual documents, they are treated as stream objects, and the specific
storage methods related to individual formats are not directly managed.
A key challenge related to the package layer is the potential risk of data loss when storing digital documents
on physical storage devices that use non-standard or proprietary formats. This can lead to difficulties in
interpreting the respective data streams, as well as potential issues with the storage medium itself.
In this context, the role of the file system is critical, as it governs the logical structure and accessibility of files
within physical storage. File systems that are non-standard, obsolete, or proprietary can pose additional
risks for long-term preservation due to limitations in compatibility or recoverability. It is therefore essential
to consider the characteristics of the file system when assessing the long-term preservation readiness of the
package layer.
The package layer can include additional information to address structural responses to errors in the
storage medium or compatibility with evolving technologies that can make data stream interpretation
difficult. Moreover, compression techniques used to bundle multiple files into a single physical unit during
the construction of textual documents are also managed within the package layer.
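The bundling of multiple parts into a single physical unit can be sketched as follows. This is a minimal
sketch that uses the ZIP container from the Python standard library purely as a familiar example; this
document does not prescribe any container format, and the function names pack and unpack and the part
names are hypothetical.

```python
import io
import zipfile


def pack(parts: dict) -> bytes:
    """Bundle named parts into one compressed ZIP container held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unpack(package: bytes) -> dict:
    """Recover the named parts from a container produced by pack()."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Hypothetical parts of a textual document, for illustration only.
parts = {
    "content.xml": b"<text>Example paragraph</text>",
    "styles.xml": b"<style font='serif'/>",
}
package = pack(parts)
restored = unpack(package)
```

A lossless round trip, where restored equals parts, is the property of interest here: the packaging method
changes how the data streams are stored, not what they contain.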
The package layer can incorporate properties defined by separate, independent specifications or standards,
potentially with more ease than the visualization, content, semantics and metadata layers.
5.3.3 Property types of each layer
5.3.3.1 Visualization property
Visualization properties affect the visualization of textual documents. These properties can possess multiple
characteristics, as they can affect or apply to multiple layers. They are related to visual information, and
even if such information is not visible to the user, it can still be preserved, as it can be made visible through
appropriate rendering or presentation mechanisms. They can occupy space within the document and
influence the style properties in which documents or information are represented, making them part of the
visualization properties.
Visualization properties that are used to represent or display content in a document include style properties like
font, margin, colour and text decoration. In addition, content properties like text, image and table can also be
considered as visualization properties in some formats, as they directly contribute to the visual appearance
of the document on a screen.
5.3.3.2 Content property
Content properties refer to those that represent the informational content of textual documents. These
properties can have inter-related characteristics and can be applied to multiple layers. Content properties
can be displayed when the document is visualized, or not. A type of content that is commonly used in
documents is text. This can be represented using publicly standardized character encoding schemes such
as Unicode, or ASCII, or platform-specific encoding schemes such as CP949 or Korean-Johap. The image
property, another common information type in textual documents, can be implemented in various formats
like BMP, JPG, PNG and GIF.
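The practical effect of the encoding scheme on the text property can be illustrated with a short sketch. It
assumes the Python standard codecs "utf-8" and "cp949"; the helper decode_or_none and the sample string are
hypothetical and serve only to show that the same characters are stored as different bytes, and that the
bytes can be read back reliably only when the encoding is known.

```python
def decode_or_none(data: bytes, encoding: str):
    """Decode bytes with the given encoding, or return None if they are not valid in it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Illustrative sample text: the Korean word "hangul", written with escapes.
sample = "\ud55c\uae00"

utf8_bytes = sample.encode("utf-8")    # Unicode-based encoding
cp949_bytes = sample.encode("cp949")   # platform-specific encoding

# The same characters are stored as different byte sequences.
same_bytes = utf8_bytes == cp949_bytes

# Decoding succeeds with the matching encoding and fails with the wrong one.
right = decode_or_none(cp949_bytes, "cp949")
wrong = decode_or_none(cp949_bytes, "utf-8")
```

A wrong guess does not always fail this visibly: some byte sequences are valid in several encodings and
decode to different characters, which is one reason to record the encoding alongside the text.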
When documents are implemented in a specific format, it is common to distinguish between style properties
responsible for visualization and content properties representing the content itself. Hence, preserving
properties related to these two layers can play a significant role in enhancing long-term preservation,
enabling users to read and utilize digital documents.
5.3.3.3 Metadata property
The metadata property represents supplementary information, which does not affect the visualization layer
of textual documents and is not incorporated into the content layer.

An example of a metadata property is the diverse types of metadata embedded within a document. The
varieties of metadata found in textual documents can include the following.
a) Descriptive metadata, which is the descriptive information about a document. It is used for discovery
and identification. It includes elements such as author, title, abstract and keywords.
b) Structural/semantics metadata, which is about containers of data and indicates how compound objects
are put together, for example, how pages are ordered to form chapters. It describes the types, versions,
relationships, and other characteristics of digital materials or a specific part of content.
c) Administrative metadata, which is information to help manage a resource, like resource type,
permissions, and when and how it was created.
d) Reference metadata, which is information about the contents and quality of statistical data.
e) Legal metadata, which provides information about the creator, copyright holder and public licensing, if
provided.
The metadata property should not influence the visualization layer. However, metadata that is involved
in the content layer as content can affect the visualization layer. Even if the properties within the content
layer and metadata layer of a textual document contain similar information, they should be distinguished
and adhere to the recommendations of their respective layers. Moreover, it is often crucial to ensure that
the elimination of all metadata properties from a document does not lead to any distortion of the visual
information or content of the textual document.
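The check described above can be sketched with a toy document model. Everything in this fragment is a
hypothetical illustration: the dictionary layout, the "field" reference and the functions render and
survives_metadata_removal are assumptions made for the example, not structures defined by this document.

```python
def render(document: dict) -> str:
    """Resolve each content item to text; field items read their value from the metadata."""
    parts = []
    for item in document["content"]:
        if isinstance(item, dict):  # hypothetical field that refers to a metadata entry
            parts.append(document["metadata"].get(item["field"], ""))
        else:
            parts.append(item)
    return "".join(parts)


def survives_metadata_removal(document: dict) -> bool:
    """Return True if rendering is unchanged after all metadata is removed."""
    stripped = {**document, "metadata": {}}
    return render(stripped) == render(document)


# The title is stored as content; the metadata only repeats it.
independent = {"content": ["Annual report"], "metadata": {"title": "Annual report"}}

# The title is pulled from the metadata when the document is rendered.
dependent = {"content": [{"field": "title"}], "metadata": {"title": "Annual report"}}
```

In this sketch the first document is unaffected by removing its metadata, while the second loses visible
text, which is the kind of distortion the paragraph above warns against.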
5.3.3.4 Semantics property
The semantics property represents structural and contextual information within textual documents. Generally,
the semantics property does not influence the visualization layer. However, depending on the document
creation and editing tools, it can be visualized for user convenience, thereby affecting the visualization layer.
Like the metadata layer, the semantics layer can nevertheless be entirely omitted from the document. In
such instances, the final form of the document’s visualization layer should remain unaffected.
5.3.3.5 Package property
The package property refers to the properties used for physical packaging, encryption, digital signatures,
integrity verification and other related aspects of a document. These properties are specifically implemented
to assemble textual documents into physical units for storage as data streams or to enable associated
functionalities.
The package property typically operates independently and possibly does not influence other layers. It
includes information related to the method of storing documents in physical storage, details used to verify
document integrity, information required for digital signature verification and information utilized for
encryption and decryption purposes.
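As an informal illustration of integrity verification, a digest can be recorded for a data stream and
compared again later. This is a minimal sketch under stated assumptions: SHA-256 from the Python standard
library is used only as a common example, this document does not prescribe an algorithm, and the names
digest, is_intact and the sample stream are hypothetical.

```python
import hashlib


def digest(stream: bytes) -> str:
    """Return the SHA-256 digest of a data stream as a hexadecimal string."""
    return hashlib.sha256(stream).hexdigest()


def is_intact(stream: bytes, recorded_digest: str) -> bool:
    """Return True if the stream still matches the digest recorded earlier."""
    return digest(stream) == recorded_digest


# Hypothetical data stream of a stored document.
original = b"Example document data stream"
recorded = digest(original)          # stored with the document when it is packaged

unchanged = is_intact(original, recorded)
altered = is_intact(original + b" ", recorded)
```

A digest of this kind detects accidental or unrecorded changes to the stream; it does not by itself identify
who produced the document, which is the role of digital signature information.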
5.3.4 Recommendations for assessing each layer
5.3.4.1 Recommendations for the visualization layer
When setting criteria to ensure that the visual representation of a textual document remains unchanged,
the properties defined in the visualization layer of the reference model should meet the following
recommendations.
a) Even if there is no need to preserve visual information, properties that pertain to both the content layer
and the visualization layer should be retained and not omitted.
b) The data representing the properties included in the visualization layer should be expressed as system-
and application-independent values.
c) Target textual documents that cannot be adequately displayed without additional information should
include visualization properties. This situation can also arise in case of plain-text files. In such cases,
alternative formats should be considered and verified to ensure that they support the required
visualization properties.
d) In the absence of external references, corresponding visualization layer properties may be omitted.
e) Visualization properties can also pertain to other layers depending on the context. Therefore, these
properties should be considered in relation to the recommendations applicable to the associated layers.
As an example of the recommendations mentioned above, in the case of a), the absence of essential properties,
such as text or image, necessary for the composition of the visualization layer, renders it impossible to
maintain not only the visualization properties but also the content properties.
For item b), a document is archivable for long-term preservation only when it contains no information that
explicitly depends on values or states within a specific program or software rendering engine used to display
documents on devices such as displays or printers. Instead, representation should rely on coordinate-based vector values,
standardized colour information, and standardized units such as centimetres (cm) and millimetres (mm),
as well as pixel-based coordinates (px), relative values expressed as percentages (%) and typographic units
such as points (pt), where 1 pt equals approximately 0,3528 mm.
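The relationship between these units can be sketched as follows. The sketch assumes the typographic point
defined as 1/72 of an inch, which is consistent with the value of approximately 0,3528 mm given above; the
function names are hypothetical, and the pixel conversion takes the resolution as an explicit parameter
because a pixel has no fixed physical size.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # assumption: typographic point of 1/72 inch


def pt_to_mm(points: float) -> float:
    """Convert typographic points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def mm_to_pt(millimetres: float) -> float:
    """Convert millimetres to typographic points."""
    return millimetres * POINTS_PER_INCH / MM_PER_INCH


def px_to_mm(pixels: float, pixels_per_inch: float) -> float:
    """Convert pixels to millimetres for an explicitly stated resolution."""
    return pixels * MM_PER_INCH / pixels_per_inch


one_point_mm = round(pt_to_mm(1), 4)
```

Because the pixel conversion depends on the resolution, a pixel-based value is system-independent only when
the resolution it refers to is recorded together with it.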
For item c), which concerns plain text, additional information is added so that the document can be visualized
exactly as it was created, satisfying the long-term preservation recommendations of the visualization layer.
An example of this representation is
shown in Figure 5. This can
...