Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension as well as the relevant data categories.

Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique

Le présent document décrit une extension de l’ISO 24613-1 et de l’ISO 24613-2 et facilite l’élaboration des descriptions détaillées de phénomènes étymologiques courants et/ou d’informations diachroniques par rapport aux entrées lexicales des lexiques numériques et/ou rétronumérisés. Il fournit à la fois un métamodèle pour une extension de ce type ainsi que les catégories de données pertinentes.

Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev

General Information

Status: Published
Publication Date: 30-Mar-2021

ICS: 01.020 - Terminology (principles and coordination)

Technical Committee: ISO/TC 37/SC 4 - Language resource management
Drafting Committee: ISO/TC 37/SC 4/WG 4 - Lexical resources

Current Stage: 9060 - Close of review
Completion Date: 02-Sep-2031

Ref Project: SIST ISO 24613-3:2021 - Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension

Buy Documents

Standard

ISO 24613-3:2021 - Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension

Release Date: 31-Mar-2021

English language (22 pages)

Standard

ISO 24613-3:2021 - Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique

Release Date: 31-Mar-2021

French language (22 pages)

Relations

Consolidated By: ISO 23500-1:2024 - Preparation and quality management of fluids for haemodialysis and related therapies — Part 1: General requirements
Effective Date: 06-Jun-2022

Revises: ISO 24613:2008 - Language resource management - Lexical markup framework (LMF)
Effective Date: 17-Feb-2018

Overview

ISO 24613-3:2021 - Language resource management - Lexical Markup Framework (LMF) - Part 3: Etymological extension - defines a standardized extension to the LMF core for encoding etymological and diachronic information in born-digital and retro-digitized lexicons. The standard provides a meta-model and a set of data categories to describe word origins, historical development, cognates and etymons, enabling interoperable representation of etymologies across lexicographic and language-technology resources.

Key topics and technical requirements

Extension to LMF: Builds on ISO 24613-1 (core model) and ISO 24613-2 (MRD model) to add etymology-specific classes and attributes.
Primary classes: Defines classes such as Etymon, Cognate, Etymologizable, Etymology, EtyLink, CognateSet, Date, and Gloss to model etymological structure.
Etymology and EtyLink: Etymology instances can include types/subtypes (e.g., borrowing, inheritance, compounding, sense shift, phonological processes) and are composed of ordered EtyLink steps connecting lexical elements; links use prev/next sequencing and may include temporal attributes.
CognateSet and onomasiology: Supports grouping of cognates across languages to document shared etymological sources or cross-linguistic comparisons.
Dating and precision: Uses a Date class with attributes (date, circa, notBefore, notAfter) and mandates ISO 8601‑1/ISO 8601‑2 compliant date/time formats.
Metadata and references: Etymology instances can reference bibliographic sources (per ISO 24613-2). Cross-references require id attributes for target objects (CrossREF usage).
Data categories: Annex B lists normative data categories for consistent etymology description; Annex A provides typology examples.

Applications and who uses it

ISO 24613-3:2021 is practical for:

Lexicographers and dictionary projects standardizing etymological entries in digital and retro-digitized lexicons.
Digital humanities researchers documenting diachronic language change and historical corpora.
Computational linguists and NLP engineers needing structured etymological metadata for lexical databases, morphological analyzers, or multilingual lexicons.
Language technology developers building interoperable lexical resources, machine-readable dictionaries, and etymology-aware search/indexing tools.
Archivists and publishers converting legacy dictionaries into standardized, machine-actionable formats.

Benefits include improved interoperability, consistent etymological annotation, and easier aggregation of historical linguistic data for research and language technology.

Related standards

ISO 24613-1:2019 - LMF core model
ISO 24613-2:2020 - LMF MRD model
ISO 8601-1 / ISO 8601-2 - Date/time representations
Other parts of the ISO 24613 series for additional lexical extensions (see ISO for list)

Keywords: ISO 24613-3:2021, Lexical Markup Framework, LMF etymology, etymological extension, lexical resources, etymology metadata, cognate, etymon, diachronic lexicon.

Frequently Asked Questions

What is ISO 24613-3:2021?

ISO 24613-3:2021 is a standard published by the International Organization for Standardization (ISO). Its full title is "Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension". This standard covers: This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension as well as the relevant data categories.

What is the scope of ISO 24613-3:2021?

What ICS categories does ISO 24613-3:2021 belong to?

ISO 24613-3:2021 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination). The ICS classification helps identify the subject area and facilitates finding related standards.

What standards are related to ISO 24613-3:2021?

ISO 24613-3:2021 has the following relationships with other standards: it is linked to ISO 23500-1:2024 and ISO 24613:2008. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

How can I access ISO 24613-3:2021?

ISO 24613-3:2021 is available in PDF format for immediate download after purchase. The document can be added to your cart and obtained through the secure checkout process. Digital delivery ensures instant access to the complete standard document.

Standards Content (Sample)

SLOVENIAN STANDARD
1 June 2021
Replaces:
SIST ISO 24613:2013
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3: Extension étymologique
This Slovenian standard is identical to: ISO 24613-3:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24613-3
First edition 2021-03
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
Reference number
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021
and ISO 24613-5:— (see footnote 1), cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.

INTERNATIONAL STANDARD ISO 24613-3:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin as a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1; see also footnote 2). Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence also by the subclass relation instances
of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
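The relationship described in 4.1 can be pictured with a minimal Python sketch. It is not part of the standard: the field names (`lang`, `root_type`, `status`), the helper `effective_language`, the ids and the language codes are assumptions chosen for illustration, and LexicalEntry is reduced to a stand-in for the class defined in ISO 24613-1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LexicalEntry:
    """Simplified stand-in for the LMF core LexicalEntry class."""
    id: str
    lemma: str
    lang: Optional[str] = None       # None: use the language of the lexicon
    root_type: Optional[str] = None  # rootType, set only when the entry is a root
    status: Optional[str] = None     # status, e.g. a note on a reconstructed form


class Etymon(LexicalEntry):
    """Entry added to describe the origin of another entry."""


class Cognate(LexicalEntry):
    """Entry from a related language that shares a common origin."""


def effective_language(entry: LexicalEntry, lexicon_lang: str) -> str:
    """Return the entry's own language, falling back to the lexicon language."""
    return entry.lang if entry.lang is not None else lexicon_lang


# A Latin etymon stored inside a Sardinian lexicon (ids are hypothetical).
semper_la = Etymon(id="etymon-semper", lemma="semper", lang="la")
semper_sc = LexicalEntry(id="entry-semper", lemma="semper")
```

Because Etymon and Cognate inherit from LexicalEntry, anything that accepts a lexical entry also accepts them, which mirrors the subclass relation in Figure 1.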
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
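One way to read 4.2 is as a single marker base class shared by everything that may carry an etymology. The sketch below is an illustration under that assumption only; the empty class bodies and the helper `can_have_etymology` are hypothetical and are not defined by the standard.

```python
class Etymologizable:
    """Marker base class for elements that can have an etymology."""


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


class Etymon(LexicalEntry):
    pass


class Cognate(LexicalEntry):
    pass


def can_have_etymology(obj: object) -> bool:
    """True when the object belongs to any Etymologizable subclass."""
    return isinstance(obj, Etymologizable)
```

Adding a new kind of etymologizable element later then only requires one more subclass, which is the extensibility argument made in the text.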
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that can be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement
to the appropriate classes. Individual links between two elements in an etymology can also be given
a type, see the description of EtyLink below. Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
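The ordering of EtyLink steps through prev and next can be sketched as a small linked chain. This is an illustration, not the normative model: it assumes that prev and next hold the ids of neighbouring EtyLink instances, and the field names, ids and the function `ordered_links` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier element
    target: str                 # id of the later element
    type: Optional[str] = None
    prev: Optional[str] = None  # id of the preceding EtyLink, if any
    next: Optional[str] = None  # id of the following EtyLink, if any


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Walk the prev/next chain from the first link to the last."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in etymology.links}
    heads = [link for link in etymology.links if link.prev is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one link without a prev value")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = heads[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in the prev/next chain")
        seen.add(current.id)
        chain.append(current)
        current = by_id[current.next] if current.next is not None else None
    return chain


# Two steps stored out of order; the chain restores the temporal sequence.
etymology = Etymology(
    type="inheritance",
    links=[
        EtyLink(id="l2", source="e2", target="e3", prev="l1"),
        EtyLink(id="l1", source="e1", target="e2", next="l2"),
    ],
)
```

Calling `ordered_links(etymology)` returns the links in chain order regardless of how they are stored, and raises an error when the chain has no unique starting point or loops back on itself.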
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore by the subclass
relation of any given Etymon or Cognate) and which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute and a rough date
with the attribute circa. Within a span of time with different levels of specificity, there is the possibility
of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore, notAfter respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
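The four dating attributes can be sketched as a small value class with a format check. The sketch makes two loud assumptions: it accepts only calendar dates at year, year-month or year-month-day precision, which is a small subset of what ISO 8601-1 and ISO 8601-2 permit, and it compares spans by year only. The Python field names `not_before` and `not_after` stand for notBefore and notAfter.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: YYYY, YYYY-MM or YYYY-MM-DD only; day-of-month is not checked
# against the length of the month.
_ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


@dataclass
class Date:
    date: Optional[str] = None        # precise date
    circa: Optional[str] = None       # approximate date
    not_before: Optional[str] = None  # lower end of a span (notBefore)
    not_after: Optional[str] = None   # upper end of a span (notAfter)

    def __post_init__(self) -> None:
        for value in (self.date, self.circa, self.not_before, self.not_after):
            if value is not None and not _ISO_DATE.match(value):
                raise ValueError(f"unsupported date format: {value!r}")
        if self.not_before is not None and self.not_after is not None:
            # Year-level comparison only, to stay safe across mixed precisions.
            if int(self.not_before[:4]) > int(self.not_after[:4]):
                raise ValueError("notBefore is later than notAfter")


attested_span = Date(not_before="1100", not_after="1250")
publication = Date(date="2021-03-30")
```

A free-text value such as "12th century" is rejected by this sketch, which illustrates why the standard points to ISO 8601 rather than leaving the format open.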
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can either represent paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
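The Sardinian example can be written out as data in a few lines of Python. This is a sketch under stated assumptions rather than a serialization defined by the standard: the class layout is simplified, and the ids and the language codes "sc" and "la" are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    source: str  # id of the etymon
    target: str  # id of the entry being explained


@dataclass
class Etymology:
    type: str
    links: List[EtyLink] = field(default_factory=list)


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    lang: str
    part_of_speech: Optional[str] = None
    etymologies: List[Etymology] = field(default_factory=list)


@dataclass
class Etymon(LexicalEntry):
    pass


# Latin etymon and Sardinian entry (ids are hypothetical).
latin = Etymon(id="etymon-la-semper", lemma="semper", lang="la")
sardinian = LexicalEntry(
    id="entry-sc-semper", lemma="semper", lang="sc", part_of_speech="adverb"
)

# One Etymology of type inheritance with a single EtyLink step.
sardinian.etymologies.append(
    Etymology(
        type="inheritance",
        links=[EtyLink(source=latin.id, target=sardinian.id)],
    )
)
```

The single EtyLink carries the direction of change, from the Latin etymon to the Sardinian entry, while the Etymology instance carries the process type.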
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (ipa: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in
Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
The derivation of the singular and plural forms of the Portuguese noun nação ‘nation’, nação (sg)
and nações (pl), respectively, derived from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc),
nātiōnes (pl, acc) is described in Figure A.3. Herein, since the Portuguese forms concerned are both the
plural and singular, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension
of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target with the id values of the respective
components specifying the directionality of the process. As a defining feature of the process of metaphor
is that there has to be a change in the semantic domain between the source and target sense, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the dbpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
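The condition that a metaphor crosses semantic domains can be expressed as a simple consistency check. The sketch below is hypothetical: the subject field values "botany" and "anatomy", the ids and the function `crosses_domains` are invented for illustration and do not come from the standard or from the figure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense


@dataclass
class EtyLink:
    type: str
    source: str  # id of the sense the metaphor starts from
    target: str  # id of the sense the metaphor produces


def crosses_domains(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """True when a metaphor link joins senses from different domains."""
    if link.type != "metaphor":
        return False
    source, target = senses[link.source], senses[link.target]
    return source.subject_field != target.subject_field


# Two senses of ntuchi; the domain labels are illustrative assumptions.
senses = {
    "ntuchi-bean": Sense("ntuchi-bean", "bean", "botany"),
    "ntuchi-kidney": Sense("ntuchi-kidney", "kidney", "anatomy"),
}
link = EtyLink(type="metaphor", source="ntuchi-bean", target="ntuchi-kidney")
```

A validator built along these lines could flag metaphor links whose source and target share a domain, which would suggest a different process type such as narrowing or widening.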

A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the former
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal informat
...

INTERNATIONAL ISO
STANDARD 24613-3
First edition
2021-03
Language resource management —
Lexical markup framework (LMF) —
Part 3:
Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 3: Extension étymologique
Reference number
©
ISO 2021
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved

Contents Page
Foreword .iv
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 The LMF etymology extension . 2
4.1 The Cognate class and the Etymon class . 2
4.2 The Etymologizable class . 2
4.3 The Etymology class and the EtyLink class . 3
4.4 The CognateSet class . 4
4.5 The Date class . 4
4.6 The Gloss class. 4
Annex A (informative) Examples of possible etymological typologies . 6
Annex B (normative) Data categories for etymology description .15
Bibliography .22
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/ patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso .org/
iso/ foreword .html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021
and ISO 24613-5:—¹⁾, cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.
iv © ISO 2021 – All rights reserved

INTERNATIONAL STANDARD ISO 24613-3:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin as a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1)²⁾. Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary
to deal with instances of LexicalEntry (and hence, by the subclass relation, also instances of Etymon
and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
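The constraints described in 4.1 can be illustrated in code. The following Python sketch is informal and not part of the model itself: the class and attribute names follow 4.1 and Table 1, while the field types, the lang field (standing in for xml:lang), the lemma field and the validate helper are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """One step in an etymology (see 4.3); simplified for this sketch."""
    source: str  # id of the source element (assumed representation)
    target: str  # id of the target element (assumed representation)


@dataclass
class LexicalEntry:
    """Minimal stand-in for the LexicalEntry class of the LMF core module."""
    id: str
    lemma: str
    lang: Optional[str] = None  # stands in for xml:lang


@dataclass
class Etymon(LexicalEntry):
    """Entry added in order to describe the etymology of other entries."""
    gloss: Optional[str] = None
    root_type: Optional[str] = None  # rootType: set when the entry is a root
    status: Optional[str] = None     # e.g. note on a reconstructed form
    ety_links: List[EtyLink] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    """Related-language entry sharing an etymological origin."""
    gloss: Optional[str] = None
    root_type: Optional[str] = None
    status: Optional[str] = None
    ety_links: List[EtyLink] = field(default_factory=list)


def validate(entry: LexicalEntry) -> List[str]:
    """Return a list of problems; empty when the entry satisfies 4.1."""
    problems = []
    if isinstance(entry, (Etymon, Cognate)) and not entry.ety_links:
        problems.append(f"{entry.id}: needs at least one EtyLink")
    return problems
```

An Etymon created without any EtyLink is reported by validate, and the report clears once a link is appended to its ety_links list.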
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
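One possible reading of the Etymologizable class is as a common base type. The Python sketch below is an illustration under that assumption: the class names come from 4.2, whereas the id field, the etymologies list and the can_have_etymology helper are hypothetical.

```python
from typing import List


class Etymologizable:
    """Common base type for every element that can carry an etymology."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.etymologies: List[object] = []  # Etymology instances (see 4.3)


# The four subtypes named in 4.2.
class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


# Etymon and Cognate inherit the property through LexicalEntry (see 4.1).
class Etymon(LexicalEntry):
    pass


class Cognate(LexicalEntry):
    pass


def can_have_etymology(element: object) -> bool:
    """True for any instance of a subtype of Etymologizable."""
    return isinstance(element, Etymologizable)
```

Under this design, making a further class etymologizable only requires deriving it from Etymologizable; code that checks can_have_etymology does not need to change.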
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that belong to subclasses of the Etymologizable
class. The type or types of etymological process involved in a given etymology can be specified using the
type attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described.
Examples of etymological processes that shall be used as values for type/subtype:
— borrowing, inheritance;
— word formation: compounding, derivation;
— sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy;
— phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc.
The list of data categories provided in Annex B shall be used in complement to the appropriate classes.
Individual links between two elements in an etymology can also be given a type (see the description of
EtyLink below). Given that an Etymology instance can be taken from an external source, it can be
associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
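The relationship between Etymology, EtyLink and the prev/next ordering can be sketched as follows. This Python sketch rests on assumptions: it treats prev and next as holding the ids of neighbouring EtyLink instances, gives each EtyLink its own id, and uses the hypothetical helper ordered_links to recover the sequence. None of these choices is prescribed by this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of an Etymologizable element
    target: str                 # id of an Etymologizable element
    type: Optional[str] = None
    prev: Optional[str] = None  # id of the preceding EtyLink (assumption)
    next: Optional[str] = None  # id of the following EtyLink (assumption)


@dataclass
class Etymology:
    type: Optional[str] = None
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)  # combined processes


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Return the links in temporal order by following next from the first link."""
    by_id = {link.id: link for link in etymology.links}
    starts = [link for link in etymology.links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one link without prev")
    ordered: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in prev/next chain")
        seen.add(current.id)
        ordered.append(current)
        current = by_id.get(current.next) if current.next else None
    return ordered
```

Storing the order on the links rather than relying on list position means that the sequence survives serialization formats that do not guarantee element order.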
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore by the subclass
relation of any given Etymon or Cognate) and which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
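The cardinalities stated for CognateSet (one or more Cognate items, zero or more Bibliography items) can be checked at construction time. The Python sketch below is an illustration only: the field names, the use of plain strings for bibliography items and the languages helper are assumptions, and the sample data in the usage note are not taken from this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cognate:
    id: str
    lemma: str
    lang: str  # stands in for xml:lang


@dataclass
class CognateSet:
    cognates: List[Cognate]
    bibliography: List[str] = field(default_factory=list)  # zero or more items

    def __post_init__(self) -> None:
        # 4.4: a CognateSet contains one or more Cognate items.
        if not self.cognates:
            raise ValueError("a CognateSet needs at least one Cognate")

    def languages(self) -> List[str]:
        """Sorted list of the languages represented in the set."""
        return sorted({cognate.lang for cognate in self.cognates})
```

For instance, a set holding German Nacht ("de") and Dutch nacht ("nl"), which are well-known cognates, reports the languages ["de", "nl"], while an empty set is rejected.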
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute, and an approximate
date using the circa attribute. Within a span of time with different levels of specificity, one or more
dating attributes can be used. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore and notAfter respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
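The four dating attributes can be sketched as a small value class. The Python sketch below is a simplification made for illustration: it accepts only a small subset of the ISO 8601 calendar formats (YYYY, YYYY-MM and YYYY-MM-DD with four-digit years), compares span ends by year only, and the helper year_of is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Subset of ISO 8601 calendar dates: YYYY, YYYY-MM or YYYY-MM-DD.
_CALENDAR = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def year_of(value: str) -> int:
    """Extract the year from a date string in the supported subset."""
    match = _CALENDAR.match(value)
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    return int(match.group(1))


@dataclass
class Date:
    date: Optional[str] = None        # precise year, optionally month and day
    circa: Optional[str] = None       # approximate date
    not_before: Optional[str] = None  # notBefore: lower end of a span
    not_after: Optional[str] = None   # notAfter: upper end of a span

    def __post_init__(self) -> None:
        # Every attribute that is present must use a supported format.
        for value in (self.date, self.circa, self.not_before, self.not_after):
            if value is not None:
                year_of(value)
        # A span must not end before it starts (compared by year only).
        if self.not_before is not None and self.not_after is not None:
            if year_of(self.not_before) > year_of(self.not_after):
                raise ValueError("notBefore is later than notAfter")
```

A production implementation would need to cover the remaining ISO 8601-1 and ISO 8601-2 representations, such as expanded years and uncertain or approximate dates, which this sketch deliberately leaves out.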
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human readers. Individuals of the class can represent either paraphrases or
synonyms, and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
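The structure described in A.1 can be written out as data. In the Python sketch below, the lemmas, the languages and the type value come from the text above; the identifiers, the language codes "sc" and "la", the simplified Entry class (standing in for both LexicalEntry and Etymon) and the describe helper are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Stands in for LexicalEntry and its subclass Etymon."""
    id: str
    lemma: str
    lang: str


@dataclass
class EtyLink:
    source: str  # id of the etymon
    target: str  # id of the entry being explained


@dataclass
class Etymology:
    type: str
    links: List[EtyLink] = field(default_factory=list)


# Sardinian adverb and its Latin etymon (ids are hypothetical).
entry = Entry(id="le_semper", lemma="semper", lang="sc")
etymon = Entry(id="et_semper", lemma="semper", lang="la")
etymology = Etymology(
    type="inheritance",
    links=[EtyLink(source=etymon.id, target=entry.id)],
)


def describe(etymology: Etymology, entries: List[Entry]) -> List[str]:
    """Render each link as 'lang:lemma > lang:lemma (type)'."""
    by_id = {e.id: e for e in entries}
    lines = []
    for link in etymology.links:
        src, tgt = by_id[link.source], by_id[link.target]
        lines.append(
            f"{src.lang}:{src.lemma} > {tgt.lang}:{tgt.lemma} ({etymology.type})"
        )
    return lines
```

Calling describe(etymology, [entry, etymon]) yields a single line, "la:semper > sc:semper (inheritance)", reflecting the single EtyLink of this example.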
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in
Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc)
and nātiōnes (pl, acc) respectively. Here, since the Portuguese forms concerned are both the
plural and singular, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B
to the value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension
of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target, whose values are the ids of the respective
components and thereby specify the directionality of the process. Since a defining feature of metaphor
is a change in semantic domain between the source and target senses, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
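The domain condition on metaphor described in A.4 lends itself to a simple check. In the Python sketch below, the glosses ‘bean’ and ‘kidney’ come from the text; the identifiers, the subject field values ("botany", "anatomy") and the helper is_valid_metaphor are hypothetical, since the values shown in Figure A.4 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense


@dataclass
class EtyLink:
    type: str
    source: str  # id of the older sense
    target: str  # id of the derived sense


def is_valid_metaphor(source: Sense, target: Sense) -> bool:
    """A metaphorical shift requires a change of semantic domain."""
    return source.subject_field != target.subject_field


# Two senses of ntuchi; ids and domain labels are illustrative assumptions.
bean = Sense(id="ntuchi_s1", gloss="bean", subject_field="botany")
kidney = Sense(id="ntuchi_s2", gloss="kidney", subject_field="anatomy")

# The link runs from the older sense to the sense derived from it.
link = EtyLink(type="metaphor", source=bean.id, target=kidney.id)
```

Here is_valid_metaphor(bean, kidney) holds because the two senses sit in different domains, whereas a link between two senses in the same domain would fail the check and would call for a different type value.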

A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). Herein
at the level of Etymology, the process specifying the fact that chef was inherited from Latin is encoded
in the attribute type which is given the value “inheritance”. Within the Etymology, there is a series of
Etymon items containing the temporal attributes notBefore and notAfter which are used to specify the
spans of time, and the sequential ordering asserted is th
...

INTERNATIONAL STANDARD ISO 24613-3
First edition
2021-03
Language resource management — Lexical markup framework (LMF) —
Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) —
Partie 3: Extension étymologique
© ISO 2021
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymological description
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The preparation of International Standards is normally
entrusted to ISO technical committees. Each member body interested in a subject has the right to be
represented on the technical committee established for that purpose. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria required for
the different types of ISO documents should be noted. This document was drafted in accordance with
the editorial rules given in the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the fact that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to
identify such rights or to give notice of their existence. Details of any intellectual property rights
or similar rights identified during the development of the document are given in the Introduction
and/or in the list of patent declarations received by ISO (see www.iso.org/brevets).
Any trade names mentioned in this document are given for information, for the convenience of users,
and do not constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO-specific terms and
expressions related to conformity assessment, or for information about ISO's adherence to the
World Trade Organization (WTO) principles concerning Technical Barriers to Trade (TBT), see
www.iso.org/iso/fr/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020,
ISO 24613-4:2021 and ISO 24613-5:—¹⁾, cancels and replaces ISO 24613:2008, which has been
technically revised.
The main changes compared to the previous edition are as follows:
— complete revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Users should direct any feedback or questions on this document to the national standards body of
their country. A complete listing of these bodies can be found at www.iso.org/fr/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.

INTERNATIONAL STANDARD ISO 24613-3:2021(F)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 and supports the development
of detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for
such an extension and the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies.
For undated references, the latest edition of the referenced document (including any amendments)
applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable
dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following
apply.
ISO and IEC maintain terminological databases for use in standardization at the following
addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp;
— IEC Electropedia: available at http://www.electropedia.org/.
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language
of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: The term "etymologizable" refers to a category of lexical elements and usages
(encompassing for instance lexical entries, senses and word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be an earlier stage of a lexical item.
3.5
onomasiology
semantic study of words which takes a given concept as a starting point and examines the different
lexical items used in one or more languages to refer to that concept
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module
(see Figure 1)²⁾. Both classes define lexical entries which have been added to a lexical
resource with the purpose of describing the etymologies of one or more other lexical entries.
Instances of Etymon or Cognate can be assigned a language which is different from that of the lexicon
as a whole (this language is specified in the LexiconInformation class described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of the Etymon and Cognate subclasses shall be in an aggregation relationship with at least
one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary to
deal with instances of LexicalEntry (and hence, by the subclass relation, also instances of Etymon
and Cognate) which are roots, and in particular reconstructed roots.
In these cases, the fact of being a root and the type of the root in question shall be specified using
the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status
serves to associate the element with a written description of the likelihood of its having been in use
(see the example in A.8).
See Table 1 for a list of attributes to be used with these two subclasses.
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such "etymologizable" elements, the classes
of elements which can have etymologies can be easily extended as the need arises.
The following classes are subtypes of the Etymologizable class (see
Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).
2) The diagrams in this document use the following colour scheme: classes in yellow are
introduced in this document, whereas classes in pink were previously introduced in ISO 24613-1
and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that belong to subclasses of the Etymologizable
class. The type or types of etymological process involved in a given etymology can be specified
using the type attribute, and also potentially the subtype attribute (if the type of the etymology
can be further specified). Possible values for the type and subtype attributes can vary
according to the theoretical approach adopted by the compiler of a resource and/or the linguistic
or editorial focus of the resource in question. Nesting Etymology instances allows a combination of
etymological processes to be described.
Examples of etymological processes that shall be used as values for the type/subtype attributes:
— borrowing, inheritance;
— word formation: compounding, derivation;
— sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy;
— phonetic/phonological processes: assimilation, dissimilation, epenthesis, metathesis, hardening,
weakening, etc.
The list of data categories provided in Annex B shall be used in complement to the appropriate classes.
A type can also be specified for individual links between two elements within an etymology (see the
description of EtyLink below). Given that an Etymology instance can be taken from an external source, it
can be associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see
Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which
represents a single stage or step in the etymology of a given lexical item (see Figure 4).
EtyLink serves to link together individuals belonging to the subclasses of Etymologizable. EtyLink is a
subclass of the CrossREF class defined in ISO 24613-1. The use of CrossREF requires that the target
objects representing the given lexical content be given id attributes. The use of the id attribute
on an individual of the Etymologizable class as a target allows for the modelling of a generic
sequential temporal ordering of multiple elements, using the attributes prev and next. Instances of the
EtyLink class can further specify additional temporal relationships using various
temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the
Etymology and EtyLink classes.
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items
and zero or more Bibliography items (see ISO 24613-2). The CognateSet class is a
construct related to onomasiology. It contains items from languages related to that of a given
LexicalEntry (and therefore, by the subclass relation, of any given Etymon or Cognate)
which have been gathered together with the purpose of demonstrating linguistic similarities or
dissimilarities of salient kinds. The use of CognateSet implies that the LexicalEntry (and therefore
Etymon and Cognate) instances which it contains share an etymological source.
Figure 5 — CognateSet
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date
by making use of the Date class. Furthermore, the Date class allows the specification of a number of
degrees of precision. A precise year, and potentially a given month and day, shall be stated
using the date attribute, and an approximate date using the circa attribute. Within a given span of time
with different levels of specificity, one or more dating attributes can be used. Where
the span of time is known (or estimated), its lower and upper ends can be specified
using notBefore and notAfter respectively. For date and time formats, ISO 8601-1 and
ISO 8601-2 shall be used.
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a
phrase that is intended for human readers. Individuals of this class can represent
paraphrases or synonyms, which may be written in the language of the entry or in
another language. See Table 1 for a list of attributes to be used with this class.

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language,
in this case the Sardinian adverb semper, which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for a
definition of this type). The Etymology instance is then associated with an individual of type EtyLink
which represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning "wine" (IPA: [veŋ]) in the Emiliano
variety of Italian is traced using a series of etymons which are ordered and linked together using
EtyLink instances. The chain leads from the Vulgar Latin vinu through to the immediate predecessor of the
word in its current form. These individual links can be accessed through an Etymology individual (with
the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLink instances
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of legibility, but their use is shown in Figure A.6, for the example given in A.6,
and in Figure A.10, for the example given in A.8.

Figure A.2 — Schema of a multi-stage inheritance and phonological change in
Bolognese
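The ordering of EtyLink instances through prev and next can be read as a doubly linked chain. The sketch below, in Python, is only one possible reading: the id field, the order_links helper and the intermediate stage labels are hypothetical, and only the form vinu is taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str                       # hypothetical identifier
    source: str                   # earlier form
    target: str                   # later form
    prev: Optional[str] = None    # id of the preceding EtyLink
    next: Optional[str] = None    # id of the following EtyLink


def order_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in diachronic order by following the next values."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    heads = [link for link in links if link.prev is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one link without a prev value")
    ordered: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = heads[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in the prev/next chain")
        seen.add(current.id)
        ordered.append(current)
        current = by_id[current.next] if current.next else None
    return ordered


# Stage labels are placeholders, not the forms shown in Figure A.2.
links = [
    EtyLink("l3", "stage_2", "predecessor", prev="l2"),
    EtyLink("l1", "vinu", "stage_1", next="l2"),
    EtyLink("l2", "stage_1", "stage_2", prev="l1", next="l3"),
]
path = [link.source for link in order_links(links)]
```

Storing both prev and next is redundant for traversal, but it lets a consumer walk the chain in either direction and detect a broken sequence.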
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
'nation', nação (sg) and nações (pl) respectively, from two forms of a Vulgar Latin (VL) noun,
nātiōnem (sg, acc) and nātiōnes (pl, acc). In this case, since the Portuguese forms concerned are the
singular and the plural, the etymon has two WordForm instances, each corresponding to one
grammatical number. Note in particular the association of the attributes grammaticalCase and
inflectionType with the WordForm subclass of Etymon via a GrammaticalInformation instance. In
a detailed Portuguese lexicon containing such etymological information for a sufficient number of
lexical entries, it would be possible, by contrasting the content of the WordForm instances in the
LexicalEntry with those in the Etymon class, to observe the following linguistic phenomena: 1) Portuguese
lost grammatical case; 2) the vast majority of its nouns derive from the accusative case in VL; 3) where
the singular (accusative) ending in VL was -tiōnem, the Portuguese form is spelled "-ção" and pronounced
[sɐ̃w]; where the plural (accusative) ending in VL was -tiōnes, the Portuguese form is spelled "-ções"
and pronounced [sõj̃s].
NOTE This etymology can be articulated further by adding the types of phonological processes
for each stage of the diachrony. This can be done in the model by adding the appropriate data category
defined in Annex B to the value of type on EtyLink for the stages indicated.
Figure A.3 — Schema of the inheritance of Portuguese inflected forms
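The contrast between the word forms of the entry and those of the etymon can be illustrated with a short Python sketch. It is an assumption-laden simplification: the written and number field names and the pair_by_number helper are hypothetical, inflectionType and the GrammaticalInformation instance are left out, and only the four forms and the accusative case come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WordForm:
    written: str                            # hypothetical field for the spelling
    number: str                             # "singular" or "plural"
    grammaticalCase: Optional[str] = None   # None when no case is marked


# Word forms of the Portuguese LexicalEntry (no case marking).
entry_forms = [
    WordForm("nação", "singular"),
    WordForm("nações", "plural"),
]

# Word forms of the Vulgar Latin Etymon (both accusative).
etymon_forms = [
    WordForm("nātiōnem", "singular", "accusative"),
    WordForm("nātiōnes", "plural", "accusative"),
]


def pair_by_number(entry: List[WordForm],
                   etymon: List[WordForm]) -> Dict[str, WordForm]:
    """Map each entry spelling to the etymon form with the same number."""
    by_number = {form.number: form for form in etymon}
    return {form.written: by_number[form.number] for form in entry}


pairs = pair_by_number(entry_forms, etymon_forms)

# Observation 1 in the text: case is marked on the etymon forms only.
case_lost = all(f.grammaticalCase is None for f in entry_forms) and all(
    f.grammaticalCase is not None for f in pairs.values()
)
```

Run over many entries, the same pairing would also expose the regular correspondence between the VL endings and the Portuguese spellings noted in the text.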
A.4 Example of metaphorical semantic shift
The example in Figure A.4, from Mixtepec-Mixtec, illustrates the derivation of one sense from another by
metaphor. In this case, the word for 'kidney' arose from a metaphorical extension of the word for
'bean', ntuchi. The etymology is attached to the 'kidney' sense of ntuchi. The EtyLink instance carries
the attributes source and target, whose values are the ids of the respective components and specify
the direction of the process. A defining feature of metaphor is that the semantic domain must change
between the source and target senses, each sense having a domain specified in its subject
field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding
to the dbpedia entry for each sense.
Figure A.4 — Schema of a metaphorical sense shift in Mixtepec-Mixtec
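The requirement that a metaphor crosses semantic domains can be expressed as a simple check over the source and target senses. The following Python sketch is illustrative only: the source and target attributes come from the text, whereas the Sense record, its domain field, the domain labels, the type value "metaphor" and the is_valid_metaphor helper are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str      # hypothetical stand-in for the subject field value


@dataclass
class EtyLink:
    type: str        # assumed value "metaphor"; see Annex B for actual types
    source: str      # id of the source sense
    target: str      # id of the target sense


def is_valid_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """A metaphor link must lead from one semantic domain to another."""
    src = senses[link.source]
    tgt = senses[link.target]
    return link.type == "metaphor" and src.domain != tgt.domain


# Two senses of ntuchi; the domain labels are placeholders.
senses = {
    "s1": Sense("s1", "bean", "plants"),
    "s2": Sense("s2", "kidney", "anatomy"),
}
link = EtyLink(type="metaphor", source="s1", target="s2")
```

Because the direction is carried by source and target rather than by position, swapping the two ids would describe the opposite (and here incorrect) derivation.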

A.5 Example of borrowing and compounding
Figure A.5 presents an example of a complex etymological borrowing: the borrowing of the French word
pamplemousse 'grapefruit' from a Dutch word (see References [4] and [18]). In addition to the borrowing,
the etymology shows the diachrony of the word formation process that took place in Dutch, where the
compound pompelmousse was formed from the etymons pompel and limoes; this compound was then
borrowed into French. A CrossREF instance (see ISO 24613-2) is used to specify that
the borrowed etymon is a compound in Dutch and that its components are ordered. The components
of the compound are represented as Etymon classes containing the essential forms such
as Lemma.
Figure A.5 — Schema of the borrowing and compounding of the French word pamplemousse
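A borrowed etymon that is itself an ordered compound can be sketched as a nested structure. Note that this Python sketch replaces the CrossREF mechanism used in the standard with a plain ordered list, purely for brevity; the components field, the is_compound helper, the type value "borrowing" and the language codes "nl" and "fr" are assumptions, while the word forms are those given in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Etymon:
    lemma: str
    lang: str                                   # stands in for xml:lang
    # Ordered parts of a compound; a simplification of the CrossREF approach.
    components: List["Etymon"] = field(default_factory=list)

    def is_compound(self) -> bool:
        return len(self.components) > 1


@dataclass
class Etymology:
    type: str                                   # assumed value "borrowing"
    etymon: Etymon


@dataclass
class LexicalEntry:
    lemma: str
    lang: str
    etymology: Etymology


# Dutch compound built from two ordered etymons, then borrowed into French.
dutch_compound = Etymon(
    "pompelmousse",
    "nl",
    components=[Etymon("pompel", "nl"), Etymon("limoes", "nl")],
)
pamplemousse = LexicalEntry(
    "pamplemousse", "fr", Etymology(type="borrowing", etymon=dutch_compound)
)
```

Keeping the components as full Etymon records, rather than bare strings, leaves room for each part to carry its own etymology.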
A.6 Example of the use of temporal information
The example in Figure A.6 presents the encoding of the etymology of the French lexical item chef,
focusing on the diachronic stages and the chronology of its phonetic evolution (see
Reference [13]). In this case, at the Etymology level, the fact that chef is inherited from
Latin is encoded in the type attribute, which takes the value "inheritance". Within the Etymology level,
there is a series of Etymon elements carrying the temporal attributes notBefore and notAfter, which
specify the time spans, and the implied sequential ordering, in which a given form of the
item would have been used. Note also the ordering of the Etymon elements through the use
of the prev and next attributes on the EtyLink individuals concerned.
Figure A.6 — Schema of the multi-stage phonological changes of the French word chef
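The notBefore and notAfter attributes lend themselves to a simple consistency check over an already ordered series of etymons. The Python sketch below assumes integer years and a list that has been put in order beforehand (for instance via prev and next); the form field, the chronologically_consistent helper, and all stage labels and years are placeholders, not the values shown in Figure A.6.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Etymon:
    form: str                          # hypothetical field for the attested form
    notBefore: Optional[int] = None    # earliest year, assumed to be an integer
    notAfter: Optional[int] = None     # latest year, assumed to be an integer


def chronologically_consistent(stages: List[Etymon]) -> bool:
    """Check that each interval is well formed and start years never decrease."""
    for stage in stages:
        if (stage.notBefore is not None and stage.notAfter is not None
                and stage.notBefore > stage.notAfter):
            return False
    starts = [s.notBefore for s in stages if s.notBefore is not None]
    return all(a <= b for a, b in zip(starts, starts[1:]))


# Placeholder stages and years, invented for the sketch.
stages = [
    Etymon("stage_1", notBefore=100, notAfter=300),
    Etymon("stage_2", notBefore=300, notAfter=600),
    Etymon("stage_3", notBefore=600, notAfter=900),
]
```

A check of this kind only flags contradictions between the explicit ordering and the dates; it does not require every stage to be dated.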
A.7 Cognate set with bibliography
Figure A.7 illustrates an example of a cognate set extracted from an actual source (di
...
