Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension as well as the relevant data categories.

Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique

Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev

General Information

Status: Published
Publication Date: 30-Mar-2021
Current Stage: 6060 - International Standard published
Start Date: 31-Mar-2021
Due Date: 12-Jun-2021
Completion Date: 31-Mar-2021

Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24613-3:2021
1 June 2021
Supersedes: SIST ISO 24613:2013
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3: Extension étymologique
This Slovenian standard is identical to: ISO 24613-3:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24613-3:2021 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD
ISO 24613-3
First edition
2021-03
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
Reference number: ISO 24613-3:2021(E)
© ISO 2021

COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
  4.1 The Cognate class and the Etymon class
  4.2 The Etymologizable class
  4.3 The Etymology class and the EtyLink class
  4.4 The CognateSet class
  4.5 The Date class
  4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021
and ISO 24613-5:— (see footnote 1), cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.

Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1 and footnote 2). Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, instances
of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
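As an informal illustration of 4.1 (not part of the normative model), the subclass relation and the attributes rootType and status could be sketched in Python as below. The field names lemma, lang and ety_links, the helper has_required_link and the language code "la" are assumptions made for this sketch only. The EtyLink class is reduced to a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """Placeholder for the EtyLink class described in 4.3."""
    type: Optional[str] = None


@dataclass
class LexicalEntry:
    """Stand-in for LexicalEntry from the LMF core model (ISO 24613-1)."""
    lemma: str                   # assumed field name, for illustration only
    lang: Optional[str] = None   # may differ from the language of the lexicon


@dataclass
class Etymon(LexicalEntry):
    root_type: Optional[str] = None   # rootType: given when the entry is a root
    status: Optional[str] = None      # status: likelihood of having been in use
    ety_links: List[EtyLink] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    root_type: Optional[str] = None
    status: Optional[str] = None
    ety_links: List[EtyLink] = field(default_factory=list)


def has_required_link(entry) -> bool:
    """Check the rule that an Etymon or Cognate aggregates at least one EtyLink."""
    return len(entry.ety_links) >= 1


# The Latin etymon from the example in A.1, in a lexicon of another language.
latin_semper = Etymon(lemma="semper", lang="la", ety_links=[EtyLink()])
```

Because Etymon and Cognate inherit from LexicalEntry, any code written against LexicalEntry also accepts them, which mirrors the subclass relation shown in Figure 1.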
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
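The extensibility argument above can be pictured with a minimal Python sketch. The class bodies are empty on purpose and the helper can_have_etymology is a hypothetical name; only the class names and the subtype relation come from this clause.

```python
class Etymologizable:
    """Common supertype of every element that can have an etymology."""


class LexicalEntry(Etymologizable):
    """Stand-in for the LMF core LexicalEntry class."""


class Sense(Etymologizable):
    """Stand-in for the LMF core Sense class."""


class Form(Etymologizable):
    """Stand-in for the LMF core Form class."""


class CognateSet(Etymologizable):
    """Stand-in for the CognateSet class of 4.4."""


def can_have_etymology(element: object) -> bool:
    """Return True for any instance of an Etymologizable subtype."""
    return isinstance(element, Etymologizable)
```

Under this reading, making a further kind of element etymologizable only requires declaring it as a subtype; code that tests against Etymologizable does not change.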
2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.
Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used together
with the appropriate classes. Individual links between two elements in an etymology can also be given
a type (see the description of EtyLink below). Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
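The use of type, subtype and nested Etymology instances can be sketched as follows. This is an illustration under assumptions: the field name nested and the method process_types are invented for the sketch, and the way borrowing and compounding are nested here is one possible reading of the example in A.5, not a transcription of Figure A.5.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Etymology:
    type: Optional[str] = None       # e.g. "borrowing", "inheritance"
    subtype: Optional[str] = None    # optional refinement of type
    nested: List["Etymology"] = field(default_factory=list)

    def process_types(self) -> Iterator[str]:
        """Yield the type of this etymology, then those of nested ones, depth first."""
        if self.type is not None:
            yield self.type
        for child in self.nested:
            yield from child.process_types()


# A borrowing whose source word was itself formed by compounding.
combined = Etymology(type="borrowing", nested=[Etymology(type="compounding")])
```

Calling list(combined.process_types()) walks the nesting and returns the processes in the order in which they were declared.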
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
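The sequential ordering expressed with prev and next can be sketched as below. The sketch assumes that each EtyLink carries its own id and that prev and next hold the ids of neighbouring EtyLink instances; this document does not state this in the text reproduced here, so treat it as one possible interpretation. All id values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class EtyLink:
    id: str                      # assumed: links are themselves identifiable
    source: str                  # id of the earlier element
    target: str                  # id of the later element
    type: Optional[str] = None
    prev: Optional[str] = None   # assumed: id of the preceding EtyLink
    next: Optional[str] = None   # assumed: id of the following EtyLink


def ordered(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in the order given by their prev/next attributes."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    starts = [link for link in links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one link without a prev value")
    chain: List[EtyLink] = []
    seen: Set[str] = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in the prev/next chain")
        seen.add(current.id)
        chain.append(current)
        # An unknown or missing next id ends the chain.
        current = by_id.get(current.next) if current.next is not None else None
    return chain


# Two stages leading to a lexical entry, supplied out of order.
stage_2 = EtyLink(id="l2", source="e2", target="le1", prev="l1")
stage_1 = EtyLink(id="l1", source="e1", target="e2", next="l2")
```

Here ordered([stage_2, stage_1]) returns the links with stage_1 first, independent of the order in which they were stored.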
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass
relation, to that of any given Etymon or Cognate) which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
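A container with the cardinalities stated in 4.4 (one or more Cognate items, zero or more Bibliography items) might be sketched as follows. The field names, the languages helper and the placeholder forms and language codes are all assumptions for illustration; they are not data from this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cognate:
    lemma: str
    lang: str


@dataclass
class Bibliography:
    citation: str


@dataclass
class CognateSet:
    cognates: List[Cognate]
    bibliographies: List[Bibliography] = field(default_factory=list)

    def __post_init__(self) -> None:
        # 4.4: a CognateSet contains one or more Cognate items.
        if not self.cognates:
            raise ValueError("a CognateSet needs at least one Cognate")

    def languages(self) -> List[str]:
        """Return the distinct languages represented in the set, sorted."""
        return sorted({cognate.lang for cognate in self.cognates})


# Placeholder forms and language codes, not real lexical data.
example_set = CognateSet([Cognate("form-a", "xx"), Cognate("form-b", "yy")])
```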
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute and a rough date
with the attribute circa. Within a span of time with different levels of specificity, there is the possibility
of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
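The four dating attributes can be sketched in Python as below. This is a deliberately narrow illustration: it accepts only the calendar forms YYYY, YYYY-MM and YYYY-MM-DD, which is a small subset of what ISO 8601-1 and ISO 8601-2 allow, and it compares spans by year only. The snake_case field names and the sample years are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Subset of ISO 8601 calendar dates handled by this sketch.
_CALENDAR = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def _validate(name: str, value: Optional[str]) -> None:
    if value is not None and _CALENDAR.fullmatch(value) is None:
        raise ValueError(f"{name}: {value!r} is not YYYY, YYYY-MM or YYYY-MM-DD")


@dataclass
class Date:
    date: Optional[str] = None         # precise year, optionally month and day
    circa: Optional[str] = None        # rough date
    not_before: Optional[str] = None   # notBefore: lower end of a span
    not_after: Optional[str] = None    # notAfter: upper end of a span

    def __post_init__(self) -> None:
        for name in ("date", "circa", "not_before", "not_after"):
            _validate(name, getattr(self, name))
        if self.not_before is not None and self.not_after is not None:
            # Year-level check only; finer precision is not compared here.
            if int(self.not_before[:4]) > int(self.not_after[:4]):
                raise ValueError("notBefore lies after notAfter")


# Placeholder span, not taken from any example in this document.
span = Date(not_before="0500", not_after="0700")
```

A production implementation would need a parser that covers the extended ISO 8601-2 forms, which this sketch does not attempt.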
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can either represent paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment
Class name | Example of attributes
Etymon | xml:lang, gloss, rootType, status
Etymology | type, subtype
EtyLink | type, prev, next
CognateSet |
Cognate | xml:lang, gloss, rootType, status
Date | notBefore, notAfter, circa, date
Gloss | xml:lang

Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
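The structure described in A.1 can be approximated in Python as follows. This is a sketch under assumptions: the id values, the field names, the language codes "sc" (Sardinian) and "la" (Latin), and the choice of the etymon as source and the lexical entry as target are illustrative and are not read off Figure A.1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Etymon:
    id: str
    lemma: str
    lang: str


@dataclass
class EtyLink:
    source: str   # id of the etymon (assumed direction)
    target: str   # id of the lexical entry


@dataclass
class Etymology:
    type: str
    links: List[EtyLink] = field(default_factory=list)


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    lang: str
    etymologies: List[Etymology] = field(default_factory=list)


latin = Etymon(id="et_semper", lemma="semper", lang="la")
entry = LexicalEntry(
    id="le_semper",
    lemma="semper",
    lang="sc",
    etymologies=[
        Etymology(type="inheritance",
                  links=[EtyLink(source=latin.id, target="le_semper")])
    ],
)


def describe(entry: LexicalEntry, etymons: Dict[str, Etymon]) -> List[str]:
    """Render each link of each etymology as a short human-readable line."""
    lines = []
    for etymology in entry.etymologies:
        for link in etymology.links:
            source = etymons[link.source]
            lines.append(
                f"{entry.lemma} ({entry.lang}) < "
                f"{source.lemma} ({source.lang}) [{etymology.type}]"
            )
    return lines
```

With the data above, describe(entry, {latin.id: latin}) yields the single line "semper (sc) < semper (la) [inheritance]".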
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in
Figure A.10, for the example given in A.8.
Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc)
and nātiōnes (pl, acc), respectively. Since the Portuguese forms concerned are both the
singular and the plural, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived by metaphorical extension
from ntuchi, the word meaning ‘bean’. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target with the id values of the respective
components specifying the directionality of the process. Because a defining feature of metaphor
is a change in semantic domain between the source and target senses, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At
the level of Etymology, the fact that chef was inherited from Latin is encoded
in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of
Etymon items containing the temporal attributes notBefore and notAfter, which are used to specify the
span of time during which a given form of the item is asserted to have been in
use. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate
EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source
(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical
item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed
and thus it is not represented as Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote a young person regardless
of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate
has an associated Sense and a Definition instance which together describe related lexical entries in
Germanic languages, all of which are hypothesised as having been inherited from a common source.
Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with
realistic examples of etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 (see Reference [6]), is modelled in Figures A.9 and A.10.
The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that
is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data
in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the
lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling
of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in
ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of
the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and
with any additional information (in this case the presence of the plus sign after the bibliographic citation
in Reference [6] means that the linguistic usage in question can also be found in later sources) added as
text using the note attribute. In this instance, the bibliographic element refers to a single author (and
therefore to their corpus of works) and the date attribute refers to the whole period encompassing their
lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for ease of reading the elements of each of the two cognate sets are coloured using two different
colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements
and the entire etymology is attributed to the source by associating the Bibliography with the Etymology
element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for
Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the
etymon is reconstructed, and the attribute status to indicate a level of uncertainty.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification
Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,
5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for
model simplification), for a normative description of the simplification process.

Annex B
(normative)

Data categories for etymology description
This annex contains a structured list of categories that shall be used as possible values for describing
etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for
each category, its name, a camel case representation, a conceptual domain (for complex data categories),
a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well
as some usage notes where needed.
Where applicable, the camel case form should be used in utilizing the features as a value for the attribute
type in declaring the process of an Etymology or EtyLink.
NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-
typologies and variant definitions or descriptions.
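For illustration, a category name can be turned into a camel case form with a small helper such as the one below. The function to_camel_case is a hypothetical utility, not something defined by this document, and the codes actually listed in this annex take precedence over whatever the helper produces.

```python
import re


def to_camel_case(name: str) -> str:
    """Lower-case the first word and capitalise the first letter of the rest."""
    words = [word for word in re.split(r"[^A-Za-z0-9]+", name.strip()) if word]
    if not words:
        return ""
    head, *tail = words
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)
```

Applied to the first category below, to_camel_case("Borrowing") gives "borrowing", which matches the camel case code listed for it. For a multi-word name such as "place assimilation" the helper gives "placeAssimilation".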
Borrowing
Camel case code: borrowing
Definition: The p
...

INTERNATIONAL ISO
STANDARD 24613-3
First edition
2021-03
Language resource management —
Lexical markup framework (LMF) —
Part 3:
Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 3: Extension étymologique
Reference number
ISO 24613-3:2021(E)
©
ISO 2021

---------------------- Page: 1 ----------------------
ISO 24613-3:2021(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved

---------------------- Page: 2 ----------------------
ISO 24613-3:2021(E)

Contents Page
Foreword .iv
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 The LMF etymology extension . 2
4.1 The Cognate class and the Etymon class . 2
4.2 The Etymologizable class . 2
4.3 The Etymology class and the EtyLink class . 3
4.4 The CognateSet class . 4
4.5 The Date class . 4
4.6 The Gloss class. 4
Annex A (informative) Examples of possible etymological typologies . 6
Annex B (normative) Data categories for etymology description .15
Bibliography .22
© ISO 2021 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO 24613-3:2021(E)

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021
and ISO 24613-5:— (see footnote 1), cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.

INTERNATIONAL STANDARD ISO 24613-3:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item

3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1 and footnote 2). Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, instances
of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
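The constraints stated in this subclause can be illustrated with a short Python sketch. It is not part of the LMF model: the dataclass layout, the field names lemma, lang and ety_links, the language code "la" and the helper check_has_etylink are assumptions made for illustration. Only the class names and the attributes rootType and status are taken from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """Placeholder for the EtyLink class described in 4.3."""
    type: Optional[str] = None


@dataclass
class LexicalEntry:
    """Minimal stand-in for the LexicalEntry class of the LMF core module."""
    lemma: str
    lang: Optional[str] = None      # may differ from the language of the lexicon
    rootType: Optional[str] = None  # set only when the entry is a root
    status: Optional[str] = None    # written description of the likelihood of use


@dataclass
class Etymon(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)  # hypothetical field name


@dataclass
class Cognate(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)  # hypothetical field name


def check_has_etylink(entry) -> bool:
    """Return True if an Etymon or Cognate aggregates at least one EtyLink."""
    return len(entry.ety_links) >= 1


linked = Etymon(lemma="semper", lang="la", ety_links=[EtyLink()])
orphan = Cognate(lemma="semper", lang="la")
print(check_has_etylink(linked), check_has_etylink(orphan))
```

In this sketch the second instance fails the check because it has no EtyLink, which is the situation the "shall" requirement above rules out.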
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement
to the appropriate classes. Individual links between two elements in an etymology can also be given
a type, see the description of EtyLink below. Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
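The relationship between Etymology, EtyLink and the prev/next ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not a normative encoding: it assumes that prev and next hold the id of another EtyLink, that source and target hold ids of Etymologizable elements, and the function ordered_links and all id values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier Etymologizable element
    target: str                 # id of the later Etymologizable element
    type: Optional[str] = None
    prev: Optional[str] = None  # assumed: id of the preceding EtyLink
    next: Optional[str] = None  # assumed: id of the following EtyLink


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)  # combined processes


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Follow `next` from the single link without `prev` to recover the order."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in etymology.links}
    starts = [link for link in etymology.links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one EtyLink without a prev value")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in prev/next ordering")
        seen.add(current.id)
        chain.append(current)
        current = by_id.get(current.next) if current.next else None
    return chain


ety = Etymology(
    type="inheritance",
    links=[
        EtyLink(id="l2", source="etymon2", target="entry", prev="l1"),
        EtyLink(id="l1", source="etymon1", target="etymon2", next="l2"),
    ],
)
print([link.id for link in ordered_links(ety)])
```

Storing the order on the links rather than in list position keeps the sequence recoverable even when the links are serialized in arbitrary order.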
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass
relation, to that of any given Etymon or Cognate) and which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
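The cardinalities given for CognateSet (one or more Cognate items, zero or more Bibliography items) can be expressed as a small Python sketch. The field names, the placeholder language codes and the languages helper are assumptions for illustration only and are not defined by this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cognate:
    lemma: str
    lang: str


@dataclass
class Bibliography:
    citation: str


@dataclass
class CognateSet:
    cognates: List[Cognate]
    bibliographies: List[Bibliography] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A CognateSet is a container for one or more Cognate items.
        if len(self.cognates) < 1:
            raise ValueError("a CognateSet contains one or more Cognate items")

    def languages(self) -> List[str]:
        """Languages represented in the set, in first-seen order."""
        seen: List[str] = []
        for cognate in self.cognates:
            if cognate.lang not in seen:
                seen.append(cognate.lang)
        return seen


# Placeholder lemmas and language codes, not real data.
cognate_set = CognateSet([Cognate("a", "xx"), Cognate("b", "yy"), Cognate("c", "xx")])
print(cognate_set.languages())
```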
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute and a rough date
with the attribute circa. Within a span of time with different levels of specificity, there is the possibility
of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
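A consistency check on the four dating attributes might look like the following Python sketch. It is an illustration under a strong simplifying assumption: the hypothetical helper parse_ymd accepts only the plain forms YYYY, YYYY-MM and YYYY-MM-DD, which is a small subset of what ISO 8601-1 and ISO 8601-2 allow. The validate method is likewise an assumption, not part of the model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def parse_ymd(value: str) -> Tuple[int, ...]:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a tuple of ints."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported date form: {value!r}")
    return tuple(int(p) for p in parts)


@dataclass
class Date:
    date: Optional[str] = None       # precise year, optionally month and day
    circa: Optional[str] = None      # rough date
    notBefore: Optional[str] = None  # lower end of a span
    notAfter: Optional[str] = None   # upper end of a span

    def validate(self) -> None:
        """Check that every value parses and that the span is not inverted."""
        for name in ("date", "circa", "notBefore", "notAfter"):
            value = getattr(self, name)
            if value is not None:
                parse_ymd(value)
        if self.notBefore is not None and self.notAfter is not None:
            if parse_ymd(self.notBefore) > parse_ymd(self.notAfter):
                raise ValueError("notBefore is later than notAfter")


# Placeholder span, not taken from any example in this document.
Date(notBefore="0500", notAfter="0700").validate()
```

A production implementation would need a full ISO 8601 parser, for instance to handle years before 0001 or uncertain dates as covered by ISO 8601-2.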
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can either represent paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang

Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
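The structure described in A.1 can be approximated with plain Python dictionaries. This is a sketch only: the dictionary layout, the key names (including partOfSpeech, etyLinks and etymologies), the id values and the describe helper are assumptions, and this document does not prescribe any such serialization. The codes "sc" and "la" are the ISO 639-1 codes for Sardinian and Latin.

```python
from typing import Dict, List

# Hypothetical ids and layout for the two lexical elements of the example.
elements: Dict[str, dict] = {
    "entry_semper_sc": {"class": "LexicalEntry", "lang": "sc", "lemma": "semper",
                        "partOfSpeech": "adverb"},
    "etymon_semper_la": {"class": "Etymon", "lang": "la", "lemma": "semper"},
}

# One Etymology of type inheritance with a single EtyLink from etymon to entry.
etymology = {
    "class": "Etymology",
    "type": "inheritance",
    "etyLinks": [
        {"class": "EtyLink", "source": "etymon_semper_la", "target": "entry_semper_sc"},
    ],
}
elements["entry_semper_sc"]["etymologies"] = [etymology]


def describe(entry_id: str) -> List[str]:
    """Render each EtyLink of an entry as 'lang lemma > lang lemma (type)'."""
    lines = []
    for ety in elements[entry_id].get("etymologies", []):
        for link in ety["etyLinks"]:
            src, tgt = elements[link["source"]], elements[link["target"]]
            lines.append(
                f'{src["lang"]} {src["lemma"]} > {tgt["lang"]} {tgt["lemma"]} ({ety["type"]})'
            )
    return lines


print(describe("entry_semper_sc"))
```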
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (ipa: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in
Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and
nātiōnes (pl, acc), respectively. Since both the singular and the plural Portuguese forms are concerned,
the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.

Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension
of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target with the id values of the respective
components specifying the directionality of the process. Since a defining feature of metaphor is a
change in semantic domain between the source and target senses, each sense has a domain specified
in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec

A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). Herein
at the level of Etymology, the fact that chef was inherited from Latin is encoded
in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of
Etymon items containing the temporal attributes notBefore and notAfter, which specify the
spans of time during which a given form of the item is asserted to have been in use, together with their
sequential ordering. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate
EtyLink individuals.

Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source
(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical
item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed
and thus it is not represented as Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote a young person regardless
of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate
has an associated Sense and a Definition instance which together describe related lexical entries in
Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with
realistic examples of etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 [6], is modelled in Figures A.9 and A.10.
The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that
is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data
in the example. The original text of the entry is shown in Figure A.8.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the
lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling
of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in
ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of
the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and
with any additional information (in this case the presence of the plus sign after the bibliographic citation
in Reference [6] means that the linguistic usage in question can also be found in later sources) added as
text using the note attribute. In this instance, the bibliographic element refers to a single author (and
therefore to their corpus of works) and the date attribute refers to the whole period encompassing their
lifespan.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for ease of reading the elements of each of the two cognate sets are coloured using two different
colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements
and the entire etymology is attributed to the source by associating the Bibliography with the Etymology
element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for
Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the
etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification
Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,
5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for
model simplification), for a normative description of the simplification process.

Annex B
(normative)

Data categories for etymology description
This annex contains a structured list of categories that shall be used as possible values for describing
etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for
each category, its name, a camel case representation, a conceptual domain (for complex data categories),
a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well
as some usage notes where needed.
Where applicable, the camel case form should be used in utilizing the features as a value for the attribute
type in declaring the process of an Etymology or EtyLink.
NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-
typologies and variant definitions or descriptions.
Borrowing
Camel case code: borrowing
Definition: The process in which a lexical item, phrase or other linguistic feature from a foreign
language or dialect is introduced into a given language or dialect.
Source: Reference [4].
Note: The lexical items which result from borrowing are loanwords.
Example: Russian картошка is a loanword from High German Kartoffel.
Alternate term: loaning, importing, transferring, copying
Calque
Camel case code: calque
Definition: A loanword in which only the meaning is borrowed with the term being literally
translated into the borrowing language.
Source: Reference [5].
Example: Spanish rascacielos (rasca 'scratch', 'scrape' + cielos 'sky') from English skyscraper
Alternate term: loan translation, semantic loan
Inheritance
Camel case code: inheritance
Definition: Inheritance, although often not regarded as an etymological process in itself, is used
to identify situations where a lexical item is known to, or is presumed to, have been inherited from
a predecessor or “parent” language(s). Items inherited from a predecessor language often undergo
any number of different separate etymological processes over the course of their diachrony.
Source: Reference [4].
Example: Sardinian semper was inherited from Latin semper.

Compounding
Camel case code: compounding
Definition: Compounding is the process of creating a new le
...

INTERNATIONAL STANDARD ISO 24613-3
First edition
2021-03
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
Reference number
ISO 24613-3:2021(F)
© ISO 2021

COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from ISO at the address below or from ISO’s member body in the country of the requester.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate and Etymon classes
4.2 The Etymologizable class
4.3 The Etymology and EtyLink classes
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is
normally entrusted to ISO technical committees. Each member body interested in a subject
has the right to be represented on the technical committee established for that purpose. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of ISO documents should be noted. This document was drafted in accordance
with the editorial rules given in the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the fact that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to
identify such rights or to give notice of their existence. Details of any intellectual property
rights or similar rights identified during the development of the document are given in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).
Any trade names mentioned in this document are given for information, for the convenience of
users, and do not constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, or for information about ISO's adherence to the
World Trade Organization (WTO) principles concerning Technical Barriers to Trade (TBT), see
www.iso.org/iso/fr/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020,
ISO 24613-4:2021 and ISO 24613-5:— (see footnote 1), cancels and replaces ISO 24613:2008, which has
been technically revised.
The main changes compared to the previous edition are as follows:
— complete revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/fr/members.html.
1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.

INTERNATIONAL STANDARD ISO 24613-3:2021(F)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension of ISO 24613-1 and ISO 24613-2 and facilitates the development
of detailed descriptions of common etymological phenomena and/or diachronic information
relating to the lexical entries of born-digital and/or retro-digitized lexicons. It provides both a
meta-model for such an extension and the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their
content constitutes requirements of this document. For dated references, only the edition cited applies.
For undated references, the latest edition of the referenced document (including any
amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1:
Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2:
Machine-readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following
apply.
ISO and IEC maintain terminological databases for use in standardization at the following
addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
in a related language, form which shares a common etymological origin with a form
in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: The term "etymologizable" refers to a category of lexical elements and usages
(encompassing for instance lexical entries, senses and word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module
(see Figure 1 and footnote 2). Both classes define lexical entries which have been added to a lexical
resource with the purpose of describing the etymologies of one or more other lexical entries.
Instances of either Etymon or Cognate can be assigned a language which is different from the language
of the lexicon as a whole (this is specified in the LexiconInformation class as described in
ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with
at least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in
which it is necessary to deal with instances of LexicalEntry (and hence also, by the subclass
relation, instances of Etymon and Cognate) which are roots, and in particular reconstructed roots.
In these cases, the fact of being a root and the type of the root in question shall be specified
using the attribute rootType. In the case of reconstructed roots or other word forms, the attribute
status serves to associate the element with a written description of the likelihood of its having
been in use (see the example in A.8). See Table 1 for a list of attributes to be used with these two
classes.
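As a rough illustration of 4.1, the sketch below models Etymon and Cognate as subclasses of LexicalEntry in Python. The class and attribute names follow this document. The Python field names (lemma, language, root_type, status, ety_links), the language tags and the check_linked helper are assumptions made for the example; they are not part of LMF or of any LMF serialization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """Placeholder for the EtyLink class described in 4.3."""
    type: Optional[str] = None


@dataclass
class LexicalEntry:
    """Minimal stand-in for the LexicalEntry class of ISO 24613-1."""
    lemma: str
    language: Optional[str] = None  # may differ from the language of the lexicon


@dataclass
class Etymon(LexicalEntry):
    root_type: Optional[str] = None  # rootType: set when the entry is a root
    status: Optional[str] = None     # status: likelihood of having been in use
    ety_links: List[EtyLink] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    root_type: Optional[str] = None
    status: Optional[str] = None
    ety_links: List[EtyLink] = field(default_factory=list)


def check_linked(entry) -> bool:
    """Return True if the entry aggregates at least one EtyLink, as 4.1 requires."""
    return len(entry.ety_links) >= 1


# Hypothetical instances: one linked etymon and one cognate with no link yet.
linked = Etymon(lemma="semper", language="la",
                ety_links=[EtyLink(type="inheritance")])
unlinked = Cognate(lemma="example", language="de")
```

Here check_linked(linked) holds while check_linked(unlinked) does not, so a validator built along these lines would flag the second entry.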
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can
have etymologies. By defining a single class encompassing all such "etymologizable" elements, the
classes of elements which can have etymologies can be easily extended wherever the necessity arises.
The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense,
Form and CognateSet (see 4.4).
2) In this document, the following colour scheme is used in diagrams: classes in yellow are
introduced in this document, and classes in pink have been previously introduced in ISO 24613-1 and
ISO 24613-2.
Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More
specifically, it allows for the description of those linguistic elements that are subclasses of the
Etymologizable class. The type or types of etymological process involved in a given etymology can be
specified using the type attribute, and also potentially the subtype attribute (where the type of
the etymology can be further specified). Possible values for type and subtype can vary according to
the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial
focus of the resource. The use of nested Etymology instances allows a combination of etymological
processes to be described. Examples of etymological processes that shall be used as values for
type/subtype are: borrowing, inheritance; word formation: compounding, derivation; sense shifts:
narrowing, widening, amelioration, pejoration, metaphor, metonymy; phonetic/phonological processes:
assimilation, dissimilation, epenthesis, metathesis, hardening, weakening, etc. The list of data
categories provided in Annex B shall be used in complement to the appropriate classes. Individual
links between two elements in an etymology can also be given a type (see the description of EtyLink
below). Given that an Etymology instance can be taken from an external source, it can be associated
with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link
together individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the
CrossREF class as defined in ISO 24613-1. The use of CrossREF requires that the target objects
representing the given lexical content be given id attributes. The use of the id attribute on an
individual of the Etymologizable class as a target allows for the modelling of a generic sequential
temporal ordering of multiple elements, using the attributes prev and next. Instances of the EtyLink
class can further specify additional temporal relationships using various temporal attributes
associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one EtyLink individual. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
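The sequential ordering that prev and next provide can be sketched as follows. This is a minimal illustration only: EtyLink is reduced to a plain Python class, the ids are hypothetical, and the traversal function is an assumption about how a consumer might read the chain rather than something this document prescribes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier Etymologizable element
    target: str                 # id of the later Etymologizable element
    prev: Optional[str] = None  # id of the preceding EtyLink, if any
    next: Optional[str] = None  # id of the following EtyLink, if any


def order_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in sequence by following the next pointers."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    # The first step is the one link that has no predecessor.
    starts = [link for link in links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one EtyLink without a prev value")
    ordered: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in the prev/next chain")
        seen.add(current.id)
        ordered.append(current)
        current = by_id[current.next] if current.next else None
    return ordered


# Hypothetical three-step chain, deliberately stored out of order.
chain = [
    EtyLink("link2", "etymon2", "etymon3", prev="link1", next="link3"),
    EtyLink("link3", "etymon3", "entry", prev="link2"),
    EtyLink("link1", "etymon1", "etymon2", next="link2"),
]
ordered_ids = [link.id for link in order_links(chain)]
```

Storing the order on the links themselves, rather than relying on document order, means the sequence survives any reshuffling of the serialized elements.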
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate elements and zero
or more Bibliography elements (see ISO 24613-2). The CognateSet class is an onomasiological
construct. It contains items from languages related to that of a given LexicalEntry (and hence, by
the subclass relation, of any given Etymon or Cognate), collected in order to demonstrate essential
linguistic similarities or dissimilarities. The use of a CognateSet implies that the LexicalEntry
instances (and hence the Etymon and Cognate instances) it contains share an etymological source.
Figure 5 — CognateSet
4.5 The Date class
Components of a LexicalEntry and of its subclasses shall be associated with a specific date using
the Date class. In addition, the Date class allows a number of degrees of precision to be specified.
A precise year, and potentially a given month and day, shall be specified using the date attribute,
and an approximate date using the circa attribute. For a time span with different levels of
specificity, one or more date attributes can be used. Where the time span is known (or estimated),
its lower and upper bounds can be specified using notBefore and notAfter respectively. For date and
time formats, ISO 8601-1 and ISO 8601-2 shall be used.
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or phrase
that is intended to be understood by humans. Individuals of this class can represent paraphrases or
synonyms, which can be written in the language of the entry or in another language. See Table 1 for
a list of attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class assignments
Class name    Example attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet    (none listed)
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in
this case the Sardinian adverb sempe, which derives from the Latin word semper. The Sardinian word
is linked to a single Etymology instance, which is given the type inheritance (see Annex B for a
definition of this type). The Etymology instance is in turn associated with an individual of type
EtyLink, which represents the process of development from the Latin etymon to the Sardinian lexical
entry.
Figure A.1 — Diagram of inheritance in Sardinian
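The structure of Figure A.1 can be approximated with plain Python dictionaries. The words, the language names and the inheritance type come from the example above. The ids ("le_sempe", "et_semper", "link1"), the key names and the dictionary layout are invented here purely for illustration and do not correspond to any LMF serialization.

```python
# Etymon: the Latin source word, stored as its own lexical entry.
etymon = {"id": "et_semper", "class": "Etymon", "lang": "Latin", "lemma": "semper"}

# EtyLink: the single step leading from the etymon to the Sardinian entry.
ety_link = {"id": "link1", "class": "EtyLink",
            "source": "et_semper", "target": "le_sempe"}

# LexicalEntry carrying one Etymology of type "inheritance".
entry = {
    "id": "le_sempe", "class": "LexicalEntry", "lang": "Sardinian", "lemma": "sempe",
    "etymologies": [{"class": "Etymology", "type": "inheritance",
                     "links": [ety_link]}],
}

# Lookup table so that link ids can be resolved to the elements they name.
index = {item["id"]: item for item in (etymon, entry)}


def etymons_of(lexical_entry: dict) -> list:
    """Collect the lemmas of every element linked as a source of the entry."""
    return [index[link["source"]]["lemma"]
            for etymology in lexical_entry["etymologies"]
            for link in etymology["links"]
            if link["target"] == lexical_entry["id"]]


print(etymons_of(entry))  # ['semper']
```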
A.2 Example of a diachronic etymological process (inheritance with phonological change)
In the following example, the development of the word for 'wine' (ipa: [veŋ]) in the Italian variety
Emiliano is traced through a series of etymons that are ordered and linked to one another by EtyLink
instances. The process leads from Vulgar Latin vinu to the immediate predecessor of the word in its
current form. These individual links are accessed through an Etymology individual (of type
inheritance) which represents the history of the LexicalEntry. The EtyLink instances are ordered by
means of the two attributes prev and next. These attributes are not shown in Figure A.2 for reasons
of legibility, but their use is shown in Figure A.6, for the example in A.6, and in Figure A.10, for
the example in A.8.
Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
'nation', nação (sg) and nações (pl) respectively, from two forms of a Vulgar Latin (VL) noun,
nātiōnem (sg, acc) and nātiōnes (pl, acc). In this case, because both the singular and the plural
Portuguese forms are concerned, the etymon has two WordForm instances, one for each grammatical
number. Note in particular that the attributes grammaticalCase and inflectionType are associated
with the WordForm instances of the Etymon through a GrammaticalInformation instance. In a detailed
Portuguese lexicon containing such etymological information for a sufficient number of lexical
entries, contrasting the contents of the WordForm instances in the LexicalEntry with those in the
Etymon would reveal the following linguistic phenomena: 1) Portuguese has lost grammatical case;
2) the vast majority of its nouns derive from the VL accusative case; 3) where the VL singular
(accusative) ending was -tiōnem, the Portuguese form is spelled "-ção" and pronounced [sɐ̃w]; where
the VL plural (accusative) ending was -tiōnes, the Portuguese form is spelled "-ções" and pronounced
[sõj̃s].
NOTE This etymology can be further articulated by adding the types of phonological process for each
stage of the diachrony. This can be done in the model by adding the appropriate data category
defined in Annex B as the value of type on the EtyLink for the stages concerned.
Figure A.3 — Diagram of the inheritance of Portuguese inflected forms
A.4 Example of metaphorical sense shift
The example in Figure A.4, from Mixtepec-Mixtec, illustrates the derivation of one sense from
another by metaphor. In this case, the word meaning 'kidney' arose from a metaphorical extension of
the word for 'bean', ntuchi. The etymology is attached to the 'kidney' sense of ntuchi. The EtyLink
instance carries the attributes source and target, whose values are the ids of the respective
components, specifying the directionality of the process. A defining feature of metaphor is the need
for a change in semantic domain between the source and target senses, each sense having a domain
specified in its subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI
corresponding to the dbpedia entry for each sense.
Figure A.4 — Diagram of a metaphorical sense shift in Mixtepec-Mixtec
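The domain-change condition described for metaphor lends itself to a simple consistency check. The sketch below is an assumption about how such a check could look: the Sense fields, the ids and in particular the domain labels ("plants", "anatomy") are invented for the example and are not taken from Figure A.4.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str  # subject field of the sense


def is_plausible_metaphor(source: Sense, target: Sense) -> bool:
    """A metaphorical shift requires the two senses to sit in different domains."""
    return source.domain != target.domain


bean = Sense(id="s1", gloss="bean", domain="plants")       # domain label invented
kidney = Sense(id="s2", gloss="kidney", domain="anatomy")  # domain label invented
print(is_plausible_metaphor(bean, kidney))  # True
```

A check of this kind only catches links whose two senses share a subject field; it says nothing about whether the shift is in fact metaphorical.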
A.5 Example of borrowing and compounding
Figure A.5 presents an example of a complex etymological borrowing: the borrowing of the French word
pamplemousse 'grapefruit' from a Dutch word (see References [4] and [18]). In addition to the
borrowing, the etymology shows the diachrony of the word-formation process that took place in Dutch,
where the compound pompelmousse was formed from the etymons pompel and limoes and was then borrowed
into French. A CrossREF instance (see ISO 24613-2) is used to specify that the borrowed etymon is a
compound in Dutch and that its components are ordered. The components of the compound are
represented as Etymon instances containing the essential forms as Lemma.
Figure A.5 — Diagram of the borrowing and compounding behind French pamplemousse
A.6 Example of the use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef, with
emphasis on the diachronic stages and the chronology of its phonetic development (see
Reference [13]). Here, at the Etymology level, the fact that chef is inherited from Latin is encoded
in the type attribute, which takes the value "inheritance". Within the Etymology there is a series
of Etymon elements carrying the temporal attributes notBefore and notAfter, which specify the time
intervals, and hence the implied sequence, in which a given form of the item would have been in use.
Note also the ordering of the Etymon elements through the prev and next attributes on the relevant
EtyLink individuals.
Figure A.6 — Diagram of the multi-stage phonological changes of French chef
A.7 Cognate set with bibliography
Figure A.7 shows an example of a cognate set taken from a real source (an etymological dictionary)
(see Reference [19]). Several cognate forms in related languages are given for comparison with the
lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship
is hypothetical and is therefore not represented as a Cognate in the encoding.
An important point to note in this entry is that the lexical entry girl, whose contemporary English
sense is given in the definition of the main entry, was formerly used to refer to a young person of
either sex, and has therefore undergone a semantic shift over its long period of use. Each cognate
is associated with Sense and Definition instances which together describe related lexical entries in
the Germanic languages, all of which are assumed to be inherited from a common source.
Figure A.7 — Diagram of a cognate set associated with the English entry girl
A.8 Complete etymological example
In order to demonstrate the expressiveness of the present model and its ability to handle realistic
examples of etymological data, the etymological information in part of a single entry, namely the
entry for the Latin word amārus 'bitter' in Figure A.8 [6], is modelled in Figures A.9 and A.10.
The entry consists of a list of two of the derived words together with a series of associated
etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of
amārus (the paragraph beginning 'The suffix –arus'…); this commentary is not modelled as structured
data in this example. The original text of the entry is shown in Figure A.8:
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Example entry for the Latin word amārus
The modelling of this entry is organized in two parts. The first is a diagram (Figure A.9) detailing
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing
the lemma, meaning and part of speech of the lexical entry, together with its derivatives. In the
LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF
element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a
particular derivation or sense of the entry. The CrossREF individuals in question are marked as
attestations using the type attribute, and any additional information is added as text using the
note attribute (in this case, the plus sign after the bibliographic citation in Reference [6], which
means that the linguistic usage in question can also appear in later sources). In this example, the
bibliographic element refers to a single author (and hence to that author's body of work) and the
date attribute refers to the author's entire lifetime.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the amārus entry — Lemmas, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for ease of reading, the elements of each of the two cognate sets are marked in two
different colours: cyan and violet). Each cognate is represented as belonging to one of the two
CognateSet elements, and the whole etymology is attributed to its source by associating the
Bibliography class with the Etymology element of the LexicalEntry. Two further Etymology elements
are used to describe the etymologies of the cognates included in the CognateSet elements. The
rootType attribute is used to indicate that the form of the etymon is reconstructed, and the status
attribute to indicate a degree of uncertainty.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of the amārus entry — Cognates, etymology and bibliography
A.9 Model simplification
Several examples in this annex illustrate different methods of simplifying a model. See
ISO 24613-1:2019, 5.5 (Methods for selecting data categories and creating subclasses), and in
particular 5.5.6 (Principles of model simplification), for a normative description of the
simplification process.
Annex B
(normative)

Data categories for etymology description
This annex provides a structured list of categories that shall be used as possible values for
describing etymological processes, including v
...

SLOVENIAN STANDARD
oSIST ISO/FDIS 24613-3:2021
01-March-2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del:
Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3:
Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3:
Extension étymologique
This Slovenian standard is identical to: ISO/FDIS 24613-3
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/FDIS 24613-3:2021 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT
PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND
USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF
THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number ISO/FDIS 24613-3:2020(E)
© ISO 2020


COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5,
cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1 and footnote 1). Both classes define lexical entries which have been added to a lexical
resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence also, by the subclass relation,
instances of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these
cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
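One way to picture this is as a common base class. The sketch below is illustrative only: the class names are those listed above, while the empty Python classes, the etymologies list and the attach_etymology helper are assumptions introduced for the example.

```python
class Etymologizable:
    """Anything that can carry one or more etymologies."""

    def __init__(self) -> None:
        self.etymologies = []


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


class Etymology:
    """Reduced to its type attribute for the purposes of this sketch."""

    def __init__(self, type: str) -> None:
        self.type = type


def attach_etymology(element, etymology: Etymology) -> None:
    """Attach an etymology, rejecting elements that are not Etymologizable."""
    if not isinstance(element, Etymologizable):
        raise TypeError(f"{element.__class__.__name__} cannot have an etymology")
    element.etymologies.append(etymology)


# A sense, not just a whole entry, can be given its own etymology.
sense = Sense()
attach_etymology(sense, Etymology("metaphor"))
```

Under this arrangement, making a further class etymologizable only requires deriving it from Etymologizable; nothing that handles etymologies has to change.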
1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.
Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement
to the appropriate classes. Individual links between two elements in an etymology can also be given
a type (see the description of EtyLink below). Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
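As a rough illustration of how the prev and next attributes can order the EtyLink instances of an Etymology, the sketch below rebuilds the sequence of steps from an unordered list. It is an assumption-laden example rather than part of the model: the attributes type, subtype, prev, next, source and target are named in this document, but the links field, the string identifiers and the method ordered_links are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                  # id of the earlier Etymologizable element
    target: str                  # id of the later Etymologizable element
    type: Optional[str] = None
    prev: Optional[str] = None   # id of the preceding EtyLink, if any
    next: Optional[str] = None   # id of the following EtyLink, if any


@dataclass
class Etymology:
    type: Optional[str] = None
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)  # hypothetical field name

    def ordered_links(self) -> List[EtyLink]:
        """Return the links in temporal order by following prev/next."""
        by_id = {link.id: link for link in self.links}
        heads = [link for link in self.links if link.prev is None]
        if len(heads) != 1:
            raise ValueError("expected exactly one EtyLink without a prev attribute")
        ordered, seen = [], set()
        current = heads[0]
        while current is not None:
            if current.id in seen:
                raise ValueError("cycle in prev/next chain")
            seen.add(current.id)
            ordered.append(current)
            current = by_id.get(current.next) if current.next else None
        if len(ordered) != len(self.links):
            raise ValueError("prev/next chain does not cover every EtyLink")
        return ordered


etymology = Etymology(type="inheritance", links=[
    EtyLink(id="l2", source="e2", target="e3", prev="l1"),
    EtyLink(id="l1", source="e1", target="e2", next="l2"),
])
step_ids = [link.id for link in etymology.ordered_links()]
```

An Etymology without any EtyLink is rejected here, in line with the statement that instances of Etymology are associated with one or more EtyLink instances.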
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore by the subclass
relation of any given Etymon or Cognate) and which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute and a rough date
with the attribute circa. Within a span of time with different levels of specificity, there is the possibility
of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
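The four attributes of the Date class can be illustrated with a small consistency check. This is a sketch under stated assumptions: it accepts only a subset of the ISO 8601-1 extended calendar format (YYYY, YYYY-MM or YYYY-MM-DD, with an optional sign), and the names parse_iso and validate are hypothetical helpers, not part of the model.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: only year, year-month and year-month-day values are handled.
_ISO_DATE = re.compile(r"^([+-]?\d{4,})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_iso(value: str) -> Tuple[int, int, int]:
    """Parse a calendar value into (year, month, day); missing parts default to 1."""
    match = _ISO_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported date value: {value!r}")
    year = int(match.group(1))
    month = int(match.group(2) or 1)
    day = int(match.group(3) or 1)
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"month or day out of range: {value!r}")
    return (year, month, day)


@dataclass
class Date:
    date: Optional[str] = None       # precise year, optionally month and day
    circa: Optional[str] = None      # rough date
    notBefore: Optional[str] = None  # lower end of a span
    notAfter: Optional[str] = None   # upper end of a span

    def validate(self) -> None:
        """Check that every value parses and that the span is not inverted."""
        for value in (self.date, self.circa, self.notBefore, self.notAfter):
            if value is not None:
                parse_iso(value)
        if self.notBefore is not None and self.notAfter is not None:
            if parse_iso(self.notBefore) > parse_iso(self.notAfter):
                raise ValueError("notBefore is later than notAfter")


span = Date(notBefore="0600", notAfter="0700")
span.validate()  # passes silently: the span is well ordered
```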
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can represent either paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet    (none listed)
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6 (for the example given in A.6) and in
Figure A.10 (for the example given in A.8).
Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc)
and nātiōnes (pl, acc), respectively. Since both the singular and the plural Portuguese forms are
concerned, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension
of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target with the id values of the respective
components specifying the directionality of the process. Since a defining feature of metaphor
is a change in the semantic domain between the source and target sense, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
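The constraint described in A.4, that a metaphorical shift implies a change of semantic domain between source and target sense, can be checked mechanically. The sketch below is only an illustration: the glosses ‘bean’ and ‘kidney’ come from the example, whereas the identifiers, the domain labels "plants" and "anatomy", the field name subject_field and the function is_consistent_metaphor are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense (hypothetical field name)


@dataclass
class EtyLink:
    type: str
    source: str  # id of the sense the shift starts from
    target: str  # id of the sense the shift leads to


def is_consistent_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """True if the link is typed as metaphor and its two senses differ in domain."""
    if link.type != "metaphor":
        return False
    return senses[link.source].subject_field != senses[link.target].subject_field


# Hypothetical ids and domain labels for the two senses of ntuchi.
senses = {
    "ntuchi-s1": Sense(id="ntuchi-s1", gloss="bean", subject_field="plants"),
    "ntuchi-s2": Sense(id="ntuchi-s2", gloss="kidney", subject_field="anatomy"),
}
link = EtyLink(type="metaphor", source="ntuchi-s1", target="ntuchi-s2")
consistent = is_consistent_metaphor(link, senses)
```

The direction of the process is carried entirely by source and target, so swapping the two ids would describe the opposite shift.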
A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). Here,
at the level of Etymology, the fact that chef was inherited from Latin is encoded in the attribute type,
which is given the value “inheritance”. Within the Etymology, there is a series of Etymon items carrying
the temporal attributes notBefore and notAfter, which specify the spans of time, and hence the
sequential ordering, in which a given form of the item is asserted to have been in use.
Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate
EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
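One way to read the combination of notBefore/notAfter spans with the prev/next ordering in A.6 is that the asserted spans should not run backwards along the chain of stages. The sketch below checks this for a list of stages already arranged in prev/next order. Everything in it is an assumption for illustration: the class DatedEtymon, its field names, the function spans_follow_order, and in particular the placeholder forms and years, which are not taken from Figure A.6.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatedEtymon:
    id: str
    form: str
    not_before: int  # year, lower end of the asserted span of use
    not_after: int   # year, upper end of the asserted span of use


def spans_follow_order(stages: List[DatedEtymon]) -> bool:
    """Check that each span is well formed and that spans never move backwards."""
    for stage in stages:
        if stage.not_before > stage.not_after:
            return False
    for earlier, later in zip(stages, stages[1:]):
        if later.not_before < earlier.not_before or later.not_after < earlier.not_after:
            return False
    return True


# Placeholder stages; forms and years are hypothetical.
stages = [
    DatedEtymon(id="e1", form="stage-1", not_before=100, not_after=300),
    DatedEtymon(id="e2", form="stage-2", not_before=300, not_after=500),
    DatedEtymon(id="e3", form="stage-3", not_before=450, not_after=700),
]
ordered_ok = spans_follow_order(stages)
```

Overlapping spans are allowed by this check, since an older and a newer form can plausibly coexist for a time; only a later stage dated before an earlier one is flagged.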
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source
(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical
item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed
and thus it is not represented as Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote a young person regardless
of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate
has an associated Sense and a Definition instance which together describe related lexical entries in
Germanic languages, all of which are hypothesised as having been inherited from a common source.
Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with
realistic examples of etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 [6], is modelled in Figures A.9 and A.10.
The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that
is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data
in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the
lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling
of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in
ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of
the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and
with any additional information (in this case the presence of the plus sign after the bibliographic citation
in Reference [6] means that the linguistic usage in question can also be found in later sources) added as
text using the note attribute. In this instance, the bibliographic element refers to a single author (and
therefore to their corpus of works) and the date attribute refers to the whole period encompassing their
lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for ease of reading the elements of each of the two cognate sets are coloured using two different
colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements
and the entire etymology is attributed to the source by associating the Bibliography with the Etymology
element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for
Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the
etymon is reconstructed, and the attribute status to indicate a level of uncertainty.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification
Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,
5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for
model simplification), for a normative description of the simplification process.
Annex B
(normative)

Data categories for etymology description
This annex contains a structured list of categories that shall be used as possible values for describing
etymological processes, including sound changes as well as
...

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
Reference number ISO/FDIS 24613-3:2020(E)
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
© ISO 2020

COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5,
cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1 and footnote 1). Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence also by the subclass relation instances
of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.
2 © ISO 2020 – All rights reserved

---------------------- Page: 6 ----------------------
ISO/FDIS 24613-3:2020(E)

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement
to the appropriate classes. Individual links between two elements in an etymology can also be given
a type, see the description of EtyLink below. Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
© ISO 2020 – All rights reserved 3

---------------------- Page: 7 ----------------------
ISO/FDIS 24613-3:2020(E)

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore by the subclass
relation of any given Etymon or Cognate) and which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A
precise year, and potentially month and day, shall be stated using the date attribute and a rough date
with the attribute circa. Within a span of time with different levels of specificity, there is the possibility
of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper
ends of the span can be specified using notBefore, notAfter respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can either represent paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.
4 © ISO 2020 – All rights reserved

---------------------- Page: 8 ----------------------
ISO/FDIS 24613-3:2020(E)

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name Example of attributes
Etymon xml: lang, gloss, rootType, status
Etymology type, subtype
EtyLink type, prev, next
CognateSet
Cognate xml: lang, gloss, rootType, status
Date notBefore, notAfter, circa, date
Gloss xml:l ang
© ISO 2020 – All rights reserved 5

---------------------- Page: 9 ----------------------
ISO/FDIS 24613-3:2020(E)

Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano
variety of Italian is traced back through a series of Etymons, which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current form. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in
Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
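
The ordering by prev and next described above can be sketched as a small linked-list traversal. This is
only an illustration under stated assumptions: it assumes that prev and next hold the ids of neighbouring
EtyLink instances, and the ids link1 to link3 and the function name ordered_links are invented for the
example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EtyLink:
    id: str                      # identifier of this link (placeholder values below)
    prev: Optional[str] = None   # id of the preceding EtyLink, if any
    next: Optional[str] = None   # id of the following EtyLink, if any


def ordered_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in diachronic order by following next from the first link."""
    by_id = {link.id: link for link in links}
    starts = [link for link in links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one EtyLink without a prev value")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("prev/next chain contains a cycle")
        seen.add(current.id)
        chain.append(current)
        current = by_id[current.next] if current.next is not None else None
    return chain


# Three links stored out of order; the ids are placeholders.
links = [
    EtyLink(id="link2", prev="link1", next="link3"),
    EtyLink(id="link3", prev="link2"),
    EtyLink(id="link1", next="link2"),
]
```

Calling ordered_links(links) yields the links in the order link1, link2, link3, regardless of the order
in which they are stored.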
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação
‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc)
and nātiōnes (pl, acc), respectively. Since both the singular and the plural Portuguese forms are
concerned, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.

Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the sense ‘kidney’ was derived by metaphorical extension from the word
ntuchi meaning ‘bean’. The Etymology is attached to the ‘kidney’ sense of the word ntuchi.
The EtyLink instance contains the attributes source and target, whose values are the ids of the
respective components and which specify the directionality of the process. Because a defining feature
of metaphor is a change in semantic domain between the source and target senses, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
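
The domain condition described above can be checked mechanically. The following is a minimal sketch,
not the encoding shown in Figure A.4: the sense ids s1 and s2, the domain labels and the function name
domains_differ are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str   # value of the subject field (labels below are placeholders)


@dataclass
class EtyLink:
    type: str
    source: str   # id of the sense the process starts from
    target: str   # id of the sense the process produces


def domains_differ(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """True if the source and target senses of the link lie in different domains."""
    return senses[link.source].domain != senses[link.target].domain


senses = {
    "s1": Sense(id="s1", gloss="bean", domain="plant"),
    "s2": Sense(id="s2", gloss="kidney", domain="anatomy"),
}
link = EtyLink(type="metaphor", source="s1", target="s2")
```

Here domains_differ(link, senses) holds, which is consistent with typing the link as metaphor. A link
whose source and target share a domain would be a candidate for review.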

A.5 Example of borrowing and compounding
In Figure A.5, an example of a complex etymological borrowing is given: the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch, in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At
the level of Etymology, the fact that chef was inherited from Latin is encoded
in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of
Etymon items carrying the temporal attributes notBefore and notAfter, which specify the
span of time during which a given form of the item is asserted to have been in
use. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate
EtyLink individuals.

Figure A.6 — Diagram of multi-stage phonological changes of French chef
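
To illustrate how notBefore and notAfter can support a consistency check over a series of stages, the
sketch below treats each span as a pair of years. It is a simplification under stated assumptions: the
class DatedStage, the function stages_in_order, the stage labels and the years are all invented
placeholders and do not reproduce the forms or dates of Figure A.6.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatedStage:
    form: str         # placeholder label, not an attested form
    not_before: int   # year, standing in for the notBefore attribute
    not_after: int    # year, standing in for the notAfter attribute


def stages_in_order(stages: List[DatedStage]) -> bool:
    """Check that each span is well formed and that no stage starts before its predecessor."""
    for stage in stages:
        if stage.not_before > stage.not_after:
            return False
    for earlier, later in zip(stages, stages[1:]):
        if later.not_before < earlier.not_before:
            return False
    return True


# Placeholder stages in the order given by prev/next.
stages = [
    DatedStage("stage-a", 100, 300),
    DatedStage("stage-b", 300, 500),
    DatedStage("stage-c", 500, 700),
]
```

The check accepts the stages in the order given and rejects the same stages in reverse order, which is
one way to catch a mismatch between the prev/next ordering and the asserted dates.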
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source
(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical
item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed
and thus it is not represented as Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote a young person regardless
of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate
has an associated Sense and a Definition instance which together describe related lexical entries in
Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with
realistic examples of etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 (see Reference [6]), is modelled in Figures A.9
and A.10. The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that
is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data
in the example. The original text of the entry is shown in Figure A.8.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the
lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling
of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in
ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of
the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and
with any additional information (in this case the presence of the plus sign after the bibliographic citation
in Reference [6] means that the linguistic usage in question can also be found in later sources) added as
text using the note attribute. In this instance, the bibliographic element refers to a single author (and
therefore to their corpus of works) and the date attribute refers to the whole period encompassing their
lifespan.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself. For ease of reading, the elements of the two cognate sets are shown in two different
colours, cyan and purple. The Cognates are each shown as belonging to one of two CognateSet elements,
and the entire etymology is attributed to the source by associating the Bibliography with the Etymology
element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for
Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the
etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification
Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,
5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for
model simplification), for a normative description of the simplification process.

Annex B
(normative)

Data categories for etymology description
This annex contains a structured list of categories that shall be used as possible values for describing
etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for
each category, its name, a camel case representation, a conceptual domain (for complex data categories),
a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well
as some usage notes where needed.
Where applicable, the camel case form should be used in utilizing the features as a value for the attribute
type in declaring the process of an Etymology or EtyLink.
NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-
typologies and variant definitions or descriptions.
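
A resource that uses these categories as values of type might validate them against the camel case
codes. The sketch below is only an illustration: it transcribes the three codes visible in this excerpt
(borrowing, calque, inheritance), and the names KNOWN_CODES and is_known_process_type are
assumptions. A real check would transcribe the complete list from Annex B.

```python
# Camel case codes of the categories shown in this excerpt of the annex;
# a complete list would be transcribed from the full Annex B.
KNOWN_CODES = {"borrowing", "calque", "inheritance"}


def is_known_process_type(value: str) -> bool:
    """True if value matches one of the transcribed camel case codes exactly."""
    return value in KNOWN_CODES
```

The comparison is deliberately case-sensitive, so the category name Calque and the alternate term
loan translation are both rejected in favour of the code calque.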
Borrowing
Camel case code: borrowing
Definition: The process in which a lexical item, phrase or other linguistic feature from a foreign
language or dialect is introduced into a given language or dialect.
Source: Reference [4].
Note: The lexical items which result from borrowing are loanwords.
Example: Russian картошка is a loanword from High German Kartoffel.
Alternate term: loaning, importing, transferring, copying
Calque
Camel case code: calque
Definition: A loanword in which only the meaning is borrowed with the term being literally
translated into the borrowing language.
Source: Reference [5].
Example: Spanish rascacielos (rasca 'scratch', 'scrape' + cielos 'sky') from English skyscraper
Alternate term: loan translation, semantic loan
Inheritance
Camel case code: inheritance
Definition: Inheritance, although often not regarded as an etymological process in itself, is used
to identify sit
...

SLOVENIAN STANDARD
oSIST ISO/FDIS 24613-3:2021
01-March-2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3: Extension étymologique
This Slovenian standard is identical to: ISO/FDIS 24613-3
ICS:
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/FDIS 24613-3:2021 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of the whole or parts of this standard is not permitted.

FINAL DRAFT INTERNATIONAL STANDARD
ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION
OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING
DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL,
COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE
TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH
REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number ISO/FDIS 24613-3:2020(E)
© ISO 2020


COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5,
cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of
detailed descriptions of common etymological phenomena and/or diachronic information with respect
to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such
an extension and the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of
the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)
Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical
entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item

3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.
3.5
onomasiology
approach to the investigation of word meaning which takes a given concept as a starting point and
studies the different lexical items in a language or languages that are used to refer to it
4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class
Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see
Figure 1 and footnote 1). Both classes define lexical entries which have been added to a lexical resource with the
purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon
or Cognate can be assigned a language which is different from the language of the lexicon as a whole
(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry
Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at
least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it
is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, also instances
of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact
of being a root and the type of the root in question shall be specified using the attribute rootType.
In the case of reconstructed roots or other word forms, the attribute status serves to associate the
element with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two classes.
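
The subclass relation and the EtyLink requirement described above can be sketched with Python
dataclasses. This is an informal illustration, not a serialization defined by this document: the helper
base class EtymologicalEntry, the field etyLinks and the values "la" and "link1" are assumptions
introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    lemma: str
    lang: Optional[str] = None   # None: same language as the lexicon as a whole


@dataclass
class EtymologicalEntry(LexicalEntry):
    """Hypothetical shared base for Etymon and Cognate (not a class of the model)."""
    rootType: Optional[str] = None
    status: Optional[str] = None
    etyLinks: List[str] = field(default_factory=list)   # ids of associated EtyLinks

    def __post_init__(self) -> None:
        # Mirrors the requirement of at least one associated EtyLink.
        if not self.etyLinks:
            raise ValueError("an Etymon or Cognate needs at least one EtyLink")


class Etymon(EtymologicalEntry):
    pass


class Cognate(EtymologicalEntry):
    pass


# Latin etymon from the example in A.1; "la" and "link1" are illustrative values.
etymon = Etymon(lemma="semper", lang="la", etyLinks=["link1"])
```

Constructing an Etymon or Cognate without any EtyLink raises an error, which keeps the constraint close
to the data rather than in a separate validation pass.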
4.2 The Etymologizable class
The Etymologizable class provides a means of referring to the set of linguistic elements that can have
etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes
of elements which can have etymologies can be easily extended in the future wherever the necessity
arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,
Sense, Form and CognateSet (see 4.4).
1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this
document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class
The Etymology class allows for the description of the etymology of a linguistic element. More specifically,
it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be specified using the type
attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be
further specified). Possible values for type and subtype can vary according to the theoretical approach
adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of
nested Etymology instances allows a combination of etymological processes to be described. Examples
of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,
metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,
hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement
to the appropriate classes. Individual links between two elements in an etymology can also be given
a type, see the description of EtyLink below. Given that an Etymology instance can be taken from
an external source, it can be associated with a Bibliography instance, which shall be defined as per
ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class
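
Nested Etymology instances, as described above, form a small tree of processes. The sketch below is one
possible in-memory reading of that idea, with the field name nested and the function process_types
being assumptions of this example. It collects the type values of an etymology and of everything
nested inside it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    nested: List["Etymology"] = field(default_factory=list)


def process_types(etymology: Etymology) -> List[str]:
    """Collect the type values of an etymology and its nested etymologies, depth first."""
    found = [etymology.type]
    for child in etymology.nested:
        found.extend(process_types(child))
    return found


# A borrowing whose source word was itself formed by compounding (compare A.5).
etymology = Etymology(type="borrowing", nested=[Etymology(type="compounding")])
```

For this instance, process_types(etymology) returns the two processes in the order borrowing,
compounding.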
Instances of Etymology are associated with one or more EtyLink instances, each of which represents a
single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together
individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class
as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given
lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable
class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,
using the attributes prev and next. Instances of the EtyLink class can further specify additional
temporal relationships using various temporal attributes associated with the source and target of each
EtyLink instance.
Figure 4 — EtyLink
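
Since linking depends on id attributes, a simple integrity check is to confirm that every reference
points at a known id. The following sketch assumes that links carry source and target attributes (as
in the example in A.4) and represents them as plain dictionaries. The ids and the function name
unresolved_references are invented for illustration.

```python
from typing import Dict, Iterable, List, Set


def unresolved_references(links: Iterable[Dict[str, str]], known_ids: Set[str]) -> List[str]:
    """List source/target values that do not match the id of any known element."""
    missing = []
    for link in links:
        for attribute in ("source", "target"):
            ref = link.get(attribute)
            if ref is not None and ref not in known_ids:
                missing.append(ref)
    return missing


# Placeholder ids: one lexical entry and one etymon are known, "etymon2" is not.
known_ids = {"entry1", "etymon1"}
links = [
    {"type": "inheritance", "source": "etymon1", "target": "entry1"},
    {"type": "inheritance", "source": "etymon2", "target": "etymon1"},
]
```

With these values the check reports etymon2 as the only unresolved reference.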

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and
EtyLink classes.
4.4 The CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or
more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its
contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass
relation, of any given Etymon or Cognate) which have been gathered together with the purpose
of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet
implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an
etymological source.
Figure 5 — CognateSet
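
The cardinalities stated above (one or more Cognate items, zero or more Bibliography items) can be
expressed as a constructor check. This is a minimal sketch in which the members are represented by
placeholder ids, and the field names cognates and bibliographies are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CognateSet:
    cognates: List[str]                                       # ids of Cognate items
    bibliographies: List[str] = field(default_factory=list)   # ids of Bibliography items

    def __post_init__(self) -> None:
        # One or more Cognate items are required; Bibliography items are optional.
        if not self.cognates:
            raise ValueError("a CognateSet contains one or more Cognate items")


cognate_set = CognateSet(cognates=["cog1", "cog2"])
```

An empty list of cognates is rejected, while an empty list of bibliographies is accepted.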
4.5 The Date class
The components of a LexicalEntry and its subclasses shall be associated with a specific date by making
use of the Date class. Date also allows several degrees of precision to be specified. A
precise year, and potentially month and day, shall be stated using the date attribute, and an approximate
date using the attribute circa. One or more dating attributes can be combined to describe a span of time
at different levels of specificity. Where a span of time is known (or asserted), its lower and upper
ends can be specified using notBefore and notAfter, respectively. For date and time formats,
ISO 8601-1 and ISO 8601-2 shall be used.
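
As an informal illustration of the four attributes, the sketch below stores them as strings and checks
that a span is not inverted. It assumes, for simplicity, that values are complete calendar dates of the
form YYYY-MM-DD, which Python's datetime.date.fromisoformat can parse. Reduced-precision values
such as a bare year, and the extensions of ISO 8601-2, are not handled here, and the method name
span_is_consistent is an assumption.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class Date:
    date: Optional[str] = None        # precise date
    circa: Optional[str] = None       # approximate date
    notBefore: Optional[str] = None   # lower end of a span
    notAfter: Optional[str] = None    # upper end of a span

    def span_is_consistent(self) -> bool:
        """True unless both ends of a span are given and the lower end is later than the upper."""
        if self.notBefore is None or self.notAfter is None:
            return True
        lower = datetime.date.fromisoformat(self.notBefore)
        upper = datetime.date.fromisoformat(self.notAfter)
        return lower <= upper


# Placeholder values, not taken from any example in this document.
span = Date(notBefore="1100-01-01", notAfter="1200-12-31")
```

The placeholder span above is consistent. Swapping its two ends makes the check fail.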
4.6 The Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase
that is intended for human consumption. Individuals of the class can either represent paraphrases or
synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of
attributes to be used with this class.
4 © ISO 2020 – All rights reserved

---------------------- Page: 10 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)

Figure 6 — Gloss
Table 1 — Example of class adornment
Class name Example of attributes
Etymon xml: lang, gloss, rootType, status
Etymology type, subtype
EtyLink type, prev, next
CognateSet
Cognate xml: lang, gloss, rootType, status
Date notBefore, notAfter, circa, date
Gloss xml:l ang
© ISO 2020 – All rights reserved 5

---------------------- Page: 11 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)

Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this
case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is
linked to a single Etymology instance which is associated with the type inheritance (see Annex B for
a definition of this type). The Etymology is then associated with an individual of type EtyLink which
represents the process of change from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (ipa: [veŋ]) in the Emiliano
variety of Italian is traced back using a series of Etymons which are ordered and linked together using
EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in
its current manifestation. These individual links can be accessed through an Etymology individual
(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks
is implemented by means of the two attributes prev and next. These attributes are not displayed in
Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the Example given in A.6, and in
Figure A.10, for the Example given in A.8.
6 © ISO 2020 – All rights reserved

---------------------- Page: 12 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese
A.3 Example of word form inheritance
The derivation of the singular and plural forms of the Portuguese noun naçao ‘nation’, naçao (sg)
and nações (pl), respectively, derived from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc),
nātiōnes (pl, acc) is described in Figure A.3. Herein, since the Portuguese forms concerned are both the
plural and singular, the Etymon has two WordForm instances, one for each grammatical number. Note
in particular the association of grammaticalCase and inflectionType attributes with the WordForm
of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that
contained such etymological information for a sufficient number of lexical entries, it would be possible,
by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the
following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its
nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the
Portuguese form is written “-çao” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is
-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types for each stage
of the diachrony. This can done in the model by adding the appropriate data category defined in Annex B to the
value of type on the EtyLink for the given stages.
© ISO 2020 – All rights reserved 7

---------------------- Page: 13 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)

Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension
of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.
The EtyLink instance contains the attributes source and target with the id values of the respective
components specifying the directionality of the process. As a defining feature of the process of metaphor
is that there has to be a change in the semantic domain between the source and target sense, each sense
has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify
a URI corresponding to the dbpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
8 © ISO 2020 – All rights reserved

---------------------- Page: 14 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)

A.5 Example of borrowing and compounding
Figure A.5 gives an example of a complex etymological borrowing: the borrowing of pamplemousse
‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the
etymology shows the diachrony of the word formation process that occurred within Dutch, in which
the compound pompelmousse was formed from the etymons pompel and limoes, with the compound
subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of
the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
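As a rough sketch of A.5, the following Python fragment records the compounding step and the borrowing step as separate links. It is a simplification under stated assumptions: the ordered components are held in a plain list rather than in a CrossREF instance, and the id values, language tags and the type labels "compounding" and "borrowing" are illustrative rather than taken from Annex B.

```python
from dataclasses import dataclass, field


@dataclass
class Etymon:
    id: str
    lemma: str
    lang: str
    components: list = field(default_factory=list)  # ordered ids of compound parts


@dataclass
class EtyLink:
    type: str
    source: str
    target: str


etymons = {
    "pompel": Etymon("pompel", "pompel", "nl"),
    "limoes": Etymon("limoes", "limoes", "nl"),
    "pompelmousse": Etymon("pompelmousse", "pompelmousse", "nl",
                           components=["pompel", "limoes"]),
}
links = [
    EtyLink("compounding", "pompel", "pompelmousse"),
    EtyLink("compounding", "limoes", "pompelmousse"),
    EtyLink("borrowing", "pompelmousse", "pamplemousse"),
]


def feeding_into(target: str, links: list) -> list:
    """Return (type, source) pairs for the links that lead to target, in listed order."""
    return [(link.type, link.source) for link in links if link.target == target]


print(feeding_into("pamplemousse", links))
print(feeding_into("pompelmousse", links))
```

Keeping the two processes on separate links lets a consumer distinguish what happened within Dutch from what happened between Dutch and French.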
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a
focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At
the level of Etymology, the fact that chef was inherited from Latin is encoded
in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of
Etymon items carrying the temporal attributes notBefore and notAfter, which specify the
span of time during which a given form of the item is asserted to have been in use and thereby
the sequential ordering of the forms. Note also the ordering of Etymon items via the prev and next attributes on the appropriate
EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
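The interplay of prev/next ordering and notBefore/notAfter spans in A.6 can be illustrated with a small Python sketch. The stage ids and years below are placeholders, not the values shown in Figure A.6, and the field names not_before and not_after merely stand in for the attributes notBefore and notAfter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Etymon:
    id: str
    not_before: int  # year; stands in for notBefore
    not_after: int   # year; stands in for notAfter


@dataclass
class EtyLink:
    id: str
    source: str
    target: str
    prev: Optional[str] = None  # id of the preceding EtyLink
    next: Optional[str] = None  # id of the following EtyLink


def in_sequence(links):
    """Order links by following next pointers from the link that has no prev."""
    by_id = {link.id: link for link in links}
    current = [link for link in links if link.prev is None][0]
    ordered = [current]
    while current.next is not None:
        current = by_id[current.next]
        ordered.append(current)
    return ordered


def spans_do_not_go_backwards(ordered, etymons):
    """Check that each dated stage starts no later than the dated stage it leads to."""
    for link in ordered:
        src, tgt = etymons.get(link.source), etymons.get(link.target)
        if src and tgt and src.not_before > tgt.not_before:
            return False
    return True


# Placeholder stages and years (assumptions for illustration only).
etymons = {
    "stage1": Etymon("stage1", 100, 400),
    "stage2": Etymon("stage2", 400, 800),
    "stage3": Etymon("stage3", 800, 1100),
}
links = [
    EtyLink("l2", "stage2", "stage3", prev="l1", next="l3"),
    EtyLink("l3", "stage3", "chef", prev="l2"),
    EtyLink("l1", "stage1", "stage2", next="l2"),
]
ordered = in_sequence(links)
print([link.id for link in ordered])
print(spans_do_not_go_backwards(ordered, etymons))
```

The links are deliberately listed out of order to show that the sequence is recovered from prev and next rather than from storage order.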
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source
(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical
item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed
and thus it is not represented as Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote a young person regardless
of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate
has an associated Sense and a Definition instance which together describe related lexical entries in
Germanic languages, all of which are hypothesised as having been inherited from a common source.
Figure A.7 — Diagram of cognate set for English entry girl
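A minimal Python sketch of the arrangement in A.7 follows. It assumes a simple container layout and uses placeholder cognate forms, language tags and definitions, since the actual forms are given only in Figure A.7. It reflects the point made above that the LexicalEntry belongs to the comparison without being encoded as a Cognate.

```python
from dataclasses import dataclass, field


@dataclass
class Cognate:
    lemma: str
    lang: str
    definition: str


@dataclass
class CognateSet:
    cognates: list
    bibliography: list = field(default_factory=list)


@dataclass
class LexicalEntry:
    lemma: str
    lang: str
    cognate_sets: list = field(default_factory=list)


def compared_forms(entry):
    """List (lang, lemma) pairs for the entry itself followed by its cognates."""
    forms = [(entry.lang, entry.lemma)]
    for cognate_set in entry.cognate_sets:
        forms.extend((c.lang, c.lemma) for c in cognate_set.cognates)
    return forms


# FORM-A, FORM-B, lang-a and lang-b are placeholders, not data from Reference [19].
girl = LexicalEntry("girl", "en", cognate_sets=[
    CognateSet(
        cognates=[
            Cognate("FORM-A", "lang-a", "placeholder definition"),
            Cognate("FORM-B", "lang-b", "placeholder definition"),
        ],
        bibliography=["Reference [19]"],
    )
])
print(compared_forms(girl))
```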
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with
realistic examples of etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 [6], is modelled in Figures A.9 and A.10.
The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that
is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data
in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details
the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the
lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling
of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in
ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of
the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and
any additional information is added as text using the note attribute (in this case, the plus sign after the
bibliographic citation in Reference [6] means that the linguistic usage in question can also be found in
later sources). In this instance, the bibliographic element refers to a single author (and
therefore to their corpus of works) and the date attribute refers to the whole period encompassing their
lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for ease of reading the elements of each of the two cognate sets are coloured using two different
colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements
and the entire etymology is attributed to the source by associating the Bibliography with the Etymology
element for the Lexical Entry. Two other Etymology elements are used to describe etymologies for
Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the
etymon is reconstructed, and the attribute status to indicate a level of uncertainty.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
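The use of rootType and status on etymons can be sketched as follows. The attribute names come from this document, but the value "reconstructed", the free-text status "uncertain", the placeholder forms and the asterisk rendering convention are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Etymon:
    lemma: str
    lang: str
    rootType: Optional[str] = None  # e.g. "reconstructed" (value name assumed)
    status: Optional[str] = None    # free-text note on how certain the form is


def display_form(etymon: Etymon) -> str:
    """Render a form, marking reconstructed forms with an asterisk and appending any status."""
    prefix = "*" if etymon.rootType == "reconstructed" else ""
    suffix = f" ({etymon.status})" if etymon.status else ""
    return f"{prefix}{etymon.lemma}{suffix}"


# Placeholder forms and language tags, not data from Reference [6].
attested = Etymon("ATTESTED-FORM", "lang-a")
hypothetical = Etymon("FORM", "lang-b", rootType="reconstructed", status="uncertain")
print(display_form(attested))
print(display_form(hypothetical))
```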
A.9 Model simplification
Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,
5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for
model simplification), for a normative description of the simplification process.
Annex B
(normative)

Data categories for etymology description
This annex contains a structured list of categories that shall be used as possible values for describing
etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for
each category, its name, a camel case r
...

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Reference number: ISO/FDIS 24613-3:2020(F)
© ISO 2020
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

ISO/FDIS 24613-3:2020(F)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2020 – All rights reserved


Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 LMF etymological extension
4.1 Cognate and Etymon classes
4.2 Etymologizable class
4.3 Etymology and EtyLink classes
4.4 CognateSet class
4.5 Date class
4.6 Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify
such rights or to give notice of their existence. Details of any intellectual property rights or similar
rights identified during the development of the document are given in the Introduction and/or in the
list of patent declarations received by ISO (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/fr/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4
and ISO 24613-5, cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— the content has been completely revised and divided into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/fr/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(F)
Language resource management — Lexical markup framework (LMF) —
Part 3:
Etymological extension
1 Scope
This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development
of detailed descriptions of common etymological phenomena and/or diachronic information
with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a
meta-model for such an extension as well as the relevant data categories.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their
content constitutes requirements of this document. For dated references, only the edition cited applies.
For undated references, the latest edition of the referenced document (including any
amendments) applies.
ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules
ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1:
Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2:
Machine-readable dictionary (MRD) model
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following
apply.
ISO and IEC maintain terminological databases for use in standardization at the following
addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp;
— IEC Electropedia: available at http://www.electropedia.org/.
3.1
cognate
form in a related language that shares a common etymological origin with a form
in the language of the lexicon
3.2
etymologizable
eligible to have an etymology (3.3)
Note 1 to entry: The term “etymologizable” refers to a category of lexical elements and usages
(including, for example, lexical entries, senses and word forms).

3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived
Note 1 to entry: An etymon can also be an earlier stage of a lexical item.
3.5
onomasiology
semantic study of words which, starting from a given concept, examines the different lexical items
used in one or more languages to refer to that concept
4 LMF etymological extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 Cognate and Etymon classes
Cognate and Etymon are defined as subclasses of the LexicalEntry class of the LMF core
package (see Figure 1 and footnote 1). Both classes define lexical entries that have been added to a
lexical resource in order to describe the etymologies of one or more other lexical entries.
Etymon or Cognate instances can be assigned a language that differs from that of the lexicon as
a whole (this language is specified in the LexiconInformation class described in ISO 24613-1).
Figure 1 — Cognate and Etymon subclasses of the LexicalEntry class
Individuals of the Etymon and Cognate subclasses shall be in an aggregation relation with at least
one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary to deal
with LexicalEntry instances (and therefore also with instances of its subclasses
Etymon and Cognate) that are roots, and in particular reconstructed roots.
In such cases, the fact of being a root and the type of root in question shall be specified using
the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status serves
to associate the item with a written description of the likelihood of its having been in use (see the example in A.8).
See Table 1 for a list of attributes to be used with these two subclasses.
4.2 Etymologizable class
The Etymologizable class designates the set of linguistic elements that can have
etymologies. By defining a single class that covers all such “etymologizable” elements,
the classes of elements that can have etymologies can easily be extended
as needed. The following classes are subtypes of the Etymologizable class (see
Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).
1) The diagrams in this document use the following colour coding: classes in yellow are
introduced in this document, whereas classes in pink were previously introduced in ISO 24613-1
and ISO 24613-2.

Figure 2 — Etymologizable class and its subclasses
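The subclass relations described in 4.1 and 4.2 can be mirrored in a few lines of Python. This is a sketch of the hierarchy only: the class names follow this document, while the etymologies attribute and the helper function are assumptions added for illustration.

```python
class Etymologizable:
    """Anything that may carry one or more etymologies."""

    def __init__(self):
        self.etymologies = []  # illustrative container, not a name from the model


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


class Etymon(LexicalEntry):
    pass


class Cognate(LexicalEntry):
    pass


def can_have_etymology(obj) -> bool:
    """True for instances of Etymologizable or of any of its subclasses."""
    return isinstance(obj, Etymologizable)


print(can_have_etymology(Etymon()), can_have_etymology("plain string"))
```

Because Etymon and Cognate inherit from LexicalEntry, they are themselves etymologizable, which is what allows cognates to carry their own etymologies as in A.8.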
4.3 Etymology and EtyLink classes
The Etymology class is used to describe the etymology of a linguistic element. More specifically,
it describes linguistic elements that are subclasses of the Etymologizable class.
The type or types of etymological process involved in a given etymology can be
specified using the attribute type, and potentially also the attribute subtype (if the type of the etymology
can be further specified). The possible values for the attributes type and subtype can vary
according to the theoretical approach adopted by the compiler of a resource and/or the linguistic
or editorial orientation of the resource in question. Nesting Etymology instances makes it possible to
describe combinations of etymological processes. Examples of etymological processes to be used
as values of the attributes type/subtype include: borrowing, inheritance; word
formation: compounding, derivation; sense shifts: narrowing, widening, amelioration,
pejoration, metaphor, metonymy; phonetic/phonological processes: assimilation, dissimilation,
epenthesis, metathesis, fortition, lenition, etc. The list of data categories provided
in Annex B shall be used in conjunction with the appropriate classes. A type can also be
specified for individual links between two elements within an etymology (see the description
of EtyLink below). Since an Etymology instance can be taken from an external source, it
can be associated with a Bibliography instance, which shall be defined in accordance with ISO 24613-2 (see
Figure 3).
Figure 3 — Etymology class
Etymology instances are associated with one or more EtyLink instances, each of which
represents a single stage or step in the etymology of a given lexical item (see Figure 4).
EtyLink is used to link individuals belonging to the Etymologizable subclasses. EtyLink is a
subclass of the CrossREF class defined in ISO 24613-1. The use of CrossREF requires id attributes to be
assigned to the target objects representing the specific lexical content. The use of the attribute
id, as a target, on an individual of the Etymologizable class makes it possible to model a generic sequential
temporal ordering of multiple elements, with the attributes prev and next. Instances of the
EtyLink class can also specify additional temporal relations using various
temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at
least one EtyLink individual. See Table 1 for a list of attributes to be used with the
Etymology and EtyLink classes.
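The requirement that every Etymon or Cognate be associated with at least one EtyLink, together with the nesting of Etymology instances, can be sketched as a simple consistency check in Python. The data reuse the Sardinian sempe and Latin semper example of A.1. The id values, the nested list and both helper functions are assumptions for illustration, not part of the model.

```python
from dataclasses import dataclass, field


@dataclass
class EtyLink:
    source: str
    target: str
    type: str = ""


@dataclass
class Etymology:
    type: str
    subtype: str = ""
    links: list = field(default_factory=list)
    nested: list = field(default_factory=list)  # nested Etymology instances


def all_links(etymology: Etymology) -> list:
    """Collect the links of an etymology and of every etymology nested in it."""
    found = list(etymology.links)
    for inner in etymology.nested:
        found.extend(all_links(inner))
    return found


def unlinked(ids: list, etymology: Etymology) -> list:
    """Return the Etymon/Cognate ids that no EtyLink refers to."""
    used = set()
    for link in all_links(etymology):
        used.update((link.source, link.target))
    return [i for i in ids if i not in used]


# Hypothetical ids; "la.orphan" is deliberately left without a link.
ety = Etymology(type="inheritance",
                links=[EtyLink("la.semper", "sc.sempe", "inheritance")])
print(unlinked(["la.semper", "la.orphan"], ety))
```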
4.4 CognateSet class
The CognateSet class (see Figure 5) is a container for sets of one or more Cognate elements
and zero or more Bibliography elements (see ISO 24613-2). The CognateSet class is an
onomasiological construct. It contains elements from languages related to that of a given
LexicalEntry (and therefore, through the subclass relation, of any given Etymon or Cognate)
that have been collected in order to demonstrate salient linguistic similarities or
dissimilarities. The use of a CognateSet implies that the LexicalEntry instances (and therefore
Etymon and Cognate instances) it contains share an etymological source.
Figure 5 — CognateSet
4.5 Date class
Components of a LexicalEntry and of its subclasses shall be associated with a specific
date using the Date class. In addition, the Date class allows several degrees
of precision to be specified. A precise year, and potentially a given month and day, shall be specified
using the attribute date, and an approximate date using the attribute circa. For a span of time
with varying levels of specificity, one or more date attributes can be used. When
the span of time is known (or estimated), its lower and upper bounds can be specified
using notBefore and notAfter respectively. ISO 8601-1 and
ISO 8601-2 shall be used for date and time formats.
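As a hedged illustration of the Date attributes, the sketch below stores the four attributes as strings and checks that notBefore does not fall after notAfter. It assumes plain calendar dates of the form YYYY-MM-DD, which Python's standard library can parse. The extended representations of ISO 8601-2 (for example uncertain or approximate dates) are outside what this simple check handles, and the example values are invented.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass
class Date:
    date: Optional[str] = None       # precise date
    circa: Optional[str] = None      # approximate date
    notBefore: Optional[str] = None  # lower bound of a span
    notAfter: Optional[str] = None   # upper bound of a span

    def bounds_are_ordered(self) -> bool:
        """True unless both bounds are given and notBefore falls after notAfter."""
        if self.notBefore is None or self.notAfter is None:
            return True
        lower = dt.date.fromisoformat(self.notBefore)
        upper = dt.date.fromisoformat(self.notAfter)
        return lower <= upper


# Invented example values.
span = Date(notBefore="0800-01-01", notAfter="1100-12-31")
print(span.bounds_are_ordered())
```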
4.6 Gloss class
The Gloss class (see Figure 6) represents a textual description of the meaning of a word or
phrase that is intended to be understood by humans. Individuals of this class can represent
paraphrases or synonyms, which can be written in the language of the entry or in
another language. See Table 1 for a list of attributes to be used with this class.

Figure 6 — Gloss
Table 1 — Examples of class adornment
Class name | Example attributes
Etymon | xml:lang, gloss, rootType, status
Etymology | type, subtype
EtyLink | type, prev, next
CognateSet | (no attributes listed)
Cognate | xml:lang, gloss, rootType, status
Date | notBefore, notAfter, circa, date
Gloss | xml:lang

Annex A
(informative)

Examples of possible etymological typologies
A.1 Example of simple inheritance
The example in Figure A.1 describes the inheritance of a lexical entry from a parent language,
in this case the Sardinian adverb sempe, which comes from the Latin word semper. The Sardinian word is linked to a single
Etymology instance which is assigned the type inheritance (see Annex B for a definition of this type).
The Etymology instance is in turn associated with an individual of type EtyLink which represents the process
of development from the Latin etymon to the Sardinian lexical entry.
Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with
phonological change)
In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Bolognese variety of
Emiliano, a language of Italy, is traced using a series of etymons that are ordered and linked to one another using
EtyLink instances. This process leads from Vulgar Latin vinu to the immediate predecessor of the word
in its current form. These individual links are accessible via an Etymology individual (of type
inheritance) which represents the history of the LexicalEntry. The ordering of the EtyLink instances is carried out by
means of the two attributes prev and next. These attributes are not shown in Figure A.2 for
reasons of legibility, but their use is shown in Figure A.6, for the example in A.6, and in
Figure A.10, for the example in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in
Bolognese
A.3 Example of word form inheritance
Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação ‘nation’,
nação (sg) and nações (pl) respectively, from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc)
and nātiōnes (pl, acc). In this case, since the Portuguese forms concerned are
both singular and plural, the etymon has two WordForm instances, each corresponding to
one grammatical number. Note in particular the association of the attributes grammaticalCase and
inflectionType with the WordForm of the Etymon via a GrammaticalInformation instance. In
a detailed Portuguese lexicon containing such etymological information for a sufficient
number of lexical entries, it would be possible, by contrasting the content of the WordForm instances in
the LexicalEntry with those in the Etymon, to observe the following linguistic phenomena: 1) Portuguese has lost
grammatical case; 2) the vast majority of its nouns come from the VL accusative case; 3) where the
VL singular (accusative) ending is -tiōnem, the Portuguese form is written “-ção” and pronounced
[sɐ̃w]; where the VL plural (accusative) ending is -tiōnes, the Portuguese form is written “-ções”
and pronounced [sõj̃s].
NOTE This etymology can be further articulated by adding the phonological process types
for each stage of the diachrony. This can be done in the model by adding the appropriate data category
defined in Annex B to the value of type on the EtyLink for the given stages.

Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift
The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via
metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension of the word
meaning ‘bean’, ntuchi. The Etymology is attached to the kidney sense of the word ntuchi. The EtyLink instance contains
the attributes source and target with the id values of the respective components specifying the directionality
of the process. Because a defining feature of metaphor is a change in the
semantic domain between the source and target senses, each sense has a domain specified in the subject
field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding
to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec

A.5 Example of borrowing and compounding
Figure A.5 gives an example of a complex etymological borrowing: the borrowing of
pamplemousse ‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing,
the etymology shows the diachrony of the word formation process that occurred within Dutch, in which the
compound pompelmousse was formed from the etymons pompel and limoes, with the compound subsequently
being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that
the borrowed Etymon is a compound in Dutch and that its components are ordered. The components
of the compound are represented as Etymons containing the salient forms as
Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information
The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with
a focus on the diachronic stages and time frame of its phonetic development (see
Reference [13]). At the level of Etymology, the fact that chef was inherited from
Latin is encoded in the attribute type, which is given the value “inheritance”. Within the Etymology, there
is a series of Etymon items carrying the temporal attributes notBefore and notAfter, which
specify the span of time during which a given form of the item is asserted to have been in use
and thereby the sequential ordering of the forms. Note also the ordering of Etymon items via
the prev and next attributes on the appropriate EtyLink individuals.

Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography
Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary
source (see Reference [19]). A number of cognate forms in related languages are
given to compare to the lexical item girl. Although the LexicalEntry is inherently
part of the CognateSet, this relationship is assumed and thus it is not represented as
Cognate in the encoding.
A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English
sense is given in the Definition for the main entry, was used early on to denote
a young person regardless of gender, and therefore became subject to a semantic shift over
its long period of use. Each Cognate has an associated Sense and a Definition instance which
together describe related lexical entries in Germanic languages, all of which are
hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example
In order to demonstrate some of the expressivity of the current model and its potential for dealing with realistic examples of
etymological data, the etymological information of part of a single entry, namely
the entry for the Latin word amārus ‘bitter’ in Figure A.8 [6], is modelled in Figures A.9 and A.10.
The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and
cognates. The entry also contains a paragraph of textual commentary on the etymology
of amārus (that is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled
as structured data in the example. The original text of the entry is shown in
Figure A.8.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus
The modelling of this entry is presented in two parts. First is a diagram (Figure A.9)
which details the LMF modelling of the first two lines of the entry (shown in Figure A.8):
the lines containing the lemma, meaning and part of speech of the lexical entry as well
as its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its
derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a
means of modelling the attestation of a given derivation or sense of the entry. The
CrossREF individuals in question are typed as attestations using the attribute type, and
any additional information is added as text using the note attribute (in this case, the plus sign after the
bibliographic citation in Reference [6] means that the linguistic usage in question can also
be found in later sources). In this instance,
the bibliographic element refers to a single author (and therefore to their corpus of works) and the date attribute
refers to the whole period encompassing their lifespan.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the amārus entry — Lemmas, derivatives and bibliographies
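
The CrossREF pattern described for Figure A.9 can be sketched as a small data structure. This is only an illustrative sketch in Python, not a serialization defined by ISO 24613: the class and attribute names CrossREF, Bibliography, type, note and date come from the text above, whereas the fields target, author, gloss and cross_refs, the literal value "attestation", and every placeholder string (DERIVATIVE_1, AUTHOR_PLACEHOLDER, and so on) are assumptions introduced for the example and do not reproduce the content of the De Vaan entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bibliography:
    author: str                   # hypothetical field: a single author (their body of work)
    date: Optional[str] = None    # the author's lifetime, kept as free text here


@dataclass
class CrossREF:
    target: str                   # hypothetical field: lemma of the linked derivative
    type: str                     # "attestation" is an assumed value, not from the standard
    note: Optional[str] = None    # free-text additional information
    bibliography: Optional[Bibliography] = None


@dataclass
class LexicalEntry:
    lemma: str
    gloss: str
    cross_refs: List[CrossREF] = field(default_factory=list)

    def attestations(self) -> List[CrossREF]:
        """Return only the cross-references typed as attestations."""
        return [ref for ref in self.cross_refs if ref.type == "attestation"]


# Placeholder values only; the real derivatives and author are not reproduced here.
source = Bibliography(author="AUTHOR_PLACEHOLDER", date="LIFETIME_PLACEHOLDER")
entry = LexicalEntry(
    lemma="amārus",
    gloss="bitter",
    cross_refs=[
        CrossREF(target="DERIVATIVE_1", type="attestation",
                 note="may also appear in later sources", bibliography=source),
        CrossREF(target="DERIVATIVE_2", type="attestation", bibliography=source),
    ],
)

for ref in entry.attestations():
    print(ref.target, "|", ref.bibliography.author, "|", ref.note or "")
```

Keeping the plus-sign information as plain text in note mirrors the choice described above; a stricter model could replace it with a dedicated boolean field, at the cost of departing from the attributes named in the text.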
The second part of the example (Figure A.10) details the modelling of the etymological information
itself (for readability, the elements of each of the two cognate sets are
marked in two different colours: cyan and violet). Each cognate is represented as
belonging to one of the two CognateSet elements, and the whole etymology is attributed to its source by
associating the Bibliography class with the Etymology element of the lexical entry. Two further
Etymology elements describe the etymologies of the cognates included in the CognateSet element.
The rootType attribute indicates that the form of the etymon is reconstructed, and
the status attribute indicates a degree of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of the amārus entry — Cognates, etymology and bibliography
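
The arrangement described for Figure A.10 (two CognateSet elements, an Etymology attributed to a Bibliography, and the rootType and status attributes) can be sketched along the same lines. Again, this is a hedged illustration in Python rather than the normative model: the names CognateSet, Etymology, Bibliography, rootType and status come from the text, while the Etymon class, the decision to attach rootType and status to it, the values "reconstructed" and "uncertain", and all placeholder forms are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bibliography:
    citation: str


@dataclass
class Etymon:
    form: str
    root_type: Optional[str] = None   # stands for rootType; placement here is assumed
    status: Optional[str] = None      # degree of uncertainty; placement here is assumed


@dataclass
class Etymology:
    etymons: List[Etymon] = field(default_factory=list)
    bibliography: Optional[Bibliography] = None


@dataclass
class CognateSet:
    cognates: List[str] = field(default_factory=list)
    etymology: Optional[Etymology] = None   # etymology of the cognates in this set


@dataclass
class LexicalEntry:
    lemma: str
    etymology: Etymology
    cognate_sets: List[CognateSet] = field(default_factory=list)


def reconstructed_forms(etymology: Etymology) -> List[str]:
    """List the etymon forms flagged as reconstructed."""
    return [e.form for e in etymology.etymons if e.root_type == "reconstructed"]


# Placeholder forms only; the actual etymons and cognates are not reproduced here.
entry = LexicalEntry(
    lemma="amārus",
    etymology=Etymology(
        etymons=[Etymon("*ETYMON_PLACEHOLDER", root_type="reconstructed",
                        status="uncertain")],
        bibliography=Bibliography("De Vaan (2008)"),
    ),
    cognate_sets=[
        CognateSet(["COGNATE_A1", "COGNATE_A2"],
                   Etymology([Etymon("*ETYMON_A", root_type="reconstructed")])),
        CognateSet(["COGNATE_B1"],
                   Etymology([Etymon("*ETYMON_B", root_type="reconstructed")])),
    ],
)

print(reconstructed_forms(entry.etymology))
print([len(cs.cognates) for cs in entry.cognate_sets])
```

Giving each CognateSet its own Etymology reflects the statement that two further Etymology elements describe the etymologies of the cognates, while the Bibliography is attached once, to the entry-level Etymology.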
A.9 Model simplification
Several examples in this annex illustrate different metho
...
