Language resource management -- Lexical markup framework (LMF)

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension as well as the relevant data categories.


Language resource management - Lexical markup framework (LMF) - Part 3: Etymological extension

General Information

Status
Published
Publication Date
30-Mar-2021
Current Stage
5060 - Close of voting Proof returned by Secretariat
Start Date
19-Feb-2021
Completion Date
18-Feb-2021

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24613-3:2021
01 June 2021
Replaces:
SIST ISO 24613:2013
Language resource management - Lexical markup framework (LMF) - Part 3: Etymological extension
This Slovenian standard is identical to: ISO 24613-3:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24613-3:2021 en,fr

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24613-3
First edition
2021-03
Language resource management —
Lexical markup framework (LMF) —
Part 3:
Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 3: Extension étymologique
Reference number
ISO 24613-3:2021(E)
ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021 and ISO 24613-5 (under preparation; stage at the time of publication: ISO/DIS 24613-5:2020), cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.

A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
INTERNATIONAL STANDARD ISO 24613-3:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension as well as the relevant data categories.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate

form in a related language which shares a common etymological origin with a form in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.

3.5
onomasiology

approach to the investigation of word meaning which takes a given concept as a starting point and studies the different lexical items in a language or languages that are used to refer to it

4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see Figure 1). Both classes define lexical entries which have been added to a lexical resource with the purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon or Cognate can be assigned a language which is different from the language of the lexicon as a whole (this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry

Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, instances of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact of being a root and the type of the root in question shall be specified using the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status serves to associate the element with a written description of the likelihood of its having been in use (see the example in A.8). See Table 1 for a list of attributes to be used with these two classes.
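
The subclass relation and the "at least one EtyLink" rule described in 4.1 can be pictured with a minimal Python sketch. The class names come from this document; the field names (`root_type`, `ety_link_ids`), the helper `has_required_link` and the identifier values are illustrative assumptions and are not defined by the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    """Simplified stand-in for the LMF core LexicalEntry class."""
    id: str
    lemma: str
    lang: Optional[str] = None       # may differ from the language of the lexicon
    root_type: Optional[str] = None  # rootType: only set when the entry is a root
    status: Optional[str] = None     # written description of likelihood of use
    gloss: Optional[str] = None


@dataclass
class Etymon(LexicalEntry):
    # Assumed representation: ids of the EtyLink individuals aggregated here.
    ety_link_ids: List[str] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    ety_link_ids: List[str] = field(default_factory=list)


def has_required_link(entry: LexicalEntry) -> bool:
    """Check the 'at least one EtyLink' rule for Etymon and Cognate individuals."""
    if isinstance(entry, (Etymon, Cognate)):
        return len(entry.ety_link_ids) >= 1
    return True  # plain LexicalEntry individuals carry no such requirement here


# Hypothetical identifiers; the Latin etymon 'semper' is the one used in A.1.
latin_semper = Etymon(id="etymon-semper", lemma="semper", lang="la",
                      ety_link_ids=["link-1"])
unlinked = Cognate(id="cognate-x", lemma="x")
print(has_required_link(latin_semper))  # True
print(has_required_link(unlinked))      # False
```

Keeping the link ids on the Etymon and Cognate subclasses only is one possible design choice; a real implementation would follow whatever serialization the lexicon uses.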
4.2 The Etymologizable class

The Etymologizable class provides a means of referring to the set of linguistic elements that can have etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes of elements which can have etymologies can be easily extended in the future wherever the necessity arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).

2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class allows for the description of the etymology of a linguistic element. More specifically, it allows for the description of those linguistic elements that are subclasses of the Etymologizable class. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be further specified). Possible values for type and subtype can vary according to the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of nested Etymology instances allows a combination of etymological processes to be described. Examples of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis, hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement to the appropriate classes. Individual links between two elements in an etymology can also be given a type, see the description of EtyLink below. Given that an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class

Instances of Etymology are associated with one or more EtyLink instances, each of which represents a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements, using the attributes prev and next. Instances of the EtyLink class can further specify additional temporal relationships using various temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
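
As a rough illustration of how the prev and next attributes can yield a sequential ordering of EtyLink instances, the following Python sketch walks a chain of links from its first element. It assumes that prev and next hold the ids of neighbouring EtyLink individuals and that source denotes the earlier element of each step; both points, and all ids used, are assumptions made for the example only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier element (assumed direction)
    target: str                 # id of the later element
    type: Optional[str] = None
    prev: Optional[str] = None  # id of the preceding EtyLink, if any
    next: Optional[str] = None  # id of the following EtyLink, if any


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Return the EtyLinks of one Etymology in prev/next order."""
    by_id = {link.id: link for link in etymology.links}
    starts = [link for link in etymology.links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one EtyLink without a prev value")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in prev/next chain")
        seen.add(current.id)
        chain.append(current)
        # A missing or unknown next id simply ends the chain in this sketch.
        current = by_id.get(current.next) if current.next else None
    return chain


# Hypothetical three-step inheritance, stored out of order on purpose.
etymology = Etymology(type="inheritance", links=[
    EtyLink(id="l2", source="s1", target="s2", prev="l1", next="l3"),
    EtyLink(id="l3", source="s2", target="s3", prev="l2"),
    EtyLink(id="l1", source="s0", target="s1", next="l2"),
])
print([link.id for link in ordered_links(etymology)])  # ['l1', 'l2', 'l3']
```

The `nested` list mirrors the statement that nested Etymology instances can describe a combination of processes; it is not used by `ordered_links`.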
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass relation, to that of any given Etymon or Cognate) and which have been gathered together with the purpose of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an etymological source.
Figure 5 — CognateSet
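
The cardinalities stated for CognateSet (one or more Cognate items, zero or more Bibliography items) could be enforced along the following lines. This is only a sketch: the reduced `Cognate` record, the use of plain strings for bibliography items, and the placeholder lemmas and language labels are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cognate:
    lemma: str
    lang: str


@dataclass
class CognateSet:
    cognates: List[Cognate]                                # one or more
    bibliography: List[str] = field(default_factory=list)  # zero or more

    def __post_init__(self) -> None:
        if not self.cognates:
            raise ValueError("a CognateSet needs at least one Cognate")

    def languages(self) -> List[str]:
        """Sorted list of the distinct languages represented in the set."""
        return sorted({c.lang for c in self.cognates})


# Placeholder values, not taken from any dictionary.
cognate_set = CognateSet(cognates=[
    Cognate(lemma="form-b", lang="lang-b"),
    Cognate(lemma="form-a", lang="lang-a"),
    Cognate(lemma="form-c", lang="lang-a"),
])
print(cognate_set.languages())  # ['lang-a', 'lang-b']
```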
4.5 The Date class

The components of a LexicalEntry and its subclasses shall be associated with a specific date by making use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A precise year, and potentially month and day, shall be stated using the date attribute and a rough date with the attribute circa. Within a span of time with different levels of specificity, there is the possibility of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats, ISO 8601-1 and ISO 8601-2 shall be used.
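
A small Python sketch of the four dating attributes follows. It accepts only a narrow subset of ISO 8601-1 calendar dates (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`), does not validate month or day ranges, and ignores the ISO 8601-2 extensions; the helper names `parse_calendar` and `span_is_consistent` and the sample years are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Subset of ISO 8601-1 calendar dates: YYYY, YYYY-MM or YYYY-MM-DD.
_CALENDAR = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_calendar(value: str) -> Tuple[int, ...]:
    """Turn '0500', '0500-06' or '0500-06-01' into a tuple of integers."""
    match = _CALENDAR.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


@dataclass
class Date:
    date: Optional[str] = None        # precise year, optionally month and day
    circa: Optional[str] = None       # rough date
    not_before: Optional[str] = None  # lower end of a span (notBefore)
    not_after: Optional[str] = None   # upper end of a span (notAfter)

    def span_is_consistent(self) -> bool:
        """True unless notBefore is later than notAfter."""
        if self.not_before is None or self.not_after is None:
            return True
        lower = parse_calendar(self.not_before)
        upper = parse_calendar(self.not_after)
        # Compare only at the precision both values share.
        shared = min(len(lower), len(upper))
        return lower[:shared] <= upper[:shared]


print(Date(not_before="0500", not_after="0600").span_is_consistent())  # True
print(Date(not_before="0600", not_after="0500").span_is_consistent())  # False
```

Comparing at the shared precision means that a span such as notBefore `0500-06` with notAfter `0500` is treated as consistent, since June of a year falls within that year.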
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase that is intended for human consumption. Individuals of the class can represent either paraphrases or synonyms, and these may be in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet    (none listed)
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is linked to a single Etymology instance which is associated with the type inheritance (see Annex B for a definition of this type). The Etymology is then associated with an individual of type EtyLink which represents the process of change from the Latin etymon to the Sardinian lexical entry.

Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with phonological change)

In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano variety of Italian is traced back using a series of Etymons which are ordered and linked together using EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in its current manifestation. These individual links can be accessed through an Etymology individual (with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks is implemented by means of the two attributes prev and next. These attributes are not displayed in Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese

A.3 Example of word form inheritance

The derivation of the singular and plural forms of the Portuguese noun nação ‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc), is described in Figure A.3. Since both the singular and the plural Portuguese forms are concerned, the Etymon has two WordForm instances, one for each grammatical number. Note in particular the association of grammaticalCase and inflectionType attributes with the WordForm of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that contained such etymological information for a sufficient number of lexical entries, it would be possible, by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is -tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the phonological process types for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension of the word meaning ‘bean’, ntuchi. The Etymology is attached to the kidney sense of the word ntuchi. The EtyLink instance contains the attributes source and target with the id values of the respective components specifying the directionality of the process. Because a defining feature of the process of metaphor is that there has to be a change in the semantic domain between the source and target sense, each sense has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the dbpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
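
The domain-change condition for metaphor described in A.4 can be sketched in Python as a check that the source and target senses of an EtyLink carry different subject fields. The ids and the subject field labels below are hypothetical placeholders; the actual values used in Figure A.4 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense


@dataclass
class EtyLink:
    type: str
    source: str  # id of the sense the process starts from
    target: str  # id of the sense the process results in


def crosses_domains(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """True when source and target senses belong to different domains."""
    return senses[link.source].subject_field != senses[link.target].subject_field


# Hypothetical ids and domain labels for the two senses of 'ntuchi'.
senses = {
    "sense-bean": Sense(id="sense-bean", gloss="bean", subject_field="plants"),
    "sense-kidney": Sense(id="sense-kidney", gloss="kidney",
                          subject_field="body parts"),
}
link = EtyLink(type="metaphor", source="sense-bean", target="sense-kidney")
print(crosses_domains(link, senses))  # True
```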
A.5 Example of borrowing and compounding

In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse ‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word formation process that occurred within Dutch in which the compound pompelmousse was formed from the etymons pompel and limoes, with the compound subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At the level of Etymology, the fact that chef was inherited from Latin is encoded in the attribute type, which is given the value "inheritance". Within the Etymology, there is a series of Etymon items containing the temporal attributes notBefore and notAfter, which specify the spans of time, and thereby the asserted sequential ordering, in which a given form of the item would have been in use. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
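
One way to cross-check the temporal attributes against the sequential ordering described in A.6 is sketched below in Python. Each stage is reduced to a label with integer notBefore and notAfter years; the labels and years are invented placeholders and do not reproduce the dates shown in Figure A.6.

```python
from typing import List, Tuple

# (label, notBefore year, notAfter year), listed in prev/next order.
Stage = Tuple[str, int, int]


def spans_in_sequence(stages: List[Stage]) -> bool:
    """Check each span is well formed and no stage starts before its predecessor."""
    for _, start, end in stages:
        if start > end:
            return False
    for (_, prev_start, _), (_, next_start, _) in zip(stages, stages[1:]):
        if next_start < prev_start:
            return False
    return True


# Placeholder stages with made-up years.
stages = [
    ("stage-1", 100, 300),
    ("stage-2", 300, 500),
    ("stage-3", 450, 700),
]
print(spans_in_sequence(stages))                  # True
print(spans_in_sequence(list(reversed(stages))))  # False
```

Overlapping spans are allowed in this sketch, on the assumption that adjacent forms can coexist for some time.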
A.7 CognateSet with Bibliography

Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source (see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed and thus it is not represented as Cognate in the encoding.

A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English sense is given in the Definition for the main entry, was used early on to denote a young person regardless of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate has an associated Sense and a Definition instance which together describe related lexical entries in Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example

In order to demonstrate some of the expressivity of the current model and its potential for dealing with realistic examples of etymological data, the etymological information of part of a single entry, namely the entry for the Latin word amārus ‘bitter’ in Figure A.8 [6], is modelled in Figures A.9 and A.10. The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that is, the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus

The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and any additional information is added as text using the note attribute (in this case, the presence of the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also be found in later sources). In this instance, the bibliographic element refers to a single author (and therefore to their corpus of works) and the date attribute refers to the whole period encompassing their lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for ease of reading the elements of each of the two cognate sets are coloured using two different colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements and the entire etymology is attributed to the source by associating the Bibliography with the Etymology element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification

Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019, 5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for model simplification), for a normative description of the simplification process.

Annex B
(normative)
Data categories for etymology description

This annex contains a structured list of categories that shall be used as possible values for describing etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for each category, its name, a camel case representation, a conceptual domain (for complex data categories), a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well as some usage notes where needed.

Where applicable, the camel case form should be used in utilizing the features as a value for the attribute type in declaring the process of an Etymology or EtyLink.

NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-typologies and variant definitions or descriptions.
Borrowing
Camel case code: borrowing
Definition: The p
...

INTERNATIONAL STANDARD ISO 24613-3
First edition
2021-03
Language resource management —
Lexical markup framework (LMF) —
Part 3:
Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 3: Extension étymologique
Reference number
ISO 24613-3:2021(E)
ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting

on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address

below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the

World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology,

Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021

and ISO 24613-5:—¹⁾, cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.

1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.
INTERNATIONAL STANDARD ISO 24613-3:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of

detailed descriptions of common etymological phenomena and/or diachronic information with respect

to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such

an extension as well as the relevant data categories.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-

readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate

form in a related language which shares a common etymological origin as a form in the language of

the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical

entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.

3.5
onomasiology

approach to the investigation of word meaning which takes a given concept as a starting point and

studies the different lexical items in a language or languages that are used to refer to it

4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see

Figure 1)²⁾. Both classes define lexical entries which have been added to a lexical resource with the

purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon

or Cognate can be assigned a language which is different from the language of the lexicon as a whole

(this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry

Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at

least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it

is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, also instances

of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact

of being a root and the type of the root in question shall be specified using the attribute rootType.

In the case of reconstructed roots or other word forms, the attribute status serves to associate the

element with a written description of the likelihood of its having been in use (see the example in A.8).

See Table 1 for a list of attributes to be used with these two classes.
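
As an illustration only, the constraints described in 4.1 can be sketched in Python. The dataclass layout, the snake_case field names and the `_require_link` helper are assumptions made for this sketch; this document defines the classes and their attributes (xml:lang, rootType, status), not a programming interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """A single step in an etymology (see 4.3)."""
    type: Optional[str] = None


@dataclass
class LexicalEntry:
    lemma: str
    lang: Optional[str] = None       # stands in for xml:lang
    root_type: Optional[str] = None  # stands in for rootType
    status: Optional[str] = None     # written description of likelihood of use


def _require_link(entry: "LexicalEntry", links: List[EtyLink]) -> None:
    # Hypothetical helper: enforces "at least one individual of type EtyLink".
    if not links:
        raise ValueError(f"{type(entry).__name__} needs at least one EtyLink")


@dataclass
class Etymon(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)

    def __post_init__(self) -> None:
        _require_link(self, self.ety_links)


@dataclass
class Cognate(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)

    def __post_init__(self) -> None:
        _require_link(self, self.ety_links)


# The Latin etymon from A.1; its language differs from that of the lexicon.
latin_semper = Etymon(lemma="semper", lang="la",
                      ety_links=[EtyLink(type="inheritance")])
```

In this sketch, constructing an Etymon or Cognate without any EtyLink raises a ValueError, which mirrors the "shall" requirement above.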
4.2 The Etymologizable class

The Etymologizable class provides a means of referring to the set of linguistic elements that can have

etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes

of elements which can have etymologies can be easily extended in the future wherever the necessity

arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry,

Sense, Form and CognateSet (see 4.4).

2) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this

document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class allows for the description of the etymology of a linguistic element. More specifically,

it allows for the description of those linguistic elements that are subclasses of the Etymologizable class.

The type or types of etymological process involved in a given etymology can be specified using the type

attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be

further specified). Possible values for type and subtype can vary according to the theoretical approach

adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of

nested Etymology instances allows a combination of etymological processes to be described. Examples

of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word

formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor,

metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis,

hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement

to the appropriate classes. Individual links between two elements in an etymology can also be given

a type, see the description of EtyLink below. Given that an Etymology instance can be taken from

an external source, it can be associated with a Bibliography instance, which shall be defined as per

ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class

Instances of Etymology are associated with one or more EtyLink instances, each of which represents a

single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together

individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class

as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given

lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable

class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements,

using the attributes prev and next. Instances of the EtyLink class can further specify additional

temporal relationships using various temporal attributes associated with the source and target of each

EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at

least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and

EtyLink classes.
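
The ordering of EtyLink instances by prev and next can be sketched as follows. This is a minimal illustration that assumes prev and next hold the id values of neighbouring EtyLink instances and that source and target hold the id values of Etymologizable elements; the function name `ordered_links` and all identifiers in the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier Etymologizable element
    target: str                 # id of the later Etymologizable element
    type: Optional[str] = None
    prev: Optional[str] = None  # id of the preceding EtyLink, if any
    next: Optional[str] = None  # id of the following EtyLink, if any


def ordered_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in the order given by their prev/next attributes."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    heads = [link for link in links if link.prev is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one EtyLink without prev")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = heads[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in prev/next ordering")
        seen.add(current.id)
        chain.append(current)
        current = by_id[current.next] if current.next else None
    return chain


# Hypothetical two-step etymology, deliberately listed out of order.
links = [
    EtyLink(id="l2", source="e2", target="e3", prev="l1"),
    EtyLink(id="l1", source="e1", target="e2", next="l2"),
]
chain_ids = [link.id for link in ordered_links(links)]
```

Following next from the single link that has no prev recovers the sequence of stages regardless of the order in which the links are stored.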
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or

more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its

contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass

relation, of any given Etymon or Cognate) which have been gathered together with the purpose

of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet

implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an

etymological source.
Figure 5 — CognateSet
4.5 The Date class

The components of a LexicalEntry and its subclasses shall be associated with a specific date by making

use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A

precise year, and potentially month and day, shall be stated using the date attribute and a rough date

with the attribute circa. Within a span of time with different levels of specificity, there is the possibility

of using one or more dating attributes. Where a span of time is known (or asserted), the lower and upper

ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats,

ISO 8601-1 and ISO 8601-2 shall be used.
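
A minimal sketch of the four dating attributes is given below. It assumes that values are written in the extended calendar forms YYYY, YYYY-MM or YYYY-MM-DD, which is only a small subset of what ISO 8601-1 and ISO 8601-2 allow, and it checks the shape of a value rather than calendar validity. The `parse_iso` helper, the snake_case field names and the sample years are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

_ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_iso(value: str) -> Tuple[int, ...]:
    """Parse YYYY, YYYY-MM or YYYY-MM-DD into a tuple of integers."""
    match = _ISO_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported date string: {value!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


@dataclass
class Date:
    date: Optional[str] = None        # precise year, optionally month and day
    circa: Optional[str] = None       # rough date
    not_before: Optional[str] = None  # stands in for notBefore
    not_after: Optional[str] = None   # stands in for notAfter

    def __post_init__(self) -> None:
        for value in (self.date, self.circa, self.not_before, self.not_after):
            if value is not None:
                parse_iso(value)
        if self.not_before and self.not_after:
            lower = parse_iso(self.not_before)
            upper = parse_iso(self.not_after)
            # Compare only at the precision that both ends share.
            shared = min(len(lower), len(upper))
            if lower[:shared] > upper[:shared]:
                raise ValueError("notBefore is later than notAfter")


# Hypothetical span: a form asserted to be in use between the years 500 and 700.
span = Date(not_before="0500", not_after="0700")
```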
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase

that is intended for human consumption. Individuals of the class can either represent paraphrases or

synonyms and these may be in the language of the entry or in another language. See Table 1 for a list of

attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this

case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is

linked to a single Etymology instance which is associated with the type inheritance (see Annex B for

a definition of this type). The Etymology is then associated with an individual of type EtyLink which

represents the process of change from the Latin etymon to the Sardinian lexical entry.

Figure A.1 — Diagram of inheritance in Sardinian
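
The structure described in A.1 can be approximated in Python as shown below. This is a sketch under stated assumptions: the dataclass layout and the identifier strings are hypothetical, and the language codes "la" and "sc" are the ISO 639-1 codes for Latin and Sardinian, chosen here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    source: str                 # id of the element the change starts from
    target: str                 # id of the element the change results in
    type: Optional[str] = None


@dataclass
class Etymology:
    type: str
    links: List[EtyLink] = field(default_factory=list)


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    lang: str
    etymologies: List[Etymology] = field(default_factory=list)


@dataclass
class Etymon(LexicalEntry):
    pass


latin = Etymon(id="etymon-semper-la", lemma="semper", lang="la")
sardinian = LexicalEntry(id="entry-semper-sc", lemma="semper", lang="sc")

# One Etymology of type inheritance, holding one EtyLink from etymon to entry.
sardinian.etymologies.append(
    Etymology(type="inheritance",
              links=[EtyLink(source=latin.id, target=sardinian.id)])
)
```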
A.2 Example of a diachronic etymological process (inheritance with
phonological change)

In the following example, the development of the word meaning ‘wine’ (ipa: [veŋ]) in the Emiliano

variety of Italian is traced back using a series of Etymons which are ordered and linked together using

EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in

its current manifestation. These individual links can be accessed through an Etymology individual

(with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks

is implemented by means of the two attributes prev and next. These attributes are not displayed in

Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in

Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese

A.3 Example of word form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação ‘nation’,

nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and

nātiōnes (pl, acc), respectively. Herein, since the Portuguese forms concerned are both the

plural and singular, the Etymon has two WordForm instances, one for each grammatical number. Note

in particular the association of grammaticalCase and inflectionType attributes with the WordForm

of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that

contained such etymological information for a sufficient number of lexical entries, it would be possible,

by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the

following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its

nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the

Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is

-tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the phonological process types for each stage

of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the

value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via

metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension

of the word meaning ‘bean’ ntuchi. The Etymology is attached to the kidney sense of the word ntuchi.

The EtyLink instance contains the attributes source and target with the id values of the respective

components specifying the directionality of the process. Since a defining feature of the process of metaphor

is a change in the semantic domain between the source and target senses, each sense

has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify

a URI corresponding to the dbpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
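
The domain condition on metaphor described in A.4 can be checked mechanically. The sketch below assumes that each Sense carries a single domain label; the labels "plants" and "anatomy", the identifier strings and the `domains_differ` function are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str  # stands in for the domain given in the subject field


@dataclass
class EtyLink:
    type: str
    source: str  # id of the sense the shift starts from
    target: str  # id of the sense the shift results in


def domains_differ(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """True if the source and target senses belong to different domains."""
    return senses[link.source].domain != senses[link.target].domain


# The two senses of ntuchi, with hypothetical domain labels.
senses = {
    "ntuchi-bean": Sense(id="ntuchi-bean", gloss="bean", domain="plants"),
    "ntuchi-kidney": Sense(id="ntuchi-kidney", gloss="kidney", domain="anatomy"),
}
link = EtyLink(type="metaphor", source="ntuchi-bean", target="ntuchi-kidney")
is_plausible_metaphor = domains_differ(link, senses)
```

A resource could run such a check over every EtyLink typed as metaphor to flag entries where both senses were assigned the same domain.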
A.5 Example of borrowing and compounding

In Figure A.5, an example of a complex etymological borrowing is given, the borrowing of pamplemousse

‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the

etymology shows the diachrony of the word formation process that occurred within Dutch in which

the compound pompelmousse was formed from the etymons pompel and limoes, with the compound

subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that

the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of

the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a

focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). Herein

at the level of Etymology, the process specifying the fact that chef was inherited from Latin is encoded

in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of

Etymon items containing the temporal attributes notBefore and notAfter, which are used to specify the

span of time during which a given form of the item is asserted to have been in use, and thereby the

sequential ordering of the forms. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate

EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography

Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source

(see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical

item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed

and thus it is not represented as Cognate in the encoding.

A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English

sense is given in the Definition for the main entry, was used early on to denote a young person regardless

of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate

has an associated Sense and a Definition instance which together describe related lexical entries in

Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example

In order to demonstrate some of the expressivity of the current model and its potential for dealing with

realistic examples of etymological data, the etymological information of part of a single entry, namely

the entry for the Latin word amārus ‘bitter’ in Figure A.8 (see Reference [6]), is modelled in Figures A.9 and A.10.

The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and

cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that

is the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data

in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008)[6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus

The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details

the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the

lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling

of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in

ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of

the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and

with any additional information (in this case the presence of the plus sign after the bibliographic citation

in Reference [6] means that the linguistic usage in question can also be found in later sources) added as

text using the note attribute. In this instance, the bibliographic element refers to a single author (and

therefore to their corpus of works) and the date attribute refers to the whole period encompassing their

lifespan.
SOURCE De Vaan (2008)[6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information

itself (for ease of reading the elements of each of the two cognate sets are coloured using two different

colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements

and the entire etymology is attributed to the source by associating the Bibliography with the Etymology

element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for

Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the

etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008)[6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification

Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019,

5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for

model simplification), for a normative description of the simplification process.

Annex B
(normative)
Data categories for etymology description

This annex contains a structured list of categories that shall be used as possible values for describing

etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for

each category, its name, a camel case representation, a conceptual domain (for complex data categories),

a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well

as some usage notes where needed.

Where applicable, the camel case form should be used in utilizing the features as a value for the attribute

type in declaring the process of an Etymology or EtyLink.

NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-

typologies and variant definitions or descriptions.
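
For illustration, a category name can be turned into a camel case code with a small helper. The function `to_camel_case` is hypothetical. It reproduces the single-word codes listed in this annex (for example Borrowing gives borrowing), while the lower camel case result for multi-word names is an assumption about the convention, so the code given for each category remains authoritative.

```python
def to_camel_case(name: str) -> str:
    """Convert a category name such as 'Borrowing' into a lower camel case code."""
    words = name.replace("-", " ").split()
    if not words:
        return ""
    head, *tail = words
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)


# Single-word category names taken from this annex.
codes = [to_camel_case(n) for n in ("Borrowing", "Calque", "Inheritance")]
```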
Borrowing
Camel case code: borrowing

Definition: The process in which a lexical item, phrase or other linguistic feature from a foreign

language or dialect is introduced into a given language or dialect.
Source: Reference [4].
Note: The lexical items which result from borrowing are loanwords.
Example: Russian картошка is a loanword from High German Kartoffel.
Alternate term: loaning, importing, transferring, copying
Calque
Camel case code: calque

Definition: A loanword in which only the meaning is borrowed with the term being literally

translated into the borrowing language.
Source: Reference [5].

Example: Spanish rascacielos (rasca 'scratch', 'scrape' + cielos 'sky') from English skyscraper

Alternate term: loan translation, semantic loan
Inheritance
Camel case code: inheritance

Definition: Inheritance, although often not regarded as an etymological process in itself, is used

to identify situations where a lexical item is known to, or is presumed to, have been inherited from

a predecessor or “parent” language(s). Items inherited from a predecessor language often undergo

any number of different separate etymological processes over the course of their diachrony.

Source: Reference [4].
Example: Sardinian semper was inherited from Latin semper.
Compounding
Camel case code: compounding
Definition: Compounding is the process of creating a new le
...

INTERNATIONAL STANDARD ISO 24613-3
First edition
2021-03
Gestion des ressources
linguistiques — Cadre de balisage
lexical (LMF) —
Partie 3:
Extension étymologique
Language resource management — Lexical markup framework
(LMF) —
Part 3: Etymological extension
Reference number
ISO 24613-3:2021(F)
ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this

publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical,

including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can

be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify any such rights or to give notice of their existence. Details of any intellectual property rights or similar rights identified during the development of the document are given in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, and information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/fr/avant-propos.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-4:2021 and ISO 24613-5 (undated, see footnote 1), cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:

- complete revision of the content and its subdivision into several parts.

A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user's national standards body. A complete listing of these bodies can be found at www.iso.org/fr/members.html.

1) Under preparation. Stage at the time of publication: ISO/DIS 24613-5:2020.
INTERNATIONAL STANDARD ISO 24613-3:2021(F)
Language resource management -- Lexical markup framework (LMF) --
Part 3: Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 that supports the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time -- Representations for information interchange -- Part 1: Basic rules
ISO 8601-2, Date and time -- Representations for information interchange -- Part 2: Extensions
ISO 24613-1, Language resource management -- Lexical markup framework (LMF) -- Part 1: Core model
ISO 24613-2, Language resource management -- Lexical markup framework (LMF) -- Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

- ISO Online browsing platform: available at https://www.iso.org/obp
- IEC Electropedia: available at http://www.electropedia.org/

3.1
cognate
form in a related language that shares a common etymological origin with a form in the language of the lexicon

3.2
etymologizable
fulfilling the conditions required to have an etymology (3.3)

Note 1 to entry: The term "etymologizable" refers to a category of lexical items and usages (covering, for example, lexical entries, senses and word forms).

3.3
etymology
origin and historical development of any aspect of a given lexical item

3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be an earlier stage of a lexical item.

3.5
onomasiology
semantic study of words that starts from a given concept and examines the different lexical items used in one or more languages to refer to that concept
4 The LMF etymology extension

NOTE See Annex A for examples of possible etymological typologies.

4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class of the LMF core model (see Figure 1 and footnote 2). Both classes define lexical entries that have been added to a lexical resource in order to describe the etymologies of one or more other lexical entries. Etymon or Cognate instances can be assigned a language that differs from that of the lexicon as a whole (the lexicon language is specified in the LexiconInformation class described in ISO 24613-1).

Figure 1 -- The Cognate and Etymon subclasses of the LexicalEntry class

Individuals of the Etymon and Cognate subclasses shall be in an aggregation relationship with at least one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary to deal with LexicalEntry instances (and therefore, through the subclass relationship, with Etymon and Cognate instances) that are roots, and in particular reconstructed roots. In these cases, the fact of being a root and the type of root in question shall be specified using the rootType attribute. For reconstructed roots or other word forms, the status attribute associates the item with a written description of how likely it is to have been in use (see the example in A.8).

See Table 1 for a list of attributes to be used with these two subclasses.
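
As an informal illustration of 4.1, the relationship between LexicalEntry, Etymon and Cognate can be sketched in Python. This is a minimal sketch under stated assumptions, not a normative serialization: the Python layout, the helper class EtymologicalEntry, the fields lemma and ety_link_ids, the function has_required_link and all id values are hypothetical. Only the class names and the attributes xml:lang, gloss, rootType and status come from this document. The data reuses the Sardinian example of A.1, assuming the language codes sc (Sardinian) and la (Latin).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    """Simplified stand-in for the LMF LexicalEntry class (illustrative only)."""
    id: str
    lemma: str
    lang: Optional[str] = None  # xml:lang; None means "same as the lexicon"


@dataclass
class EtymologicalEntry(LexicalEntry):
    """Hypothetical helper holding what Etymon and Cognate share; not an LMF class."""
    gloss: Optional[str] = None
    root_type: Optional[str] = None  # rootType: set only when the entry is a root
    status: Optional[str] = None     # status: how likely the form was in use
    ety_link_ids: List[str] = field(default_factory=list)  # assumed link storage


class Etymon(EtymologicalEntry):
    """Entry from which another lexical entry is derived."""


class Cognate(EtymologicalEntry):
    """Entry in a related language sharing a common origin."""


def has_required_link(entry: EtymologicalEntry) -> bool:
    """Check the rule in 4.1: at least one EtyLink per Etymon or Cognate."""
    return len(entry.ety_link_ids) >= 1


# Sardinian entry and its Latin etymon; the etymon's language differs
# from the language of the lexicon.
sempe = LexicalEntry(id="sempe", lemma="sempe", lang="sc")
semper = Etymon(id="semper", lemma="semper", lang="la", ety_link_ids=["link1"])
```

Modelling Etymon and Cognate as subclasses means that any code written for LexicalEntry also accepts them, which mirrors the subclass relationship in Figure 1.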

4.2 The Etymologizable class

The Etymologizable class designates the set of linguistic elements that can have etymologies. Defining a single class that covers all such "etymologizable" elements makes it easy to extend, as needed, the classes of elements that can have etymologies. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).

2) The diagrams in this document use the following colour code: classes in yellow are introduced in this document, whereas classes in pink were introduced previously in ISO 24613-1 and ISO 24613-2.

Figure 2 -- The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class is used to describe the etymology of a linguistic element. More specifically, it describes linguistic elements that are subclasses of the Etymologizable class. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and optionally the subtype attribute (if the type of the etymology can be refined further). The possible values for the type and subtype attributes can vary depending on the theoretical approach adopted by the compiler of a resource and/or on the linguistic or editorial orientation of the resource in question. Nesting Etymology instances makes it possible to combine the etymological processes to be described.

Etymological processes to be used as values of the type/subtype attributes can be, for example:

- borrowing, inheritance;
- word formation: compounding, derivation;
- sense shifts: narrowing, broadening, amelioration, pejoration, metaphor, metonymy;
- phonetic/phonological processes: assimilation, dissimilation, epenthesis, metathesis, fortition, lenition, etc.

The list of data categories provided in Annex B shall be used in conjunction with the appropriate classes. A type can also be specified for individual links between two elements within an etymology (see the description of EtyLink below). Since an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined in accordance with ISO 24613-2 (see Figure 3).

Figure 3 -- The Etymology class
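
The nesting of Etymology instances described in 4.3 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field name nested and the function process_types are hypothetical, and the type values "borrowing" and "compounding" are assumed spellings of the corresponding processes (the normative values are the data categories of Annex B).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Etymology:
    """Simplified Etymology: a process type, an optional subtype, nested steps."""
    type: str
    subtype: Optional[str] = None
    nested: List["Etymology"] = field(default_factory=list)  # assumed field name


def process_types(etymology: Etymology) -> List[str]:
    """Return the type of an etymology and of all nested ones, outermost first."""
    found = [etymology.type]
    for inner in etymology.nested:
        found.extend(process_types(inner))
    return found


# A borrowing whose source word was itself formed by compounding.
borrowed_compound = Etymology(
    type="borrowing",
    nested=[Etymology(type="compounding")],
)
print(process_types(borrowed_compound))  # ['borrowing', 'compounding']
```

A recursive structure of this kind keeps each process as a separate object, so a resource can attach a bibliography or a subtype to one step without affecting the others.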

Etymology instances are associated with one or more EtyLink instances, each of which represents a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink is used to link individuals belonging to the Etymologizable subclasses. EtyLink is a subclass of the CrossREF class defined in ISO 24613-1. The use of CrossREF requires id attributes to be assigned to the target objects that represent the specific lexical content. Using the id attribute, as a target, on an individual of the Etymologizable class makes it possible to model a generic sequential temporal ordering of multiple elements with the prev and next attributes. Instances of the EtyLink class can also specify additional temporal relations by using various temporal attributes associated with the source and the target of each EtyLink instance.

Figure 4 -- EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one EtyLink individual. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
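
The sequential ordering expressed by prev and next can be sketched as follows. This is a minimal Python sketch, assuming that prev and next hold the ids of the neighbouring EtyLink instances (the text above does not fix this detail) and that exactly one link has no prev. The function order_links and all id values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier element (e.g. an Etymon)
    target: str                 # id of the later element
    prev: Optional[str] = None  # assumed: id of the preceding EtyLink
    next: Optional[str] = None  # assumed: id of the following EtyLink


def order_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in chronological order by following next from the head."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    heads = [link for link in links if link.prev is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one link without a prev attribute")
    ordered: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = heads[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in next attributes")
        seen.add(current.id)
        ordered.append(current)
        current = by_id[current.next] if current.next is not None else None
    return ordered


# Three steps stored out of order, as they might appear in a data file.
links = [
    EtyLink(id="l2", source="e2", target="e3", prev="l1", next="l3"),
    EtyLink(id="l3", source="e3", target="entry", prev="l2"),
    EtyLink(id="l1", source="e1", target="e2", next="l2"),
]
print([link.id for link in order_links(links)])  # ['l1', 'l2', 'l3']
```

Because the order is carried by the attributes rather than by document position, the links can be stored in any order and still be read back as a chain.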
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate elements and zero or more Bibliography elements (see ISO 24613-2). The CognateSet class is an onomasiological construct. It contains items from languages related to that of a given LexicalEntry (and therefore, through the subclass relationship, of any given Etymon or Cognate) that have been collected in order to demonstrate key linguistic similarities or dissimilarities. The use of a CognateSet implies that the LexicalEntry instances (and therefore the Etymon and Cognate instances) that it contains share an etymological source.

Figure 5 -- CognateSet
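
The cardinalities stated in 4.4 (one or more Cognate elements, zero or more Bibliography elements) can be checked with a small Python sketch. The representation of members as lists of id strings and the field names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CognateSet:
    cognates: List[str]                                    # ids of Cognate entries
    bibliography: List[str] = field(default_factory=list)  # ids of Bibliography items

    def __post_init__(self) -> None:
        # 4.4: a CognateSet holds one or more Cognate elements.
        if len(self.cognates) < 1:
            raise ValueError("a CognateSet needs at least one Cognate")


# Hypothetical ids; the bibliography list may stay empty.
example_set = CognateSet(cognates=["cognate-1", "cognate-2"])
```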
4.5 The Date class

The components of a LexicalEntry and of its subclasses shall be associated with a specific date using the Date class. The Date class also supports several degrees of precision. A precise year, optionally with a month and day, shall be specified using the date attribute, and an approximate date using the circa attribute. A time span can be described at different levels of specificity with one or more date attributes. When the time span is known (or estimated), its lower and upper bounds can be given using notBefore and notAfter respectively. Date and time formats shall follow ISO 8601-1 and ISO 8601-2.
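
The four attributes of the Date class can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it accepts only the calendar forms YYYY, YYYY-MM and YYYY-MM-DD, which is a small subset of ISO 8601-1 and ignores the ISO 8601-2 extensions, and the helper parse_date and the consistency check are hypothetical additions rather than requirements of this document.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Accepts YYYY, YYYY-MM or YYYY-MM-DD only (a small subset of ISO 8601-1).
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def parse_date(value: str) -> Tuple[int, int, int]:
    """Parse a reduced-precision date; missing month or day defaults to 1."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    year, month, day = match.groups()
    return int(year), int(month or 1), int(day or 1)


@dataclass
class Date:
    date: Optional[str] = None        # precise date
    circa: Optional[str] = None       # approximate date
    not_before: Optional[str] = None  # notBefore: lower bound of a time span
    not_after: Optional[str] = None   # notAfter: upper bound of a time span

    def __post_init__(self) -> None:
        # Reject values outside the supported subset.
        for value in (self.date, self.circa, self.not_before, self.not_after):
            if value is not None:
                parse_date(value)
        # Assumed sanity check: the lower bound must not follow the upper bound.
        if self.not_before is not None and self.not_after is not None:
            if parse_date(self.not_before) > parse_date(self.not_after):
                raise ValueError("notBefore must not be later than notAfter")


# Hypothetical time span for a form in use between the years 400 and 600.
span = Date(not_before="0400", not_after="0600")
```

Keeping the values as strings preserves the precision the lexicographer chose (year only, or year and month), which would be lost if every value were converted to a full calendar date.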
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or phrase that is intended for a human reader. Individuals of this class can represent paraphrases or synonyms, written either in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.

Figure 6 -- Gloss
Table 1 -- Example of attribute assignment to classes

Class name     Example attributes
Etymon         xml:lang, gloss, rootType, status
Etymology      type, subtype
EtyLink        type, prev, next
CognateSet     (none listed)
Cognate        xml:lang, gloss, rootType, status
Date           notBefore, notAfter, circa, date
Gloss          xml:lang
Annex A
(informative)
Examples of possible etymological typologies

A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the Sardinian adverb sempe, which comes from the Latin word semper. The Sardinian word is linked to a single Etymology instance with the type inheritance (see Annex B for a definition of this type). The Etymology instance is in turn associated with an individual of type EtyLink, which represents the process of evolution from the Latin etymon to the Sardinian lexical entry.

Figure A.1 -- Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with phonological change)

In the following example, the evolution of the word for "wine" (IPA: [veŋ]) in the Italian variety Emiliano is traced through a series of etymons that are ordered and linked to one another using EtyLink instances. This process leads from Vulgar Latin vinu to the immediate predecessor of the word in its current form. The individual links are accessible via an Etymology individual (with the type inheritance) that represents the history of the LexicalEntry. The EtyLink instances are ordered by means of the two attributes prev and next. These attributes are not shown in Figure A.2 for reasons of legibility, but their use is shown in Figure A.6 for the example in A.6 and in Figure A.10 for the example in A.8.

Figure A.2 -- Diagram of multi-step inheritance and phonological change in Bolognese
A.3 Example of word-form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação 'nation', nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc) respectively. Because both the singular and the plural Portuguese forms are concerned, the etymon has two WordForm instances, one for each grammatical number. Note in particular that the grammaticalCase and inflectionType attributes are associated with the WordForm instances of the Etymon via a GrammaticalInformation instance. In a detailed Portuguese lexicon containing such etymological information for a sufficient number of lexical entries, contrasting the content of the WordForm instances in the LexicalEntry with those in the Etymon would reveal the following linguistic phenomena:

1) Portuguese has lost grammatical case;
2) the vast majority of its nouns derive from the VL accusative case;
3) where the VL singular (accusative) ending was -tiōnem, the Portuguese form is written "-ção" and pronounced [sɐ̃w]; where the VL plural (accusative) ending was -tiōnes, the Portuguese form is written "-ções" and pronounced [sõj̃s].

NOTE This etymology can be articulated further by adding the types of phonological process for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B as the type value of the EtyLink for the stages indicated.

Figure A.3 -- Diagram of the inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The Mixtepec-Mixtec example in Figure A.4 illustrates the derivation of one sense from another by metaphor. In this case, the word meaning 'kidney' arose from a metaphorical extension of the word meaning 'bean', ntuchi. The etymology is attached to the 'kidney' sense of the word ntuchi. The EtyLink instance carries the source and target attributes, whose values are the ids of the respective components and which specify the directionality of the process. A defining feature of metaphor is that the semantic domain must change between the source and target senses; each sense has a domain specified in its subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the dbpedia entry for each sense.

Figure A.4 -- Diagram of a metaphorical sense shift in Mixtepec-Mixtec
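
The domain change that characterizes metaphor in A.4 can be expressed as a simple consistency check. The Python sketch below is illustrative only: the sense ids, the domain labels "plants" and "anatomy" and the function is_consistent_metaphor are hypothetical, and the actual encoding in Figure A.4 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str  # subject field of the sense (label values are assumed)


@dataclass
class EtyLink:
    type: str
    source: str  # id of the source sense
    target: str  # id of the target sense


def is_consistent_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """A metaphor link should connect two senses in different domains."""
    if link.type != "metaphor":
        return False
    return senses[link.source].domain != senses[link.target].domain


# The two senses of ntuchi, with the link running from 'bean' to 'kidney'.
senses = {
    "ntuchi-bean": Sense("ntuchi-bean", "bean", domain="plants"),
    "ntuchi-kidney": Sense("ntuchi-kidney", "kidney", domain="anatomy"),
}
link = EtyLink(type="metaphor", source="ntuchi-bean", target="ntuchi-kidney")
```

A check of this kind could flag links typed as metaphor whose source and target share a subject field, which may point to a different process or to missing domain information.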
A.5 Example of borrowing and compounding

Figure A.5 presents an example of a complex etymology involving borrowing: the borrowing of the French word pamplemousse 'grapefruit' from a Dutch word (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word-formation process that took place in Dutch, where the compound pompelmousse was formed from the etymons pompel and limoes, and this compound was then borrowed into French. A CrossREF instance (see ISO 24613-2) is used to specify that the borrowed etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymon classes containing the essential forms such as Lemma.

Figure A.5 -- Diagram of the borrowing and compounding of the French word pamplemousse

A.6 Example of the use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef, focusing on the diachronic stages and the chronology of its phonetic evolution (see Reference [13]). In this case, at the Etymology level, the process specifying that chef is inherited from Latin is encoded in the type attribute, which takes the value "inheritance". Within the Etymology level there is a series of Etymon elements carrying the temporal attributes notBefore and notAfter, which specify the time intervals, and hence the sequential ordering, in which a given form of the item is presumed to have been in use. Note also the ordering of the Etymon elements through the prev and next attributes on the relevant EtyLink individuals.

Figure A.6 -- Diagram of the multi-stage phonological changes of the French word chef

A.7 Cognate set with bibliography

Figure A.7 illustrates an example of a cognate set taken from a real source, an etymological dictionary (see Reference [19]). Several cognate forms in related languages are given for comparison with the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed and is therefore not represented as a Cognate class in the encoding.

An important point to note in this entry is that the lexical entry girl, whose contemporary English sense is given in the definition of the main entry, was formerly used to refer to a young person of either sex, and has therefore undergone a semantic shift over its long period of use. Each cognate is associated with Sense and Definition instances that together describe related lexical entries in the Germanic languages, all of which are presumed to be inherited from a common source.

Figure A.7 -- Diagram of a cognate set associated with the English entry girl
A.8 Complete etymological example

To demonstrate the expressiveness of the current model and its ability to handle realistic examples of etymological data, the etymological information from part of a single entry, namely the entry for the Latin word amārus 'bitter' in Figure A.8 (Reference [6]), is modelled in Figures A.9 and A.10. The entry consists of a list of two of the derived words together with a series of associated etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (the paragraph beginning ‘The suffix –arus’…); this commentary is not modelled as structured data in this example. The original text of the entry is shown in Figure A.8.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 -- Example entry for the Latin word amārus

The modelling of this entry is organized in two parts. The first is a diagram (Figure A.9) detailing the LMF modelling of the first two lines of the entry shown in Figure A.8: the lines containing the lemma, meaning and part of speech of the lexical entry, together with its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a particular derivation or sense of the entry. The CrossREF individuals in question are marked as attestations using the type attribute, and any additional information is added as text using the note attribute (in this case, the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also appear in later sources). In this example, the bibliographic element refers to a single author (and therefore to that author's body of work), and the date attribute covers the author's entire lifetime.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 -- Part 1 of the amārus entry: lemmas, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for ease of reading, the elements of each of the two cognate sets are marked in two different colours: cyan and violet). Each cognate is represented as belonging to one of the two CognateSet elements, and the whole etymology is attributed to its source by associating the Bibliography class with the Etymology element for the LexicalEntry. Two further Etymology elements are used to describe the etymologies of the cognates included in the CognateSet element. The rootType attribute is used to indicate that the form of the etymon is reconstructed, and the status attribute to indicate a degree of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 -- Part 2 of the amārus entry: cognates, etymology and bibliography
A.9 Model simplification

Several examples in this annex illustrate different methods of simplifying a model. See ISO 24613-1:2019, 5.5 (Methods for selecting data categories and creating subclasses), and in particular 5.5.6 (Principles of model simplification), for a normative description of the simplification process.
Annex B
(normative)
Data categories for etymology description

This annex provides a structured list of categories that shall be used as possible values to describe etymological processes, including
...

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management -- Lexical markup framework (LMF) --
Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) --
Partie 3: Extension étymologique
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number
ISO/FDIS 24613-3:2020(E)
© ISO 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents Page

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
  4.1 The Cognate class and the Etymon class
  4.2 The Etymologizable class
  4.3 The Etymology class and the EtyLink class
  4.4 The CognateSet class
  4.5 The Date class
  4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5, cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.

A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate
form in a related language which shares a common etymological origin with a form in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing, for instance, lexical entries, senses and word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.

3.5
onomasiology

approach to the investigation of word meaning which takes a given concept as a starting point and studies the different lexical items in a language or languages that are used to refer to it

4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see Figure 1; the colour scheme used in the diagrams is explained in footnote 1). Both classes define lexical entries which have been added to a lexical resource for the purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon or Cognate can be assigned a language which is different from the language of the lexicon as a whole (the latter is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry

Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, with instances of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact of being a root and the type of the root in question shall be specified using the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status serves to associate the element with a written description of the likelihood of its having been in use (see the example in A.8).

See Table 1 for a list of attributes to be used with these two classes.
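
As an informal illustration of 4.1, the subclass relationship and the attributes rootType and status can be sketched in Python. This is a minimal sketch under stated assumptions, not a serialization defined by this document: the Python field names (lemma, lang, root_type, status, ety_link_ids), the identifiers and the helper has_required_link are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    """Stand-in for the LexicalEntry class of the LMF core model."""
    id: str
    lemma: str
    lang: Optional[str] = None  # may differ from the language of the lexicon


@dataclass
class Etymon(LexicalEntry):
    """Lexical entry from which another lexical entry is derived."""
    root_type: Optional[str] = None  # mirrors rootType (value set is an assumption)
    status: Optional[str] = None     # mirrors status (written description)
    ety_link_ids: List[str] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    """Form in a related language sharing a common etymological origin."""
    root_type: Optional[str] = None
    status: Optional[str] = None
    ety_link_ids: List[str] = field(default_factory=list)


def has_required_link(entry: LexicalEntry) -> bool:
    """Check that an Etymon or Cognate is associated with at least one EtyLink."""
    if isinstance(entry, (Etymon, Cognate)):
        return len(entry.ety_link_ids) >= 1
    return True  # the rule does not apply to plain lexical entries


# The Latin etymon of the example in A.1; "la" is the language code for Latin.
latin_semper = Etymon(id="etymon1", lemma="semper", lang="la",
                      ety_link_ids=["link1"])
```

Because Etymon and Cognate inherit from LexicalEntry in the sketch, anything that accepts a lexical entry also accepts an etymon or a cognate, which is the point of the subclass relation in Figure 1.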
4.2 The Etymologizable class

The Etymologizable class provides a means of referring to the set of linguistic elements that can have etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the set of classes that can have etymologies can easily be extended in the future as the need arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).
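
The extension mechanism described in 4.2 can be sketched as a shared base class. The following Python sketch is illustrative only: the etymologies list and the helper attach_etymology are assumptions made for the example and are not defined by this document.

```python
class Etymologizable:
    """Common supertype of every element that can have an etymology."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.etymologies = []  # hypothetical container for attached etymologies


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


def attach_etymology(element, etymology):
    """Attach an etymology to any element that is etymologizable."""
    if not isinstance(element, Etymologizable):
        raise TypeError("only Etymologizable elements can have etymologies")
    element.etymologies.append(etymology)
    return element


sense = attach_etymology(Sense("sense1"), {"type": "metaphor"})
```

Supporting a further kind of element later on only requires one more subclass of Etymologizable; attach_etymology itself stays unchanged.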

1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class allows for the description of the etymology of a linguistic element. More specifically, it allows for the description of those linguistic elements that are subclasses of the Etymologizable class. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and also potentially the subtype attribute (in the case when the type of the etymology can be further specified). Possible values for type and subtype can vary according to the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of nested Etymology instances allows a combination of etymological processes to be described.

Examples of etymological processes that shall be used as values for type/subtype:
— borrowing, inheritance;
— word formation: compounding, derivation;
— sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy;
— phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis, hardening, weakening, etc.

The list of data categories provided in Annex B shall be used in complement to the appropriate classes. Individual links between two elements in an etymology can also be given a type (see the description of EtyLink below). Given that an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class

Instances of Etymology are associated with one or more EtyLink instances, each of which represents a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements, using the attributes prev and next. Instances of the EtyLink class can specify additional temporal relationships using various temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
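
The interplay of Etymology, EtyLink and the attributes prev and next can be sketched as follows. This is a hedged illustration rather than a normative data structure: it assumes that prev and next hold the ids of neighbouring EtyLink instances and that source and target hold the ids of the linked elements. The identifiers (link1, element1, ...) and the helper ordered_links are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                  # id of the earlier element
    target: str                  # id of the later element
    type: Optional[str] = None   # optional process type of this single step
    prev: Optional[str] = None   # assumed: id of the preceding EtyLink
    next: Optional[str] = None   # assumed: id of the following EtyLink


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)  # combined processes


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Return the links in sequence by following the next attribute."""
    if not etymology.links:
        return []
    by_id = {link.id: link for link in etymology.links}
    starts = [link for link in etymology.links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one link without prev")
    ordered, seen = [], set()
    current = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle in prev/next chain")
        seen.add(current.id)
        ordered.append(current)
        current = by_id[current.next] if current.next else None
    return ordered


# Three steps stored out of order; prev/next restore the sequence.
etymology = Etymology(type="inheritance", links=[
    EtyLink("link3", "element3", "element4", prev="link2"),
    EtyLink("link1", "element1", "element2", next="link2"),
    EtyLink("link2", "element2", "element3", prev="link1", next="link3"),
])
sequence = [link.id for link in ordered_links(etymology)]
```

Keeping the order in the links themselves, rather than relying on storage order, means that the sequence survives any reordering of the serialized data.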
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass relation, to that of any given Etymon or Cognate) which have been gathered together for the purpose of demonstrating salient linguistic similarities or dissimilarities. The use of CognateSet implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an etymological source.
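
The cardinalities stated in 4.4 (one or more Cognate items, zero or more Bibliography items) can be checked mechanically. The sketch below is an assumption-laden illustration: the field names cognate_ids and bibliography_ids and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CognateSet:
    id: str
    cognate_ids: List[str]                                      # one or more
    bibliography_ids: List[str] = field(default_factory=list)   # zero or more

    def __post_init__(self) -> None:
        # Enforce the lower bound of one Cognate item.
        if not self.cognate_ids:
            raise ValueError("a CognateSet needs at least one Cognate")


cognate_set = CognateSet("set1", ["cognate1", "cognate2"])
```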
Figure 5 — CognateSet
4.5 The Date class

The components of a LexicalEntry and its subclasses shall be associated with a specific date by making use of the Date class. Date also allows several degrees of precision to be specified. A precise year, and potentially month and day, shall be stated using the date attribute, and an approximate date using the attribute circa. One or more dating attributes can be combined to describe a span of time at different levels of specificity. Where a span of time is known (or asserted), the lower and upper ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats, ISO 8601-1 and ISO 8601-2 shall be used.
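
The four dating attributes can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only the calendar representations YYYY, YYYY-MM and YYYY-MM-DD are handled, which is a small part of what ISO 8601-1 and ISO 8601-2 allow, and the years used are arbitrary placeholders, not attested dates. The names parse_iso_date and span_is_consistent are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Year, optional month, optional day (a small subset of ISO 8601).
_ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_iso_date(text: str, latest: bool = False) -> Tuple[int, int, int]:
    """Turn a date string into a comparable (year, month, day) bound.

    Missing parts are filled with the earliest value, or with the latest
    value when latest=True. The result is a bound for comparison only
    and is not guaranteed to be a real calendar date.
    """
    match = _ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported date format: {text!r}")
    year, month, day = match.groups()
    month_num = int(month) if month else (12 if latest else 1)
    day_num = int(day) if day else (31 if latest else 1)
    if not (1 <= month_num <= 12 and 1 <= day_num <= 31):
        raise ValueError(f"month or day out of range: {text!r}")
    return (int(year), month_num, day_num)


@dataclass
class Date:
    date: Optional[str] = None        # precise date
    circa: Optional[str] = None       # approximate date
    not_before: Optional[str] = None  # mirrors notBefore
    not_after: Optional[str] = None   # mirrors notAfter

    def span_is_consistent(self) -> bool:
        """True unless the lower end of the span lies after the upper end."""
        if self.not_before is None or self.not_after is None:
            return True
        lower = parse_iso_date(self.not_before)
        upper = parse_iso_date(self.not_after, latest=True)
        return lower <= upper


span = Date(not_before="1100", not_after="1250")  # placeholder years
```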
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase that is intended for human consumption. Individuals of the class can represent either paraphrases or synonyms, and these may be in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment

Class name    Example of attributes
------------  ----------------------------------
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet    (none listed)
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the adverb semper in Sardinian, which comes from the Latin word semper. The Sardinian word is linked to a single Etymology instance which is associated with the type inheritance (see Annex B for a definition of this type). The Etymology is then associated with an individual of type EtyLink which represents the process of change from the Latin etymon to the Sardinian lexical entry.
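
The structure just described can be written out as plain data. The sketch below is one possible rendering and not the diagram of Figure A.1 itself: the dictionary keys, the identifiers (entry1, etymon1, link1) and the helper describe are assumptions; "sc" and "la" are the language codes for Sardinian and Latin.

```python
entry = {"id": "entry1", "class": "LexicalEntry", "lemma": "semper", "lang": "sc"}
etymon = {"id": "etymon1", "class": "Etymon", "lemma": "semper", "lang": "la"}

# The link runs from the Latin etymon (source) to the Sardinian entry (target).
link = {"id": "link1", "class": "EtyLink", "source": "etymon1", "target": "entry1"}

etymology = {
    "class": "Etymology",
    "type": "inheritance",      # camel case code from Annex B
    "attached_to": "entry1",    # hypothetical back-reference to the entry
    "links": [link],
}


def describe(etymology, elements):
    """Render each EtyLink as 'source (lang) > target (lang)'."""
    by_id = {element["id"]: element for element in elements}
    steps = []
    for step in etymology["links"]:
        source = by_id[step["source"]]
        target = by_id[step["target"]]
        steps.append(
            f'{source["lemma"]} ({source["lang"]}) > '
            f'{target["lemma"]} ({target["lang"]})'
        )
    return f'{etymology["type"]}: ' + "; ".join(steps)


summary = describe(etymology, [entry, etymon])
```

Here summary evaluates to "inheritance: semper (la) > semper (sc)".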

Figure A.1 — Diagram of inheritance in Sardinian
A.2 Example of a diachronic etymological process (inheritance with phonological change)

In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano variety of Italian is traced back using a series of Etymons which are ordered and linked together using EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in its current manifestation. These individual links can be accessed through an Etymology individual (with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks is implemented by means of the two attributes prev and next. These attributes are not displayed in Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese

A.3 Example of word form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação ‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc), respectively. Since both the singular and the plural Portuguese forms are concerned, the Etymon has two WordForm instances, one for each grammatical number. Note in particular the association of grammaticalCase and inflectionType attributes with the WordForm of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that contained such etymological information for a sufficient number of lexical entries, it would be possible, by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is -tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the phonological process types for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via metaphor. In this case, the sense ‘kidney’ of the word ntuchi was derived by metaphorical extension from its sense ‘bean’. The Etymology is attached to the kidney sense of the word ntuchi. The EtyLink instance contains the attributes source and target, whose values are the ids of the respective components and which specify the directionality of the process. Since a defining feature of metaphor is a change in semantic domain between the source and target senses, each sense has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the DBpedia entry for each sense.
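
The domain requirement for metaphor lends itself to a simple consistency check. The following sketch is illustrative only: the sense identifiers and the domain labels "plants" and "anatomy" are hypothetical placeholders and are not taken from Figure A.4, and the helper crosses_domains is an assumption.

```python
# Two senses of ntuchi; ids and domain labels are illustrative assumptions.
senses = {
    "sense_bean": {"gloss": "bean", "domain": "plants"},
    "sense_kidney": {"gloss": "kidney", "domain": "anatomy"},
}

# source and target carry the ids of the senses and give the direction.
link = {"type": "metaphor", "source": "sense_bean", "target": "sense_kidney"}


def crosses_domains(link, senses) -> bool:
    """True if the source and target senses lie in different semantic domains."""
    source_domain = senses[link["source"]]["domain"]
    target_domain = senses[link["target"]]["domain"]
    return source_domain != target_domain


is_plausible_metaphor = crosses_domains(link, senses)
```

A link typed as metaphor whose two senses share a domain would fail this check and could be flagged for review.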
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
A.5 Example of borrowing and compounding

Figure A.5 gives an example of a complex etymological borrowing, the borrowing of pamplemousse ‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word formation process that occurred within Dutch, in which the compound pompelmousse was formed from the etymons pompel and limoes, with the compound subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At the level of Etymology, the fact that chef was inherited from Latin is encoded in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of Etymon items carrying the temporal attributes notBefore and notAfter, which specify the span of time during which a given form of the item is asserted to have been in use, and hence the sequential ordering of the forms. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography

Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source (see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed and thus it is not represented as Cognate in the encoding.

A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English sense is given in the Definition for the main entry, was used early on to denote a young person regardless of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate has an associated Sense and a Definition instance which together describe related lexical entries in Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example

In order to demonstrate some of the expressivity of the current model and its potential for dealing with realistic examples of etymological data, the etymological information of part of a single entry, namely the entry for the Latin word amārus ‘bitter’ from Reference [6], is modelled in Figures A.9 and A.10. The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus

The modelling of this entry is presented in two parts. The first is a diagram (Figure A.9) which details the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and any additional information is added as text using the note attribute (in this case, the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also be found in later sources). In this instance, the bibliographic element refers to a single author (and therefore to their corpus of works) and the date attribute refers to the whole period encompassing their lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for ease of reading, the elements of each of the two cognate sets are coloured using two different colours, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements and the entire etymology is attributed to the source by associating the Bibliography with the Etymology element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification

Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019, 5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for model simplification), for a normative description of the simplification process.

Annex B
(normative)
Data categories for etymology description

This annex contains a structured list of categories that shall be used as possible values for describing etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for each category, its name, a camel case representation, a conceptual domain (for complex data categories), a definition (with its source(s)), one or several examples (with their source(s) when applicable), as well as some usage notes where needed.

Where applicable, the camel case form should be used when a feature serves as a value of the attribute type declaring the process of an Etymology or EtyLink.

NOTE These features are not exhaustive and do not account for all the possible theoretical super- and sub-typologies and variant definitions or descriptions.
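
The use of camel case codes as values of type can be sketched as a small lookup. This is an illustration under stated assumptions: the set below contains only codes of entries in this annex (borrowing, calque, inheritance) and is far from complete, and the helper to_camel_case shows the usual camel case convention applied to a category name rather than a rule stated by this document.

```python
import re

# Codes of three entries in this annex; the full annex defines more.
KNOWN_CODES = {"borrowing", "calque", "inheritance"}


def to_camel_case(name: str) -> str:
    """Convert a category name such as 'Place assimilation' to camel case."""
    words = [word for word in re.split(r"[^0-9A-Za-z]+", name) if word]
    if not words:
        return ""
    head = words[0].lower()
    tail = "".join(word[:1].upper() + word[1:].lower() for word in words[1:])
    return head + tail


def is_known_code(value: str) -> bool:
    """Check a type value against the (partial) list of known codes."""
    return value in KNOWN_CODES


code = to_camel_case("Borrowing")
```

Since the features are not exhaustive, a tool built along these lines would more plausibly warn about an unknown value than reject it.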
Borrowing
Camel case code: borrowing
Definition: The process in which a lexical item, phrase or other linguistic feature from a foreign language or dialect is introduced into a given language or dialect.
Source: Reference [4].
Note: The lexical items which result from borrowing are loanwords.
Example: Russian картошка is a loanword from High German Kartoffel.
Alternate term: loaning, importing, transferring, copying

Calque
Camel case code: calque
Definition: A loanword in which only the meaning is borrowed with the term being literally translated into the borrowing language.
Source: Reference [5].
Example: Spanish rascacielos (rasca 'scratch', 'scrape' + cielos 'sky') from English skyscraper
Alternate term: loan translation, semantic loan

Inheritance
Camel case code: inheritance

Definition: Inheritance, although often not regarded as an etymological process in itself, is used

to identify sit
...

SLOVENIAN STANDARD
oSIST ISO/FDIS 24613-3:2021
1 March 2021

Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3: Extension étymologique

This Slovenian standard is identical to: ISO/FDIS 24613-3

ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing

oSIST ISO/FDIS 24613-3:2021 en,fr

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18

Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.

IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

Reference number ISO/FDIS 24613-3:2020(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2020 – All rights reserved
---------------------- Page: 4 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)
Contents Page

Foreword ........................................................................................................................................................................................................................................iv

1 Scope ................................................................................................................................................................................................................................. 1

2 Normative references ...................................................................................................................................................................................... 1

3 Terms and definitions ..................................................................................................................................................................................... 1

4 The LMF etymology extension ................................................................................................................................................................ 2

4.1 The Cognate class and the Etymon class .......................................................................................................................... 2

4.2 The Etymologizable class ............................................................................................................................................................... 2

4.3 The Etymology class and the EtyLink class .................................................................................................................... 3

4.4 The CognateSet class .......................................................................................................................................................................... 4

4.5 The Date class .......................................................................................................................................................................................... 4

4.6 The Gloss class......................................................................................................................................................................................... 4

Annex A (informative) Examples of possible etymological typologies ............................................................................. 6

Annex B (normative) Data categories for etymology description ......................................................................................15

Bibliography .............................................................................................................................................................................................................................22

© ISO 2020 – All rights reserved iii
---------------------- Page: 5 ----------------------
oSIST ISO/FDIS 24613-3:2021
ISO/FDIS 24613-3:2020(E)
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5, cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 3:
Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate

form in a related language which shares a common etymological origin with a form in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.

3.5
onomasiology

approach to the investigation of word meaning which takes a given concept as a starting point and studies the different lexical items in a language or languages that are used to refer to it

4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see Figure 1 and footnote 1). Both classes define lexical entries which have been added to a lexical resource in order to describe the etymologies of one or more other lexical entries. Instances of either Etymon or Cognate can be assigned a language which is different from the language of the lexicon as a whole (the latter is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry

Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at least one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, instances of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact of being a root and the type of the root in question shall be specified using the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status serves to associate the element with a written description of the likelihood of its having been in use (see the example in A.8). See Table 1 for a list of attributes to be used with these two classes.
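The subclass relation and the EtyLink constraint described in 4.1 can be sketched in Python as follows. This is only an illustrative sketch and not a serialization defined by this document: the Python field names (root_type, ety_links), the ids and the helper satisfies_link_constraint are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """A single step in an etymology (see 4.3)."""
    source: str  # id of the element the step starts from
    target: str  # id of the element the step leads to


@dataclass
class LexicalEntry:
    """Simplified stand-in for the LMF core LexicalEntry class."""
    id: str
    lemma: str
    lang: Optional[str] = None       # may differ from the lexicon's language
    root_type: Optional[str] = None  # rootType: set only when the entry is a root
    status: Optional[str] = None     # written note on the likelihood of use


@dataclass
class Etymon(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)


@dataclass
class Cognate(LexicalEntry):
    ety_links: List[EtyLink] = field(default_factory=list)


def satisfies_link_constraint(entry: LexicalEntry) -> bool:
    """Etymon and Cognate individuals need at least one EtyLink."""
    if isinstance(entry, (Etymon, Cognate)):
        return len(entry.ety_links) >= 1
    return True  # ordinary lexical entries are not constrained here


# Hypothetical ids; the Latin etymon carries its own language tag.
latin_semper = Etymon(id="etymon-semper", lemma="semper", lang="la")
latin_semper.ety_links.append(
    EtyLink(source="etymon-semper", target="entry-semper")
)
```

Because Etymon and Cognate inherit from LexicalEntry, any code that accepts a lexical entry also accepts them, which mirrors the subclass relation shown in Figure 1.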
4.2 The Etymologizable class

The Etymologizable class provides a means of referring to the set of linguistic elements that can have etymologies. Because a single class encompasses all such ‘etymologizable’ elements, the set of classes which can have etymologies can easily be extended in the future whenever the need arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).
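A minimal sketch of this design, assuming a plain Python class hierarchy, is given here. The helper attach_etymology, the etymologies list and the ids are hypothetical names introduced only for illustration.

```python
from typing import List


class Etymologizable:
    """Common parent of every element that can carry an etymology."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.etymologies: List[str] = []  # ids of attached Etymology instances


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


def attach_etymology(element: object, etymology_id: str) -> None:
    """Attach an etymology to any etymologizable element, whatever its class."""
    if not isinstance(element, Etymologizable):
        raise TypeError(f"{type(element).__name__} cannot have an etymology")
    element.etymologies.append(etymology_id)


# Hypothetical ids: an etymology attached to a sense rather than to an entry.
kidney_sense = Sense(id="sense-kidney")
attach_etymology(kidney_sense, "etymology-1")
```

Supporting a further kind of element would then only require one more subclass of Etymologizable; attach_etymology itself would not change.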

1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class allows for the description of the etymology of a linguistic element. More specifically, it allows for the description of those linguistic elements that are instances of the subclasses of the Etymologizable class. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and potentially also the subtype attribute (where the type of the etymology can be further specified). Possible values for type and subtype can vary according to the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of nested Etymology instances allows a combination of etymological processes to be described.

Examples of etymological processes that shall be used as values for type/subtype: borrowing, inheritance; word formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis, hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement to the appropriate classes.

Individual links between two elements in an etymology can also be given a type (see the description of EtyLink below). Given that an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class

Instances of Etymology are associated with one or more EtyLink instances, each of which represents a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements, using the attributes prev and next. Instances of the EtyLink class can further specify additional temporal relationships using various temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
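The sequential ordering of EtyLink instances can be sketched as a walk along the prev and next attributes. The sketch assumes, for illustration only, that each EtyLink has its own id and that prev and next hold the ids of the neighbouring EtyLink instances; the function ordered_links and all ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the source element
    target: str                 # id of the target element
    type: Optional[str] = None
    prev: Optional[str] = None  # assumed: id of the preceding EtyLink
    next: Optional[str] = None  # assumed: id of the following EtyLink


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)


def ordered_links(etymology: Etymology) -> List[EtyLink]:
    """Return the links in sequence by following next from the first link."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in etymology.links}
    starts = [link for link in etymology.links if link.prev is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one link without a prev value")
    ordered: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = starts[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("prev/next attributes form a cycle")
        seen.add(current.id)
        ordered.append(current)
        current = by_id.get(current.next) if current.next else None
    return ordered


# Three hypothetical stages, deliberately stored out of order.
history = Etymology(type="inheritance", links=[
    EtyLink(id="link-3", source="stage-3", target="entry", prev="link-2"),
    EtyLink(id="link-1", source="stage-1", target="stage-2", next="link-2"),
    EtyLink(id="link-2", source="stage-2", target="stage-3",
            prev="link-1", next="link-3"),
])
```

Storing the order in the links themselves, rather than relying on list position, means the sequence survives any reordering of the serialized data.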
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass relation, of any given Etymon or Cognate) which have been gathered together in order to demonstrate salient linguistic similarities or dissimilarities. The use of CognateSet implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an etymological source.
Figure 5 — CognateSet
4.5 The Date class

The components of a LexicalEntry and its subclasses shall be associated with a specific date by making use of the Date class. Date allows dates to be specified with several degrees of precision. A precise year, and potentially month and day, shall be stated using the date attribute, and an approximate date using the circa attribute. One or more dating attributes can be used together to describe a span of time at different levels of specificity. Where a span of time is known (or asserted), the lower and upper ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats, ISO 8601-1 and ISO 8601-2 shall be used.
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase that is intended for human consumption. Individuals of the class can represent either paraphrases or synonyms, and these may be in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment

Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is linked to a single Etymology instance which is associated with the type inheritance (see Annex B for a definition of this type). The Etymology is then associated with an individual of type EtyLink which represents the process of change from the Latin etymon to the Sardinian lexical entry.

Figure A.1 — Diagram of inheritance in Sardinian
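The structure of this example can be approximated in Python as a set of linked records. This is a hedged sketch rather than the encoding shown in the figure: the record classes, the ids, the language tags "la" and "sc", and the helper describe are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    id: str
    cls: str    # name of the LMF class the record stands for
    lang: str
    lemma: str


@dataclass
class EtyLink:
    source: str  # id of the etymon
    target: str  # id of the lexical entry


@dataclass
class Etymology:
    type: str
    attached_to: str  # id of the element whose etymology this is
    links: List[EtyLink]


entries: Dict[str, Entry] = {
    "entry-semper": Entry("entry-semper", "LexicalEntry", "sc", "semper"),
    "etymon-semper": Entry("etymon-semper", "Etymon", "la", "semper"),
}

etymology = Etymology(
    type="inheritance",
    attached_to="entry-semper",
    links=[EtyLink(source="etymon-semper", target="entry-semper")],
)


def describe(ety: Etymology, table: Dict[str, Entry]) -> str:
    """Render each link as 'source language lemma > target language lemma'."""
    steps = []
    for link in ety.links:
        src, tgt = table[link.source], table[link.target]
        steps.append(f"{src.lang} {src.lemma} > {tgt.lang} {tgt.lemma}")
    return f"{ety.type}: " + "; ".join(steps)


summary = describe(etymology, entries)
```

Here summary evaluates to "inheritance: la semper > sc semper", which reads the single EtyLink from the Latin etymon to the Sardinian entry.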
A.2 Example of a diachronic etymological process (inheritance with
phonological change)

In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano variety of Italian is traced back using a series of Etymons which are ordered and linked together using EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in its current manifestation. These individual links can be accessed through an Etymology individual (with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks is implemented by means of the two attributes prev and next. These attributes are not displayed in Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese

A.3 Example of word form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação ‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc), respectively. Since both the singular and the plural Portuguese forms are concerned, the Etymon has two WordForm instances, one for each grammatical number. Note in particular the association of grammaticalCase and inflectionType attributes with the WordForm of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that contained such etymological information for a sufficient number of lexical entries, it would be possible, by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is -tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the phonological process types for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension of the word meaning ‘bean’, ntuchi. The Etymology is attached to the kidney sense of the word ntuchi. The EtyLink instance contains the attributes source and target, whose values are the ids of the respective components and which specify the directionality of the process. Since a defining feature of metaphor is a change in semantic domain between the source and target senses, each sense has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
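The domain condition on metaphor can be expressed as a simple check over the source and target senses. The sketch below is illustrative only: the domain labels "plants" and "anatomy", the ids and the function is_plausible_metaphor are assumptions and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense


@dataclass
class EtyLink:
    type: str
    source: str  # id of the sense the shift starts from
    target: str  # id of the sense the shift leads to


def is_plausible_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """A metaphor link needs two known senses in different domains."""
    if link.type != "metaphor":
        return False
    source = senses.get(link.source)
    target = senses.get(link.target)
    if source is None or target is None:
        return False
    return source.subject_field != target.subject_field


# Domain labels are assumed for the sake of the example.
senses = {
    "sense-bean": Sense("sense-bean", "bean", "plants"),
    "sense-kidney": Sense("sense-kidney", "kidney", "anatomy"),
}
shift = EtyLink(type="metaphor", source="sense-bean", target="sense-kidney")
```

A check of this kind could serve as a consistency test when a lexicon is compiled, flagging links typed as metaphor whose two senses share a domain.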
A.5 Example of borrowing and compounding

Figure A.5 gives an example of a complex etymological borrowing, the borrowing of pamplemousse ‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word formation process that occurred within Dutch, in which the compound pompelmousse was formed from the etymons pompel and limoes, with the compound subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). At the level of Etymology, the fact that chef was inherited from Latin is encoded in the attribute type, which is given the value “inheritance”. Within the Etymology, there is a series of Etymon items containing the temporal attributes notBefore and notAfter, which specify the spans of time during which a given form of the item is asserted to have been in use. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography

Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source (see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed and thus it is not represented as Cognate in the encoding.

A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English sense is given in the Definition for the main entry, was used early on to denote a young person regardless of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate has an associated Sense and a Definition instance which together describe related lexical entries in Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example

In order to demonstrate some of the expressivity of the current model and its potential for dealing with realistic examples of etymological data, the etymological information of part of a single entry, namely the entry for the Latin word amārus ‘bitter’ in Figure A.8 (see Reference [6]), is modelled in Figures A.9 and A.10. The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that is, the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus

The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of the entry. The CrossREF individuals in question are typed as attestations using the attribute type, with any additional information added as text using the note attribute (in this case, the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also be found in later sources). In this instance, the bibliographic element refers to a single author (and therefore to their corpus of works) and the date attribute refers to the whole period encompassing their lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for ease of reading, the elements of each of the two cognate sets are shown in a different colour, cyan or purple). The Cognates are each shown as belonging to one of two CognateSet elements, and the entire etymology is attributed to the source by associating the Bibliography with the Etymology element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification

Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019, 5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for model simplification), for a normative description of the simplification process.

Annex B
(normative)
Data categories for etymology description

This annex contains a structured list of categories that shall be used as possible values for describing

etymological processes, including sound changes as well as
...

SLOVENIAN STANDARD
oSIST ISO/FDIS 24613-3:2021
1 March 2021
Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 3. del: Etimološka razširitev
Language resource management -- Lexical markup framework (LMF) - Part 3: Etymological extension
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) - Partie 3: Extension étymologique
This Slovenian standard is identical to: ISO/FDIS 24613-3
ICS:
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/FDIS 24613-3:2021 en,fr

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number ISO/FDIS 24613-3:2020(E)
© ISO 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www .iso .org/ patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the

World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso .org/

iso/ foreword .html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology,

Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5,

cancels and replaces ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www .iso .org/ members .html.
oSIST ISO/FDIS 24613-3:2021
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(E)
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate

form in a related language which shares a common etymological origin with a form in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: "Etymologizable" is a category of lexical elements and usages (encompassing for instance lexical entries, senses, word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item
3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be simply an earlier stage of a lexical item.

3.5
onomasiology

approach to the investigation of word meaning which takes a given concept as a starting point and studies the different lexical items in a language or languages that are used to refer to it

4 The LMF etymology extension
NOTE See Annex A for examples of possible etymological typologies.
4.1 The Cognate class and the Etymon class

Cognate and Etymon are defined as subclasses of the LexicalEntry class from the LMF core module (see Figure 1; the colour scheme used in the diagrams is explained in footnote 1). Both classes define lexical entries which have been added to a lexical resource with the purpose of describing the etymologies of one or more other lexical entries. Instances of either Etymon or Cognate can be assigned a language which is different from the language of the lexicon as a whole (this is specified in the LexiconInformation class as described in ISO 24613-1).
Figure 1 — Cognate and Etymon as subclasses of LexicalEntry

Individuals of both the Etymon and the Cognate classes shall be in an aggregation relationship with at least one individual of type EtyLink (see 4.3). When describing etymologies, there are cases in which it is necessary to deal with instances of LexicalEntry (and hence, by the subclass relation, also instances of Etymon and Cognate) which are roots, and in particular reconstructed roots. In these cases, the fact of being a root and the type of the root in question shall be specified using the attribute rootType. In the case of reconstructed roots or other word forms, the attribute status serves to associate the element with a written description of the likelihood of its having been in use (see the example in A.8). See Table 1 for a list of attributes to be used with these two classes.
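
As an illustration only, the relationships described in 4.1 could be sketched in Python as follows. The class names follow the text, but the snake_case field names, the helper class _EtymologicalEntry, the placeholder lemma and the values given to rootType and status are assumptions made for this sketch and are not defined by this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """Minimal stand-in for the EtyLink class described in 4.3."""
    source: str  # id of the source element
    target: str  # id of the target element


@dataclass
class LexicalEntry:
    """Stand-in for the LexicalEntry class of the LMF core model."""
    id: str
    lemma: str
    lang: Optional[str] = None       # may differ from the lexicon language
    root_type: Optional[str] = None  # models the rootType attribute
    status: Optional[str] = None     # likelihood of the form having been in use


@dataclass
class _EtymologicalEntry(LexicalEntry):
    """Hypothetical helper holding the constraint shared by both subclasses."""
    ety_links: List[EtyLink] = field(default_factory=list)

    def __post_init__(self) -> None:
        # 4.1: at least one EtyLink shall be aggregated.
        if not self.ety_links:
            raise ValueError(f"{type(self).__name__} needs at least one EtyLink")


class Etymon(_EtymologicalEntry):
    """Lexical entry from which another lexical entry is derived."""


class Cognate(_EtymologicalEntry):
    """Form in a related language sharing an etymological origin."""


# Placeholder data: a reconstructed root whose past use is uncertain.
root = Etymon(
    id="etymon-1",
    lemma="*placeholder",
    lang="la",
    root_type="reconstructed",
    status="uncertain",
    ety_links=[EtyLink(source="etymon-1", target="entry-1")],
)
```

Enforcing the "at least one EtyLink" rule at construction time is one possible design choice; a serialization-oriented implementation might instead check it in a separate validation pass.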
4.2 The Etymologizable class

The Etymologizable class provides a means of referring to the set of linguistic elements that can have etymologies. By defining a single class encompassing all such ‘etymologizable’ elements, the classes of elements which can have etymologies can be easily extended in the future wherever the necessity arises. The following classes are subtypes of the Etymologizable class (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).
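
The extensibility argument above can be illustrated with a small Python sketch. It assumes a plain class hierarchy; the function attach_etymology, the dictionary used to stand in for an Etymology instance and the subclass MultiwordExpression are hypothetical and serve only to show how a future subtype could be added.

```python
class Etymologizable:
    """Base class for every element that can have an etymology."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.etymologies: list = []  # Etymology instances attached later


class LexicalEntry(Etymologizable):
    pass


class Sense(Etymologizable):
    pass


class Form(Etymologizable):
    pass


class CognateSet(Etymologizable):
    pass


def attach_etymology(element: object, etymology: object) -> None:
    """Attach an etymology, rejecting elements outside the class hierarchy."""
    if not isinstance(element, Etymologizable):
        raise TypeError(f"{type(element).__name__} is not Etymologizable")
    element.etymologies.append(etymology)


# A later extension only needs one new subclass (hypothetical example).
class MultiwordExpression(Etymologizable):
    pass


sense = Sense("sense-1")
attach_etymology(sense, {"type": "metaphor"})
attach_etymology(MultiwordExpression("mwe-1"), {"type": "borrowing"})
```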

1) In this document, the following colour scheme is used in diagrams: classes in yellow are introduced in this document, and classes in pink have been previously introduced in ISO 24613-1 and ISO 24613-2.

Figure 2 — The Etymologizable class and its subclasses
4.3 The Etymology class and the EtyLink class

The Etymology class allows for the description of the etymology of a linguistic element. More specifically, it allows for the description of those linguistic elements that are subclasses of the Etymologizable class. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and also potentially the subtype attribute (when the type of the etymology can be further specified). Possible values for type and subtype can vary according to the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial focus of the resource. The use of nested Etymology instances allows a combination of etymological processes to be described.

Examples of etymological processes that shall be used as values for type/subtype include: borrowing, inheritance; word formation: compounding, derivation; sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy; phonetic/phonological processes: place assimilation, dissimilation, epenthesis, metathesis, hardening, weakening, etc. The list of data categories provided in Annex B shall be used in complement to the appropriate classes. Individual links between two elements in an etymology can also be given a type; see the description of EtyLink below. Given that an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined as per ISO 24613-2 (see Figure 3).
Figure 3 — The Etymology class

Instances of Etymology are associated with one or more EtyLink instances, each of which represents a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink serves to link together individuals belonging to the subclasses of Etymologizable. EtyLink is a subclass of the CrossREF class as defined in ISO 24613-1. The use of CrossREF requires that the target objects representing the given lexical content be given id attributes. The use of the id attribute on an individual of the Etymologizable class as a target allows for the modelling of a generic sequential temporal ordering of multiple elements, using the attributes prev and next. Instances of the EtyLink class can further specify additional temporal relationships using various temporal attributes associated with the source and target of each EtyLink instance.
Figure 4 — EtyLink
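
The sequential ordering described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: prev and next are taken to hold the ids of neighbouring EtyLink instances, the method ordered_links and the field nested are hypothetical names, and all ids are placeholders rather than data from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier Etymologizable element
    target: str                 # id of the later Etymologizable element
    type: Optional[str] = None  # process type for this single step
    prev: Optional[str] = None  # assumed: id of the preceding EtyLink
    next: Optional[str] = None  # assumed: id of the following EtyLink


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)

    def ordered_links(self) -> List[EtyLink]:
        """Return the links in temporal order by following prev/next."""
        by_id = {link.id: link for link in self.links}
        current = None
        for link in self.links:
            if link.prev is None:  # the first step has no predecessor
                current = link
                break
        ordered: List[EtyLink] = []
        seen = set()
        while current is not None and current.id not in seen:
            ordered.append(current)
            seen.add(current.id)
            current = by_id.get(current.next) if current.next else None
        return ordered


etymology = Etymology(
    type="inheritance",
    links=[
        EtyLink(id="l2", source="e2", target="e3", prev="l1", next="l3"),
        EtyLink(id="l3", source="e3", target="entry", prev="l2"),
        EtyLink(id="l1", source="e1", target="e2", next="l2"),
    ],
)
print([link.id for link in etymology.ordered_links()])  # ['l1', 'l2', 'l3']
```

The links are deliberately stored out of order to show that the ordering is recovered from the attributes alone, not from the storage order.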

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one individual of EtyLink. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
4.4 The CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate items and zero or more Bibliography items (see ISO 24613-2). The CognateSet is a construct related to onomasiology. Its contents are items from languages related to that of a given LexicalEntry (and therefore, by the subclass relation, to that of any given Etymon or Cognate) and which have been gathered together with the purpose of demonstrating linguistic similarities or dissimilarities of salient kinds. The use of CognateSet implies that the LexicalEntry (and therefore Etymon and Cognate) instances which it contains share an etymological source.
Figure 5 — CognateSet
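
The cardinalities stated in 4.4 (one or more Cognate items, zero or more Bibliography items) could be checked as in the following Python sketch. The field names, the languages helper and all data values are placeholders assumed for illustration; they are not taken from this document.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cognate:
    id: str
    lemma: str
    lang: str


@dataclass
class Bibliography:
    citation: str


@dataclass
class CognateSet:
    cognates: List[Cognate]
    bibliographies: List[Bibliography] = field(default_factory=list)

    def __post_init__(self) -> None:
        # 4.4: one or more Cognate items, zero or more Bibliography items.
        if not self.cognates:
            raise ValueError("a CognateSet needs at least one Cognate")

    def languages(self) -> Set[str]:
        """Languages represented in the set (hypothetical convenience)."""
        return {cognate.lang for cognate in self.cognates}


cognate_set = CognateSet(
    cognates=[
        Cognate(id="cog-1", lemma="placeholder-1", lang="xx"),
        Cognate(id="cog-2", lemma="placeholder-2", lang="yy"),
    ],
    bibliographies=[Bibliography(citation="placeholder citation")],
)
```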
4.5 The Date class

The components of a LexicalEntry and its subclasses shall be associated with a specific date by making use of the Date class. Furthermore, Date allows the specification of a number of degrees of precision. A precise year, and potentially month and day, shall be stated using the date attribute, and a rough date using the attribute circa. Within a span of time with different levels of specificity, one or more dating attributes can be used. Where a span of time is known (or asserted), the lower and upper ends of the span can be specified using notBefore and notAfter, respectively. For date and time formats, ISO 8601-1 and ISO 8601-2 shall be used.
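
A minimal Python sketch of the four dating attributes is given below. It stores values as strings and only compares years, which is a deliberate simplification: full ISO 8601-1 and ISO 8601-2 parsing is not attempted. The helper year_of, the snake_case field names and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


def year_of(value: str) -> int:
    """Extract the year from a string such as '0400', '1066-10-14' or '-0050'."""
    sign = -1 if value.startswith("-") else 1
    digits = value.lstrip("+-").split("-")[0]
    return sign * int(digits)


@dataclass
class Date:
    date: Optional[str] = None        # precise year, optionally month and day
    circa: Optional[str] = None       # rough date
    not_before: Optional[str] = None  # models notBefore (lower end of a span)
    not_after: Optional[str] = None   # models notAfter (upper end of a span)

    def __post_init__(self) -> None:
        # A span is only meaningful if its lower end does not follow its upper end.
        if self.not_before is not None and self.not_after is not None:
            if year_of(self.not_before) > year_of(self.not_after):
                raise ValueError("notBefore must not be later than notAfter")


span = Date(not_before="0400", not_after="0600")  # hypothetical span
rough = Date(circa="1100")                        # hypothetical rough date
```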
4.6 The Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or a phrase that is intended for human consumption. Individuals of the class can represent either paraphrases or synonyms, and these may be in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.
Figure 6 — Gloss
Table 1 — Example of class adornment
Class name    Example of attributes
Etymon        xml:lang, gloss, rootType, status
Etymology     type, subtype
EtyLink       type, prev, next
CognateSet    (none listed)
Cognate       xml:lang, gloss, rootType, status
Date          notBefore, notAfter, circa, date
Gloss         xml:lang
Annex A
(informative)
Examples of possible etymological typologies
A.1 Example of simple inheritance

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the adverb semper in Sardinian which comes from the Latin word semper. The Sardinian word is linked to a single Etymology instance which is associated with the type inheritance (see Annex B for a definition of this type). The Etymology is then associated with an individual of type EtyLink which represents the process of change from the Latin etymon to the Sardinian lexical entry.

Figure A.1 — Diagram of inheritance in Sardinian
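
For illustration, the object graph described in A.1 could be instantiated in Python as sketched below. The ids, the language codes "la" and "sc", and the field names are assumptions made for this sketch; only the lemmas, the part of speech and the type inheritance come from the example itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EtyLink:
    source: str  # id of the etymon
    target: str  # id of the inheriting lexical entry


@dataclass
class Etymology:
    type: str
    links: List[EtyLink] = field(default_factory=list)


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    lang: str
    part_of_speech: str = ""
    etymologies: List[Etymology] = field(default_factory=list)


@dataclass
class Etymon(LexicalEntry):
    pass


# Latin etymon and the Sardinian entry that inherits from it.
latin = Etymon(id="la-semper", lemma="semper", lang="la")
sardinian = LexicalEntry(
    id="sc-semper",
    lemma="semper",
    lang="sc",
    part_of_speech="adverb",
    etymologies=[
        Etymology(
            type="inheritance",
            links=[EtyLink(source=latin.id, target="sc-semper")],
        )
    ],
)
```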
A.2 Example of a diachronic etymological process (inheritance with phonological change)

In the following example, the development of the word meaning ‘wine’ (IPA: [veŋ]) in the Emiliano variety of Italian is traced back using a series of Etymons which are ordered and linked together using EtyLink instances, from the Vulgar Latin vinu through to the immediate predecessor of the word in its current manifestation. These individual links can be accessed through an Etymology individual (with the type inheritance) which represents the history of the LexicalEntry. The ordering of EtyLinks is implemented by means of the two attributes prev and next. These attributes are not displayed in Figure A.2 for reasons of space, but their use is shown in Figure A.6, for the example given in A.6, and in Figure A.10, for the example given in A.8.

Figure A.2 — Diagram of multi-stage inheritance and phonological change in Bolognese

A.3 Example of word form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação ‘nation’, nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc), respectively. Herein, since the Portuguese forms concerned are both the plural and singular, the Etymon has two WordForm instances, one for each grammatical number. Note in particular the association of grammaticalCase and inflectionType attributes with the WordForm of the Etymon via a GrammaticalInformation instance. In a comprehensive lexicon of Portuguese that contained such etymological information for a sufficient number of lexical entries, it would be possible, by contrasting the contents of the WordForm in the LexicalEntry with the Etymon, to appreciate the following language-wide phenomena: 1) Portuguese lost grammatical case; 2) the vast majority of its nouns come from the VL accusative case; 3) where the VL singular (accusative) ending is -tiōnem, the Portuguese form is written “-ção” and pronounced [sɐ̃w]; where the VL plural (accusative) ending is -tiōnes, the Portuguese form is written “-ções” and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the phonological process types for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the value of type on the EtyLink for the given stages.
Figure A.3 — Diagram of inheritance of Portuguese inflections
A.4 Example of metaphorical semantic shift

The example in Figure A.4 from Mixtepec-Mixtec shows the derivation of one sense from another via metaphor. In this case, the word translating to ‘kidney’ was derived from a metaphorical extension of the word meaning ‘bean’, ntuchi. The Etymology is attached to the kidney sense of the word ntuchi. The EtyLink instance contains the attributes source and target with the id values of the respective components specifying the directionality of the process. Since a defining feature of metaphor is a change in semantic domain between the source and target senses, each sense has a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the DBpedia entry for each sense.
Figure A.4 — Diagram of metaphorical sense shift in Mixtepec-Mixtec
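
The domain condition stated in A.4 can be expressed as a simple check, sketched below in Python. The function name is_plausible_metaphor, the sense ids and the domain labels "plants" and "anatomy" are assumptions made for this sketch; the actual subject field values appear only in Figure A.4.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    domain: str  # subject field of the sense


@dataclass
class EtyLink:
    source: str  # id of the sense the shift starts from
    target: str  # id of the sense the shift produces


def is_plausible_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """A metaphor requires the source and target senses to differ in domain."""
    return senses[link.source].domain != senses[link.target].domain


senses = {
    "ntuchi-bean": Sense("ntuchi-bean", "bean", domain="plants"),
    "ntuchi-kidney": Sense("ntuchi-kidney", "kidney", domain="anatomy"),
}
link = EtyLink(source="ntuchi-bean", target="ntuchi-kidney")
print(is_plausible_metaphor(link, senses))  # True
```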
A.5 Example of borrowing and compounding

Figure A.5 gives an example of a complex etymological borrowing: the borrowing of pamplemousse ‘grapefruit’ by French from Dutch (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word formation process that occurred within Dutch, in which the compound pompelmousse was formed from the etymons pompel and limoes, with the compound subsequently being borrowed into French. A CrossREF (see ISO 24613-2) instance is used to specify that the borrowed Etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymons containing the salient forms as Lemma.
Figure A.5 — Diagram of borrowing and compounding of French pamplemousse
A.6 Example of use of temporal information

The example in Figure A.6 shows the encoding of the etymology of the French lexical item chef with a focus on the diachronic stages and time frame of its phonetic development (see Reference [13]). Herein, at the level of Etymology, the fact that chef was inherited from Latin is encoded in the attribute type, which is given the value "inheritance". Within the Etymology, there is a series of Etymon items containing the temporal attributes notBefore and notAfter, which are used to specify the spans of time, and thus the sequential ordering, in which a given form of the item is asserted to have been in use. Note also the ordering of Etymon items via the use of prev and next attributes on the appropriate EtyLink individuals.
Figure A.6 — Diagram of multi-stage phonological changes of French chef
A.7 CognateSet with Bibliography

Figure A.7 shows an example of a cognate set as taken from an actual etymological dictionary source (see Reference [19]). A number of cognate forms in related languages are given to compare to the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relationship is assumed and thus it is not represented as Cognate in the encoding.

A major point of interest in this entry is the fact that the lexical entry girl, whose contemporary English sense is given in the Definition for the main entry, was used early on to denote a young person regardless of gender, and therefore became subject to a semantic shift over its long period of use. Each Cognate has an associated Sense and a Definition instance which together describe related lexical entries in Germanic languages, all of which are hypothesised as having been inherited from a common source.

Figure A.7 — Diagram of cognate set for English entry girl
A.8 Comprehensive etymological example

In order to demonstrate some of the expressivity of the current model and its potential for dealing with realistic examples of etymological data, the etymological information of part of a single entry, namely the entry for the Latin word amārus ‘bitter’ in Figure A.8 (see Reference [6]), is modelled in Figures A.9 and A.10. The entry consists of a listing of two of the word’s derivatives along with a series of its etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (that is, the paragraph starting with ‘The suffix –arus’…); this commentary is not modelled as structured data in the example. The original text of the entry is shown in Figure A.8.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 — Sample entry of Latin amārus

The modelling of this entry is presented in two parts. First is a diagram (Figure A.9) which details the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the lemma, meaning and part of speech of the lexical entry as well as its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a given derivation or sense of the entry. The CrossREF individuals in question are typed as attestations using the attribute type, and any additional information is added as text using the note attribute (in this case, the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also be found in later sources). In this instance, the bibliographic element refers to a single author (and therefore to their corpus of works) and the date attribute refers to the whole period encompassing their lifespan.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 — Part 1 of the entry amārus — Lemma, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for ease of reading, the elements of each of the two cognate sets are shown in a different colour, cyan and purple). The Cognates are each shown as belonging to one of two CognateSet elements and the entire etymology is attributed to the source by associating the Bibliography with the Etymology element for the LexicalEntry. Two other Etymology elements are used to describe etymologies for Cognates included within the CognateSet. The attribute rootType is used to indicate that the form of the etymon is reconstructed, and the attribute status to indicate a level of uncertainty.

SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 — Part 2 of entry amārus — Cognates, etymology and bibliography
A.9 Model simplification

Several examples in this Annex illustrate approaches for model simplification. See ISO 24613-1:2019, 5.5 (Methods for data category selection and subclass creation), and in particular, 5.5.6 (Principles for model simplification), for a normative description of the simplification process.

Annex B
(normative)
Data categories for etymology description

This annex contains a structured list of categories that shall be used as possible values for describing etymological processes, including sound changes as well as relevant lexical subfeatures. It describes, for each category, its name, a camel case r
...

FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2020-12-24
Voting terminates on: 2021-02-18
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
French title: Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 3: Extension étymologique
Reference number: ISO/FDIS 24613-3:2020(F)
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY PROPERTY RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL AND COMMERCIAL PURPOSES, AS WELL AS FROM THE USERS' POINT OF VIEW, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
© ISO 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or the ISO member body in the country of the requester.

ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 The LMF etymology extension
4.1 The Cognate class and the Etymon class
4.2 The Etymologizable class
4.3 The Etymology class and the EtyLink class
4.4 The CognateSet class
4.5 The Date class
4.6 The Gloss class
Annex A (informative) Examples of possible etymological typologies
Annex B (normative) Data categories for etymology description
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify such property rights or to give notice of their existence. Details of any intellectual property rights or similar rights identified during the development of the document are given in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).

Any trade name mentioned in this document is given for information, for the convenience of users, and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, or for any information about ISO's adherence to the World Trade Organization (WTO) principles concerning Technical Barriers to Trade (TBT), see the following link: www.iso.org/iso/fr/avant-propos.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

This first edition of ISO 24613-3, together with ISO 24613-1, ISO 24613-2, ISO 24613-4 and ISO 24613-5, cancels and replaces ISO 24613:2008, which has been technically revised.

The main changes compared to the previous edition are as follows:

— complete revision of the content and its subdivision into several parts.

A list of all parts in the ISO 24613 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/fr/members.html.
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24613-3:2020(F)
Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension
1 Scope

This document describes an extension to ISO 24613-1 and ISO 24613-2 and supports the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to the lexical entries of born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 8601-1, Date and time — Representations for information interchange — Part 1: Basic rules

ISO 8601-2, Date and time — Representations for information interchange — Part 2: Extensions

ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24613-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
cognate

form in a related language which shares a common etymological origin with a form in the language of the lexicon
3.2
etymologizable
meeting the conditions for having an etymology (3.3)

Note 1 to entry: The term "etymologizable" refers to a category of lexical elements and usages (encompassing for instance lexical entries, senses and word forms).
3.3
etymology
origin and historical development of any aspect of a given lexical item

3.4
etymon
lexical entry from which another lexical entry is derived

Note 1 to entry: An etymon can also be an earlier stage of a lexical item.

3.5
onomasiology
semantic study of words which, starting from a given concept, examines the different lexical items used in one or more languages to refer to that concept
4 LMF etymological extension

NOTE See Annex A for examples of possible etymological typologies.

4.1 Cognate and Etymon classes

Cognate and Etymon are defined as subclasses of the LexicalEntry class of the LMF core package (see Figure 1 and footnote 1). Both classes define lexical entries that have been added to a lexical resource in order to describe the etymologies of one or more other lexical entries. Etymon or Cognate instances can be assigned a language different from that of the lexicon as a whole (that language is specified in the LexiconInformation class described in ISO 24613-1).

Figure 1 -- Cognate and Etymon subclasses of the LexicalEntry class

Individuals of the Etymon and Cognate subclasses shall be in an aggregation relation with at least one individual of type EtyLink (see 4.3). When describing etymologies, it is sometimes necessary to deal with LexicalEntry instances (and therefore, through the subclass relation, with Etymon and Cognate instances) that are roots, and in particular reconstructed roots. In these cases, the fact of being a root, and the type of root in question, shall be specified using the rootType attribute. For reconstructed roots or other word forms, the status attribute serves to associate the item with a written description of how likely it is to have been in use (see the example in A.8).

See Table 1 for a list of attributes to be used with these two subclasses.
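
As an informal illustration of 4.1, the subclass relation and the "at least one EtyLink" requirement might be sketched in Python as follows. The class and attribute names Etymon, Cognate, LexicalEntry, EtyLink, rootType and status come from this document. The Python representation, the helper class EtymologicalEntry, the method has_required_link and the identifiers used are assumptions made for the sketch only and are not part of LMF.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    """A single etymological link (see 4.3); reduced here to an identifier."""
    id: str


@dataclass
class LexicalEntry:
    id: str
    lemma: str


@dataclass
class EtymologicalEntry(LexicalEntry):
    """Hypothetical helper holding what Etymon and Cognate share in Table 1."""
    lang: Optional[str] = None       # xml:lang, may differ from the lexicon language
    gloss: Optional[str] = None
    root_type: Optional[str] = None  # rootType, set only when the entry is a root
    status: Optional[str] = None     # status, written description of likelihood of use
    ety_links: List[EtyLink] = field(default_factory=list)

    def has_required_link(self) -> bool:
        # The text requires aggregation with at least one EtyLink.
        return len(self.ety_links) >= 1


class Etymon(EtymologicalEntry):
    pass


class Cognate(EtymologicalEntry):
    pass


# Latin "semper" (the etymon used in A.1); the ids are invented for the sketch.
semper = Etymon(id="etymon-semper", lemma="semper", lang="la")
```

In this sketch an Etymon created without links fails has_required_link() until an EtyLink is appended to ety_links, which mirrors the aggregation requirement stated above.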

4.2 Etymologizable class

The Etymologizable class designates the set of linguistic items that can have etymologies. Defining a single class that covers all such "etymologizable" items makes it easy to extend, as needed, the classes of items that can have etymologies. The following classes are subtypes of Etymologizable (see Figure 2): LexicalEntry, Sense, Form and CognateSet (see 4.4).

1) The diagrams in this document use the following colour coding: classes shown in yellow are introduced in this document, whereas classes shown in pink were introduced previously in ISO 24613-1 and ISO 24613-2.
Figure 2 -- Etymologizable class and its subclasses

4.3 Etymology and EtyLink classes

The Etymology class is used to describe the etymology of a linguistic item, more specifically an item whose class is a subclass of Etymologizable. The type or types of etymological process involved in a given etymology can be specified using the type attribute, and possibly also the subtype attribute (where the type of the etymology can be further refined). The possible values of type and subtype can vary with the theoretical approach adopted by the compiler of a resource and/or the linguistic or editorial orientation of that resource. Nesting Etymology instances makes it possible to combine the etymological processes to be described.

Etymological processes that can be used as values of type/subtype include, for example:

- borrowing, inheritance;

- word formation: compounding, derivation;

- sense shifts: narrowing, widening, amelioration, pejoration, metaphor, metonymy;

- phonetic/phonological processes: assimilation, dissimilation, epenthesis, metathesis, fortition, lenition, etc.

The list of data categories provided in Annex B shall be used together with the appropriate classes. A type can also be specified for individual links between two items within an etymology (see the description of EtyLink below). Because an Etymology instance can be taken from an external source, it can be associated with a Bibliography instance, which shall be defined in accordance with ISO 24613-2 (see Figure 3).
Figure 3 -- Etymology class

Etymology instances are associated with one or more EtyLink instances, each representing a single stage or step in the etymology of a given lexical item (see Figure 4). EtyLink links individuals belonging to the Etymologizable subclasses. EtyLink is a subclass of the CrossREF class defined in ISO 24613-1. Using CrossREF requires id attributes to be assigned to the target objects that represent the specific lexical content. Using the id attribute, as a target, on an individual of the Etymologizable class makes it possible to model a generic sequential temporal ordering of multiple items, with the prev and next attributes. EtyLink instances can also specify further temporal relations using various temporal attributes associated with the source and target of each EtyLink instance.

Figure 4 -- EtyLink
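
The sequential ordering carried by prev and next can be illustrated with a short Python sketch. It assumes, for the purposes of the example only, that each EtyLink has its own id and that prev and next hold the ids of the neighbouring EtyLink instances; this document does not fix that representation. The function name ordered_links is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EtyLink:
    id: str
    source: str                 # id of the earlier Etymologizable item
    target: str                 # id of the later Etymologizable item
    type: Optional[str] = None
    prev: Optional[str] = None  # assumed: id of the preceding EtyLink, if any
    next: Optional[str] = None  # assumed: id of the following EtyLink, if any


def ordered_links(links: List[EtyLink]) -> List[EtyLink]:
    """Return the links in sequence by following the next attributes."""
    by_id: Dict[str, EtyLink] = {link.id: link for link in links}
    heads = [link for link in links if link.prev is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one link without a prev attribute")
    chain: List[EtyLink] = []
    seen = set()
    current: Optional[EtyLink] = heads[0]
    while current is not None:
        if current.id in seen:
            raise ValueError("cycle detected in the prev/next chain")
        seen.add(current.id)
        chain.append(current)
        current = by_id[current.next] if current.next is not None else None
    return chain
```

Given three links stored in arbitrary order, ordered_links returns them from the earliest stage to the latest, so that the source of the first link is the oldest etymon and the target of the last link is the entry being described.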

Individuals of the Etymon and Cognate classes (subtypes of LexicalEntry) shall be associated with at least one EtyLink individual. See Table 1 for a list of attributes to be used with the Etymology and EtyLink classes.
4.4 CognateSet class

The CognateSet class (see Figure 5) is a container for sets of one or more Cognate elements and zero or more Bibliography elements (see ISO 24613-2). CognateSet is an onomasiological construct. It contains items from languages related to that of a given LexicalEntry (and therefore, through the subclass relation, of any given Etymon or Cognate), collected in order to demonstrate linguistic similarities or dissimilarities of a fundamental kind. The use of a CognateSet implies that the LexicalEntry instances (and thus the Etymon and Cognate instances) it contains share an etymological source.
Figure 5 -- CognateSet

4.5 Date class

Components of a LexicalEntry and of its subclasses shall be associated with a specific date using the Date class. The Date class also allows several degrees of precision to be specified. A precise year, and possibly a given month and day, shall be specified using the date attribute, and an approximate date using the circa attribute. For a time span known at varying levels of specificity, one or more date attributes can be used. When the time span is known (or estimated), its lower and upper bounds can be specified using notBefore and notAfter respectively. Date and time formats shall follow ISO 8601-1 and ISO 8601-2.
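
A minimal Python sketch of the Date attributes is given below as an illustration. The attribute names date, circa, notBefore and notAfter come from this document (rendered here as not_before and not_after). The sketch only accepts the calendar forms YYYY, YYYY-MM and YYYY-MM-DD, which is a deliberate simplification and covers only a small part of ISO 8601-1 and ISO 8601-2. The helper parse_iso and the consistency check between the two bounds are assumptions of the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Simplified pattern: YYYY, YYYY-MM or YYYY-MM-DD only.
_ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_iso(value: str) -> Tuple[int, ...]:
    """Parse YYYY, YYYY-MM or YYYY-MM-DD into a tuple of integers."""
    match = _ISO_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


@dataclass
class Date:
    date: Optional[str] = None        # precise year, optionally month and day
    circa: Optional[str] = None       # approximate date
    not_before: Optional[str] = None  # lower bound of a time span
    not_after: Optional[str] = None   # upper bound of a time span

    def __post_init__(self) -> None:
        for value in (self.date, self.circa, self.not_before, self.not_after):
            if value is not None:
                parse_iso(value)  # raises ValueError on unsupported input
        if self.not_before is not None and self.not_after is not None:
            lower = parse_iso(self.not_before)
            upper = parse_iso(self.not_after)
            # Compare only at the precision both bounds share.
            shared = min(len(lower), len(upper))
            if lower[:shared] > upper[:shared]:
                raise ValueError("notBefore is later than notAfter")
```

Comparing the bounds only at their shared precision avoids rejecting a span such as notBefore 1200-05 with notAfter 1200, where the coarser bound can be read as covering the whole year.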
4.6 Gloss class

The Gloss class (see Figure 6) represents a textual description of the meaning of a word or phrase, intended to be understood by humans. Individuals of this class can represent paraphrases or synonyms, written either in the language of the entry or in another language. See Table 1 for a list of attributes to be used with this class.

Figure 6 -- Gloss

Table 1 -- Example of class attribute assignment

Class name     Example attributes
Etymon         xml:lang, gloss, rootType, status
Etymology      type, subtype
EtyLink        type, prev, next
CognateSet
Cognate        xml:lang, gloss, rootType, status
Date           notBefore, notAfter, circa, date
Gloss          xml:lang
Annex A
(informative)
Examples of possible etymological typologies

A.1 Simple inheritance example

The example in Figure A.1 describes the inheritance of a lexical entry from a parent language, in this case the Sardinian adverb sempe, which derives from the Latin word semper. The Sardinian word is linked to a single Etymology instance that has the type inheritance (see Annex B for a definition of this type). The Etymology instance is in turn associated with an individual of type EtyLink that represents the process of evolution from the Latin etymon to the Sardinian lexical entry.

Figure A.1 -- Diagram of inheritance in Sardinian
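
The structure described in A.1 can be sketched in Python as an Etymology of type inheritance holding a single EtyLink from the Latin etymon to the Sardinian entry. The class and attribute names Etymology, EtyLink, type, subtype, source and target come from this document. The identifiers "etymon-semper" and "entry-sempe", the links and nested fields, and the method process_types are hypothetical and serve only to make the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EtyLink:
    source: str                 # id of the etymon
    target: str                 # id of the item that descends from it
    type: Optional[str] = None


@dataclass
class Etymology:
    type: str
    subtype: Optional[str] = None
    links: List[EtyLink] = field(default_factory=list)
    nested: List["Etymology"] = field(default_factory=list)

    def process_types(self) -> List[str]:
        """Collect the type of this etymology and of nested ones, depth first."""
        types = [self.type]
        for child in self.nested:
            types.extend(child.process_types())
        return types


# Sardinian "sempe" inherited from Latin "semper" (ids are invented).
sempe_etymology = Etymology(
    type="inheritance",
    links=[EtyLink(source="etymon-semper", target="entry-sempe")],
)
```

The nested field reflects the statement in 4.3 that Etymology instances can be nested to combine processes; for this simple case it stays empty and process_types() yields only the inheritance type.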
A.2 Example of a diachronic etymological process (inheritance with phonological change)

In the following example, the evolution of the word meaning "wine" (IPA: [veŋ]) in the Emiliano variety of Italian is traced through a series of etymons that are ordered and linked to one another by EtyLink instances. The process leads from Vulgar Latin vinu to the immediate predecessor of the word in its current form. These individual links are accessible via an Etymology individual (of type inheritance) that represents the history of the LexicalEntry. The EtyLink instances are ordered by means of the two attributes prev and next. These attributes are not shown in Figure A.2 for readability, but their use is shown in Figure A.6, for the example in A.6, and in Figure A.10, for the example in A.8.

Figure A.2 -- Diagram of multi-step inheritance and phonological change in Bolognese

A.3 Example of word-form inheritance

Figure A.3 describes the derivation of the singular and plural forms of the Portuguese noun nação 'nation', nação (sg) and nações (pl), from two forms of a Vulgar Latin (VL) noun, nātiōnem (sg, acc) and nātiōnes (pl, acc) respectively. In this case, since both the singular and the plural Portuguese forms are concerned, the etymon has two WordForm instances, one for each grammatical number. Note in particular the association of the grammaticalCase and inflectionType attributes with the WordForm instances of the Etymon via a GrammaticalInformation instance. In a detailed Portuguese lexicon containing such etymological information for a sufficient number of lexical entries, it would be possible, by contrasting the content of the WordForm instances in the LexicalEntry with those in the Etymon, to observe the following linguistic phenomena: 1) the loss of grammatical case in Portuguese; 2) the derivation of the great majority of its nouns from the accusative case in VL; 3) where the singular (accusative) ending in VL was -tiōnem, the Portuguese form is written "-ção" and pronounced [sɐ̃w]; where the plural (accusative) ending in VL was -tiōnes, the Portuguese form is written "-ções" and pronounced [sõj̃s].

NOTE This etymology can be further articulated by adding the types of phonological process for each stage of the diachrony. This can be done in the model by adding the appropriate data category defined in Annex B to the type value of the EtyLink for the stages concerned.

Figure A.3 -- Diagram of inheritance of Portuguese inflected forms

A.4 Example of metaphorical semantic shift

The example in Figure A.4, from Mixtepec-Mixtec, illustrates the derivation of one sense from another by metaphor. In this case, the word for 'kidney' arose from a metaphorical extension of the word meaning 'bean', ntuchi. The etymology is attached to the 'kidney' sense of the word ntuchi. The EtyLink instance carries source and target attributes whose values are the ids of the respective components, which specifies the directionality of the process. A defining feature of metaphor is that the semantic domain must change between the source and target senses, each sense having a domain specified in the subject field. The CrossREF class defined in ISO 24613-2 is used to specify a URI corresponding to the DBpedia entry for each sense.

Figure A.4 -- Diagram of a metaphorical sense shift in Mixtepec-Mixtec
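
The domain-change condition described in A.4 lends itself to a simple consistency check, sketched below in Python. The source and target attributes and the two glosses come from this document. The domain labels "plants" and "anatomy", the sense ids, the field name subject_field and the function is_plausible_metaphor are assumptions introduced for the illustration; the actual values used in Figure A.4 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sense:
    id: str
    gloss: str
    subject_field: str  # semantic domain of the sense (labels are assumed)


@dataclass
class EtyLink:
    source: str  # id of the source sense
    target: str  # id of the target sense


def is_plausible_metaphor(link: EtyLink, senses: Dict[str, Sense]) -> bool:
    """A metaphor link should connect two senses in different domains."""
    source_domain = senses[link.source].subject_field
    target_domain = senses[link.target].subject_field
    return source_domain != target_domain


# The two senses of ntuchi; ids and domain labels are invented.
senses = {
    "sense-bean": Sense(id="sense-bean", gloss="bean", subject_field="plants"),
    "sense-kidney": Sense(id="sense-kidney", gloss="kidney", subject_field="anatomy"),
}
link = EtyLink(source="sense-bean", target="sense-kidney")
```

A lexicon validator could run such a check over every EtyLink whose enclosing Etymology has a metaphor type and flag links whose source and target senses share a domain for manual review.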
A.5 Example of borrowing and compounding

Figure A.5 presents an example of a complex etymological borrowing: the borrowing of the French word pamplemousse 'grapefruit' from a Dutch word (see References [4] and [18]). In addition to the borrowing, the etymology shows the diachrony of the word-formation process that took place in Dutch, where the compound pompelmousse was formed from the etymons pompel and limoes and was subsequently borrowed into French. A CrossREF instance (see ISO 24613-2) is used to specify that the borrowed etymon is a compound in Dutch and that its components are ordered. The components of the compound are represented as Etymon classes containing the essential forms such as Lemma.

Figure A.5 -- Diagram of the borrowing and compounding of the French word pamplemousse

A.6 Example of the use of temporal information

The example in Figure A.6 presents the encoding of the etymology of the French lexical item chef, with emphasis on the diachronic stages and the chronology of its phonetic evolution (see Reference [13]). In this case, at the Etymology level, the process specifying that chef is inherited from Latin is encoded in the type attribute, which takes the value "inheritance". Within the Etymology level there is a series of Etymon elements carrying the temporal attributes notBefore and notAfter, which specify the time spans, and the sequential order, in which a given form of the item is assumed to have been in use. Note also the ordering of the Etymon elements through the prev and next attributes on the relevant EtyLink individuals.

Figure A.6 -- Diagram of the multi-stage phonological changes of the French word chef

A.7 Cognate set with bibliography

Figure A.7 illustrates an example of a cognate set taken from an actual source (an etymological dictionary) (see Reference [19]). Several cognate forms in related languages are given for comparison with the lexical item girl. Although the LexicalEntry is inherently part of the CognateSet, this relation is assumed and is therefore not represented as a Cognate class in the encoding.

An important point to note in this entry is that the lexical entry girl, whose contemporary English sense is given in the definition of the main entry, was formerly used to refer to a young person of either sex, and has therefore undergone a semantic shift over its long period of use. Each cognate is associated with Sense and Definition instances which together describe related lexical entries in the Germanic languages, all of which are assumed to be inherited from a common source.
Figure A.7 -- Diagram of a cognate set associated with the English entry girl

A.8 Full etymological example

In order to demonstrate the expressiveness of the present model and its capacity to handle realistic etymological data, the etymological information in part of a single entry, namely the entry for the Latin word amārus 'bitter' shown in Figure A.8 [6], is modelled in Figures A.9 and A.10. The entry consists of a list of two of the derived words together with a series of associated etymons and cognates. The entry also contains a paragraph of textual commentary on the etymology of amārus (the paragraph beginning 'The suffix –arus'…); this commentary is not modelled as structured data in this example. The original text of the entry is shown in Figure A.8:
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.8 -- Example entry for the Latin word amārus

The modelling of this entry is organized in two parts. The first is a diagram (Figure A.9) that details the LMF modelling of the first two lines of the entry (shown in Figure A.8): the lines containing the lemma, meaning and part of speech of the lexical entry, together with its derivatives. In the LMF modelling of the entry, the lexical entry amārus is linked to its derivatives using the CrossREF element (defined in ISO 24613-2). CrossREF is used as a means of modelling the attestation of a particular derivation or sense of the entry. The CrossREF individuals in question are marked as attestations using the type attribute, and any additional information is added as text using the note attribute (in this case, the plus sign after the bibliographic citation in Reference [6] means that the linguistic usage in question can also appear in later sources). In this example, the bibliographic element refers to a single author (and thus to that author's body of work), and the date attribute covers the author's entire lifetime.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.9 -- Part 1 of the amārus entry -- Lemmas, derivatives and bibliographies

The second part of the example (Figure A.10) details the modelling of the etymological information itself (for readability, the elements of each of the two cognate sets are marked in two different colours: cyan and violet). Each cognate is represented as belonging to one of the two CognateSet elements, and the whole etymology is attributed to the source by associating the Bibliography class with the Etymology element for the lexical entry. Two further Etymology elements are used to describe the etymologies of the cognates included in the CognateSet element. The rootType attribute is used to indicate that the form of the etymon is reconstructed, and the status attribute to indicate a degree of uncertainty.
SOURCE De Vaan (2008) [6], reproduced with permission.
Figure A.10 -- Part 2 of the amārus entry -- Cognates, etymology and bibliography

A.9 Simplification of the model
Several examples in this annex illustrate different metho
...
