ISO 24613-5:2022
Language resource management — Lexical markup framework (LMF) — Part 5: Lexical base exchange (LBX) serialization
This document describes the serialization of the lexical markup framework (LMF) model defined as an extensible markup language (XML) model derived from the language base exchange (LBX) schema and compliant with the W3C XML schema. This serialization covers the classes, data categories, and mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model), and ISO 24613-3 (etymological extension).
SLOVENIAN STANDARD
SIST ISO 24613-5:2023
1 January 2023
Supersedes:
SIST ISO 24613:2013
Language resource management - Lexical markup framework (LMF) - Part 5: Lexical base exchange (LBX) serialization
Gestion des ressources linguistiques - Cadre de balisage lexical (LMF) - Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
This Slovenian standard is identical to: ISO 24613-5:2022
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24613-5:2023 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24613-5
First edition
2022-01
Language resource management —
Lexical markup framework (LMF) —
Part 5:
Lexical base exchange (LBX)
serialization
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
Reference number
ISO 24613-5:2022(E)
© ISO 2022
COPYRIGHT PROTECTED DOCUMENT
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation subclasses
6.2 Implementing the FormRepresentation class
6.3 Implementing the Form subclasses
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 General
8.2.2 Referencing forms in an etymon
8.2.3 Representing the meaning of an etymon
8.2.4 Representing the language of an etymon
8.2.5 Dating an etymon
8.2.6 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— entire revision of the content and its subdivisions into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2022(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the language base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from: https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF in order to meet specific design objectives. However, this
document does not elaborate on the metadata aspects of LMF, since the LBX schema is inherently
much richer in its representation of all aspects related to the creation, content, versioning and
database implementation of lexical content at large. Occasionally, constructs that are approximately
equivalent to explicit requirements of the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the following namespace:
http://www.w3.org/2001/XMLSchema
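As an illustration of the namespace scope assumed by the examples, the sketch below declares the LBX namespace as the default namespace on a root element, so that every unprefixed element belongs to it. The choice of <Lexicon> as the root and the placeholder entry are assumptions made for illustration only; the xs prefix is bound solely so that datatype names such as xs:ID can be resolved.

```xml
<!-- Illustrative sketch: root element choice and entry content are hypothetical -->
<Lexicon xmlns="http://www.LexicalBaseExchange.org/2021/schema"
         xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Unprefixed elements below inherit the LBX default namespace -->
  <Entry xml:lang="fr">
    <Lemma>
      <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    </Lemma>
  </Entry>
</Lexicon>
```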
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one to many lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class LBX construct
/LexicalResource/ <LexicalResource>
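The two arrangements described in this subclause can be sketched as follows. The wrapper element name <LexicalResource> is assumed here from the class name, and all identifier values are hypothetical; entries are omitted for brevity.

```xml
<!-- Arrangement 1 (sketch): one document groups several lexicons in a single collection -->
<LexicalResource>
  <Lexicon lexiconID="lex-fr-en"/>
  <Lexicon lexiconID="lex-es-en"/>
</LexicalResource>
```

```xml
<!-- Arrangement 2 (sketch): a stand-alone lexicon document that points to a virtual lexical resource -->
<Lexicon lexiconID="lex-fr-en" lexicalResourceID="res-romance-en"/>
```

In the second arrangement each large lexicon can be stored and exchanged as its own XML document while still declaring the collection it belongs to.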
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class LBX construct
/GlobalInformation/ <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate language information corresponding to the content of specific
elements. According to the W3C recommendation, @xml:lang content shall be compliant with
BCP 47. There is no need for a specific implementation of the /language coding/ data category or
the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1.
LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order
to support the validation of equivalent metadata found in the <LexiconInformation> elements
of one or more lexicons (see 5.4). When included, the /script coding/ shall use the codes from
ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an
LBX conformant document using the @encoding attribute. For instance, an XML-LBX document
encoded as UTF-8 according to the Unicode standard shall begin with an XML declaration whose
@encoding attribute is set to UTF-8.
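A declaration of this kind can be sketched as follows. The encoding value follows the text above; the version number shown is an assumption, since only the @encoding attribute is discussed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- version="1.0" is an assumption; encoding="UTF-8" is the part required by the text -->
```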
A non-exclusive list of sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the
<LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see
5.1). In the case of a virtual lexical resource, where the <GlobalInformation> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child element(s) and attributes as direct children of the <Lexicon>
element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID as a unique identifier for the lexicon; as a best practice, the id should
be a URI and be unique within a language resource; @xml:ID can be used in place of @lexiconID
when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID of datatype xs:ID as a unique identifier for the lexical resource; as a best
practice, the ID should be a URI for global scope; in addition, @xml:ID can be used in place of
@lexicalResourceID when there is a design intent to make the lexical resource accessible on the web;
— @lexiconType, of datatype xs:string; the type of lexicon, e.g. bilingual dictionary, monolingual
dictionary;
— @sourceLanguage, of datatype xs:string; the language of the <Lemma> element or its inflected
forms;
— @targetLanguage, of datatype xs:string; the language the lemma is translated to, principally
represented in the <Translation> element.
Table 3 — Lexicon class
LMF class LBX construct
/Lexicon/ <Lexicon>
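The child element and attributes listed above can be combined as in the following minimal sketch. All attribute values and the title are hypothetical, and the use of BCP 47 tags as language values is an assumption. Because xs:ID values must be XML names without colons, the identifiers are shown as simple names rather than full URIs.

```xml
<!-- Illustrative sketch: all attribute values and the title are hypothetical -->
<Lexicon lexiconID="lex-fr-en"
         lexicalResourceID="res-romance-en"
         lexiconType="bilingual dictionary"
         sourceLanguage="fr"
         targetLanguage="en">
  <Title>French-English sample lexicon</Title>
  <!-- <Entry> elements follow here -->
</Lexicon>
```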
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation>
element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include>
element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a
variety of administrative, technical, documentary, and bibliographic information attached to the
corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class LBX construct
/LexiconInformation/ <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon shall be included as
elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
The <LexiconInformation> can also include elements and data categories that further qualify
information in the lexicon and can be used to support the validation of the XML document (lexicon).
These elements and data categories should also be included in the global set of elements and data
categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding
values in <GlobalInformation> and <LexiconInformation> should be part of the validation process.
A non-exclusive list of these sub-elements, simple types indexed by value, follows:
— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon
information for a subset of lexicons grouped by language by referencing a named language data schema (e.g.
ArabicLanguageData.xsd) (see Clause B.1).
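When the qualifying information is carried by <LexiconInformation> rather than by <Lexicon>, the same element and attributes move to that element, as in the sketch below. The values are hypothetical and the parent element is deliberately not shown.

```xml
<!-- Illustrative sketch: values are hypothetical; the parent element is not shown -->
<LexiconInformation lexiconID="lex-fr-en"
                    lexicalResourceID="res-romance-en"
                    lexiconType="bilingual dictionary"
                    sourceLanguage="fr"
                    targetLanguage="en">
  <Title>French-English sample lexicon</Title>
</LexiconInformation>
```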
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5).
Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is
serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
Attributes used for the <Entry> element can include:
— @entryID of datatype xs:ID as a unique identifier for an entry; as a best practice, the id should be
a URI and be unique within a language resource; @xml:ID can be used in place of @entryID when
there is a design intent to make the entry accessible on the web;
— @lexiconID of datatype xs:ID as a unique identifier for the parent lexicon; as a best practice, the
id should be a URI and be unique within a language resource; @xml:ID can be used in place of
@lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection
when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class LBX construct
/LexicalEntry/ <Entry>
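The identifying attributes listed above can be attached to an <Entry> as in the following minimal sketch. The identifier values are hypothetical and the definition text is abridged for illustration.

```xml
<!-- Illustrative sketch: identifier values are hypothetical -->
<Entry entryID="e-langouste"
       lexiconID="lex-fr-en"
       lexicalResourceID="res-romance-en"
       xml:lang="fr">
  <Lemma>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin.</DefRep>
    </Def>
  </Sense>
</Entry>
```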
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7);
  anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lɑ̃gust</FormRep>
  </Lemma>

  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes
      antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
      très appréciée.</DefRep>
    </Def>
  </Sense>

  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>

NOTE 1 The style in the above example is appropriate for use in a lexical resource that contains a collection
of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style can
be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> can be used in place
of the equivalent <FormRep> elements and the <Def> element can directly contain the text content rather than
employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification
using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition
classes. Orthographic representations shall be implemented in LBX by means of elements corresponding
to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (machine-readable
dictionary (MRD) model), or possible new OrthographicRepresentation subclasses derived through the
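The simpler style mentioned in NOTE 1 can be sketched as follows for the first sense of the same entry. The placement of <Orth> and <Pron> directly inside <Lemma>, and the omission of @xml:lang and @notation on them, are assumptions made for illustration.

```xml
<!-- Illustrative sketch of the simpler monolingual style;
     the position of <Orth> and <Pron> inside <Lemma> is an assumption -->
<Entry xml:lang="fr">
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <Orth>langouste</Orth>
    <Pron>lɑ̃gust</Pron>
  </Lemma>
  <Sense senseNR="1">
    <!-- <Def> holds the text directly, without a <DefRep> child -->
    <Def>Grand crustacé marin (Décapodes macroures) aux pattes
    antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
    très appréciée.</Def>
  </Sense>
</Entry>
```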
5</br>
© ISO 2022 – All rights reserved</br>
</br>
---------------------- Page: 13 ------------</br>
<b>...</b>
INTERNATIONAL ISO
STANDARD 24613-5
First edition
2022-01
Language resource management —
Lexical markup framework (LMF) —
Part 5:
Lexical base exchange (LBX)
serialization
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
Reference number
ISO 24613-5:2022(E)
© ISO 2022
---------------------- Page: 1 ----------------------
ISO 24613-5:2022(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii
© ISO 2022 – All rights reserved
---------------------- Page: 2 ----------------------
ISO 24613-5:2022(E)
Contents Page
Foreword .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 General requirements . 1
5 Serialization of the LMF core model (ISO 24613-1) . 2
5.1 Implementing the LexicalResource class . 2
5.2 Implementing the GlobalInformation class. 2
5.3 Implementing the Lexicon class . 3
5.4 Implementing the LexiconInformation class . 4
5.5 Implementing the LexicalEntry class . 4
5.6 Implementing the OrthographicRepresentation class . 5
5.7 Implementing the Form class . 6
5.7.1 Form class . 6
5.7.2 Lemma class . 6
5.8 Implementing the GrammaticalInformation class . 6
5.9 Implementing the Sense class . 7
5.10 Implementing the Definition class . 8
5.11 Implementing the CrossREF class . 8
6 Serialization of the MRD extension (ISO 24613-2) .10
6.1 Implementing OrthographicRepresentation subclasses . 10
6.2 Implementing the FormRepresentation class . 10
6.3 Implementing the Form subclasses. 11
6.3.1 General principles . 11
6.3.2 Implementing the WordForm class. 11
6.3.3 Implementing the Stem class . 11
6.3.4 Implementing the WordPart class . 11
6.3.5 Implementing the RelatedForm class .12
6.3.6 Implementing the TextRepresentation class .13
6.3.7 Implementing the Translation class . 14
6.3.8 Implementing the Example class . 14
6.4 Implementing the SubjectField class . 15
6.5 Implementing the Bibliography class . 15
7 Implementing the CrossREF mechanism to refer to external media files.15
8 Implementing the classes from the etymological extension (ISO 24613-3) .15
8.1 Implementing the Etymology class . 15
8.2 Implementing the Etymon class . 16
8.2.1 General . 16
8.2.2 Referencing forms in an etymon . 16
8.2.3 Representing the meaning of an etymon . 16
8.2.4 Representing the language of an etymon . 16
8.2.5 Dating an etymon . 17
8.2.6 Providing sources associated with an etymon . 17
8.3 Implementing the EtyLink class . 17
8.4 Implementing the CognateSet class . 17
8.5 Implementing the Cognate class . 17
9 Additional mechanisms .18
9.1 Overview . 18
9.2 XML feature structure implementation . 18
9.3 Representing various labels with . 18
9.4 Providing rendering information with the @rend attribute . 18
iii
© ISO 2022 – All rights reserved
---------------------- Page: 3 ----------------------
ISO 24613-5:2022(E)
Annex A (informative) LBX data category selection .19
Annex B (informative) LBX feature structure implementation .24
Annex C (informative) LBX examples for applying LBX serialization .27
Bibliography .32
iv
© ISO 2022 – All rights reserved
---------------------- Page: 4 ----------------------
ISO 24613-5:2022(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— entire revision of the content and its subdivisions into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
v
© ISO 2022 – All rights reserved
---------------------- Page: 5 ----------------------
INTERNATIONAL STANDARD ISO 24613-5:2022(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the language base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from: https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. However,
this document does not elaborate on the metadata aspects of LMF, since the LBX schema is inherently
much richer in representing all aspects related to the creation, content, versioning and
database implementation of lexical content at large. Occasionally, constructs that are approximately
equivalent to explicit requirements of the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the following namespace:
http://www.w3.org/2001/XMLSchema
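As an illustration of the namespace convention above, a complete (non-simplified) document would typically carry the LBX namespace as the default namespace on its root element. The following is a minimal sketch, assuming a single-lexicon document whose root is <Lexicon>; the content is a placeholder.

```xml
<!-- Sketch only: the default namespace places every unprefixed element in the LBX namespace -->
<Lexicon xmlns="http://www.LexicalBaseExchange.org/2021/schema">
  <!-- lexicon content would go here -->
</Lexicon>
```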
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one to many lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class LBX construct
/LexicalResource/ <LexicalResource>
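The grouping role of this level can be pictured with the following outline. It is an illustrative sketch only, assuming two hypothetical lexicons; the identifier values are invented and the lexicon contents are reduced to placeholders.

```xml
<!-- Sketch only: one collection grouping two lexicons (identifier values are hypothetical) -->
<LexicalResource>
  <Lexicon lexiconID="lexicon-fr-en">
    <!-- entries of the first lexicon -->
  </Lexicon>
  <Lexicon lexiconID="lexicon-es-en">
    <!-- entries of the second lexicon -->
  </Lexicon>
</LexicalResource>
```

When the resource holds a single lexicon, the outer element can be dropped and the document starts at the lexicon level.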
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class LBX construct
/GlobalInformation/ <GlobalInformation>
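The schema-reference option can be pictured with a standard W3C XML Schema include. The sketch below is an assumption about how such a reference might look: the enclosing schema and the schemaLocation path are hypothetical, and only the file name GlobalInformation.xsd comes from the text above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.LexicalBaseExchange.org/2021/schema"
            targetNamespace="http://www.LexicalBaseExchange.org/2021/schema"
            elementFormDefault="qualified">
  <!-- Hypothetical location; an included schema shares the including schema's target namespace -->
  <xsd:include schemaLocation="GlobalInformation.xsd"/>
</xsd:schema>
```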
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate language information corresponding to the content of specific
elements. According to the W3C recommendation, @xml:lang content shall be compliant with
BCP 47. There is no need for a specific implementation of the /language coding/ data category or
the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1.
LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order
to support the validation of equivalent metadata found in the <LexiconInformation> elements
of one or more lexicons (see 5.4). When included, the /script coding/ data category shall use the codes
from ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an
LBX conformant document using the @encoding attribute. For instance, an XML-LBX document
encoded as UTF-8 according to the Unicode standard shall begin with an XML declaration whose
@encoding attribute is set to UTF-8.
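A declaration of this kind is sketched below. The encoding value follows the text above; the version number and the placeholder root element are assumptions made for illustration, since only the @encoding attribute is discussed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- version="1.0" is assumed for illustration; encoding="UTF-8" is the part discussed in this clause -->
<Lexicon>
  <!-- lexicon content -->
</Lexicon>
```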
A non-exclusive list of <GlobalInformation> sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
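To make the idea of a simple type enumerating a set of values more concrete, the following W3C XML Schema fragment sketches what a type such as GlobalPartOfSpeechType might look like. The enumeration values are hypothetical placeholders and do not reproduce the LBX reference schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical values: the actual list depends on the lexicons in the resource -->
  <xs:simpleType name="GlobalPartOfSpeechType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="noun"/>
      <xs:enumeration value="verb"/>
      <xs:enumeration value="adjective"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With a type of this shape, a validating parser rejects any value that is not in the enumerated set.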
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the
<LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see
5.1). In the case of a virtual lexical resource, where the <LexicalResource> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child element(s) and attributes as direct children of the <Lexicon>
element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID as a unique identifier for the lexicon; as a best practice, the ID should
be a URI and be unique within a language resource; @xml:ID can be used in place of @lexiconID
when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, of datatype xs:ID as a unique identifier for the lexical resource; as a best
practice, the ID should be a URI for global scope; in addition, @xml:ID can be used in place of
@lexicalResourceID when there is a design intent to make the lexical resource accessible on the web;
— @lexiconType, of datatype xs:string; the type of lexicon, e.g. bilingual dictionary, monolingual
dictionary;
— @sourceLanguage, of datatype xs:string; the language of the <Lemma> element or its inflected
forms;
— @targetLanguage, of datatype xs:string; the language the lemma is translated to, principally
represented in the <Translation> element.
Table 3 — Lexicon class
LMF class LBX construct
/Lexicon/ <Lexicon>
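The child element and attributes listed in this subclause can be combined as in the following sketch of a stand-alone bilingual lexicon. All values are invented for illustration. Because xs:ID values have to be valid XML names without colons or slashes, the identifiers are shown as plain names rather than full URIs; how a URI would be mapped to such a name is left open here.

```xml
<!-- Sketch only: a <Lexicon> used as root element and pointing at a virtual lexical resource -->
<Lexicon lexiconID="lexicon-fr-en"
         lexicalResourceID="resource-001"
         lexiconType="bilingual dictionary"
         sourceLanguage="fr"
         targetLanguage="en">
  <Title>French-English sample lexicon</Title>
  <!-- <Entry> elements would follow here -->
</Lexicon>
```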
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation>
element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include>
element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a
variety of administrative, technical, documentary, and bibliographic information attached to the
corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class LBX construct
/LexiconInformation/ <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon shall be included as
elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
The <LexiconInformation> can also include elements and data categories that further qualify
information in the lexicon and can be used to support the validation of the XML document (lexicon).
These elements and data categories should also be included in the global set of elements and data
categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding
values in <GlobalInformation> and <LexiconInformation> should be part of the validation process.
A non-exclusive list of these sub-elements, simple types indexed by value, follows:
— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon
information for a subset of lexicons grouped by language by referencing a named language data schema (e.g.
ArabicLanguageData.xsd) (see Clause B.1).
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5).
Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is
serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
Attributes used for the <Entry> element can include:
— @entryID of datatype xs:ID as a unique identifier for an entry; as a best practice, the ID should be
a URI and be unique within a language resource; @xml:ID can be used in place of @entryID when
there is a design intent to make the entry accessible on the web;
— @lexiconID of datatype xs:ID as a unique identifier for the parent lexicon; as a best practice, the
ID should be a URI and be unique within a language resource; @xml:ID can be used in place of
@lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection
when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class LBX construct
/LexicalEntry/ <Entry>
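The attributes and child elements described in this subclause can be sketched as an entry skeleton. The identifier values are hypothetical, and the children are reduced to empty placeholders that show only their names, not a required order or content model.

```xml
<!-- Sketch only: hypothetical identifiers; child order and content are not normative here -->
<Entry entryID="entry-langouste"
       lexiconID="lexicon-fr-en"
       lexicalResourceID="resource-001"
       xml:lang="fr">
  <GramFeats/>  <!-- grammatical information related to the whole entry -->
  <Lemma/>      <!-- a Form subclass containing the text literal -->
  <Etymology/>  <!-- etymological aspects -->
  <Sense/>      <!-- semantic information -->
  <Xref/>       <!-- reference to internal or external elements -->
</Entry>
```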
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7);
  anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>

  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes
      antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
      très appréciée.</DefRep>
    </Def>
  </Sense>

  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>
NOTE 1 The style in the above example is appropriate for use in a lexical resource that contains a collection
of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style can
be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> can be used in place
of the equivalent <FormRep> elements and the <Def> element can directly contain the text content rather than
employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification
using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
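As a rough illustration of the simpler style mentioned in NOTE 1, the first sense of the entry above might be rewritten as follows for a monolingual French lexicon. This is a sketch based only on the description in the note: the attributes permitted on <Orth>, <Pron> and <Def> are assumptions, and the authoritative simplification example is the one referred to in 6.2.

```xml
<Entry xml:lang="fr">
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <!-- <Orth> and <Pron> stand in for the two <FormRep> elements -->
    <Orth>langouste</Orth>
    <Pron>lägust</Pron>
  </Lemma>
  <Sense senseNR="1">
    <!-- <Def> holds the text directly, without a <DefRep> child -->
    <Def>Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est très appréciée.</Def>
  </Sense>
</Entry>
```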
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition
classes. Orthographic representations shall be implemented in LBX by means of elements corresponding
to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (machine-readable
dictionary (MRD) model), or possible new OrthographicRepresentation subclasses derived through the
principles for LMF extensions described in ISO 24613-1 (core model). ISO 24613-1:2019, 5.6.1, describes
some of the representation types that can serve as a basis for extending the OrthographicRepresentation
class. ISO 24613-4:2021 (TEI extension), 6.1, lists a number of representation elements that are valid for
use with the Form class. Elements implemented in this part are described in 5.7.2, 5.10, and successive
subclauses from 6.3.2 to 6.3.8.
5.7 Implementing the Form class
5.7.1 Form class
The Form class shall be implemented in LBX by elements that instantiate Form subclasses (see Table 6,
6.2 and 6.3).
Table 6 — Form class
LMF class LBX construct
/Form/ <Form>
5.7.2 Lemma class
The Lemma class, a subclass of the Form class, shall be implemented in LBX by means of the <Lemma>
element (see Table 7).
Table 7 — Lemma class
LMF class LBX construct
/Lemma/ <Lemma>
Orthographic representations in the <Lemma> element shall be implemented in LBX by means of the
<FormRep> element, or by elements that instantiate Form subclasses, including <Orth> and <Pron>.
NOTE 1 The <FormRep>, <Orth>, and <Pron> elements are introduced in 6.2.
NOTE 2 <Orth> and <Pron> can be allowed when justified by design goals.
5.8 Implementing the GrammaticalInformation class
The GrammaticalInformation class groups grammatical features associated with the Lexi
...
INTERNATIONAL STANDARD ISO 24613-5
First edition
2022-01
Language resource management — Lexical markup framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
Reference number
ISO 24613-5:2022(F)
© ISO 2022
COPYRIGHT PROTECTED DOCUMENT
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from ISO at the address below or from the ISO member body in the country of the requester.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing the OrthographicRepresentation subclasses
6.2 Implementing the FormRepresentation class
6.3 Implementing the Form subclasses
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism for referencing external multimedia files
8 Implementing classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 General
8.2.2 Referencing forms in an etymon
8.2.3 Representing the meaning of an etymon
8.2.4 Representing the language of an etymon
8.2.5 Dating an etymon
8.2.6 Citing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 Implementing an XML feature structure
9.3 Representing various labels with
9.4 Conveying rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) Implementing a feature structure in LBX
Annex C (informative) Examples of applying the LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is
normally carried out through ISO technical committees. Each member body interested in a subject
has the right to be represented on the technical committee established for that purpose. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on
matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify
such rights or to give notice of their existence. Details of any intellectual property rights or similar
rights identified during the development of the document are given in the Introduction and/or in the
list of patent declarations received by ISO (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— complete revision of the content and its subdivision into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user's national standards body. A
complete listing of these bodies can be found at www.iso.org/fr/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2022(F)
Language resource management — Lexical markup framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the lexical base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2:
Machine-readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from: https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. However,
this document does not elaborate on the metadata aspects of LMF, since the LBX schema is inherently
much richer in representing all aspects related to the creation, content, versioning and
database implementation of lexical content at large. In some cases, approximately equivalent constructs
are mentioned in order to make the requirements explicit with respect to the LMF standard.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the following namespace:
http://www.w3.org/2001/XMLSchema
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one to many lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID attribute in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class LBX construct
/LexicalResource/ <LexicalResource>
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class LBX construct
/GlobalInformation/ <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the
@xml:lang attribute to indicate language information corresponding to the content of specific
elements. According to the W3C recommendation, @xml:lang content shall be compliant with
BCP 47. There is no need for a specific implementation of the /language coding/ data category or
the /script coding/ data category in order to ensure compliance of this document with ISO 24613-1.
LBX does allow the inclusion of these data categories in the <GlobalInformation> element in order
to support the validation of equivalent metadata found in the <LexiconInformation> elements
of one or more lexicons (see 5.4). When included, the /script coding/ data category shall use the codes
from ISO 15924. The /character encoding/ data category is implemented in the XML declaration of an
LBX conformant document using the @encoding attribute. For instance, an XML-LBX document
encoded as UTF-8 according to the Unicode standard shall begin with an XML declaration whose
@encoding attribute is set to UTF-8.
A non-exclusive list of <GlobalInformation> sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— GlobalNotationType, a simple type enumerating the set of notations used across all lexicons;
— GlobalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the <LexicalResource> element when <LexicalResource> is used. If the
<LexicalResource> element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID attribute in the <Lexicon> element
(see 5.1). In the case of a virtual lexical resource, where the <LexicalResource> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child elements and attributes as direct children of the <Lexicon>
element or, ideally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID as a unique identifier for the lexicon; as a best practice, the ID should
be a URI and be unique within a language resource; @xml:ID can be used in place of @lexiconID
when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, of datatype xs:ID as a unique identifier for the lexical resource; as a best
practice, the ID should be a URI for global scope;
de plus, @xml:ID peut être utilisé à la place de @lexicalResourceID lorsque la conception cherche à</br>
rendre l’entrée accessible par le web;</br>
— @lexiconType de datatype «xs: string»; le type de lexique, par exemple dictionnaire bilingue ou</br>
monolingue;</br>
— @sourceLanguage de datatype «xs: string»; la langue de l’élément <Lemma> ou de ses formes</br>
fléchies;</br>
— @targetLanguage de datatype «xs: string»; la langue dans laquelle le lemme est traduit,</br>
principalement représentée dans l’élément <Translation>.</br>
Tableau 3 — Classe Lexicon</br>
Classe LMF Construction LBX</br>
/Lexicon/ <Lexicon></br>
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation>
element (see Table 4), either by referencing a LexiconInformation.xsd schema using an
<xsd:include> element or as a direct child of the <Entry> element. <LexiconInformation> allows
the encoding of a variety of administrative, technical, documentary, and bibliographic
information attached to the corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class               LBX construct
/LexiconInformation/    <LexiconInformation>
When information qualifying the lexicon is not included in the <Lexicon> element, it shall be
included as elements and attributes in the <LexiconInformation> element. This information
comprises (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
The <LexiconInformation> element can also include elements and data categories that further
qualify information in the lexicon and can be used to support the validation of the XML document
(lexicon). These elements and data categories should also be included in the global set of
elements and data categories found in the <GlobalInformation> element (see 5.2), and a comparison
of the corresponding values in <GlobalInformation> and <LexiconInformation> should be part of
the validation process.
A non-exhaustive list of these sub-elements, simple types indexed by value, follows:
— NotationType, a simple type enumerating the set of notations used in a lexicon;
— PartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a
lexicon;
— SubjectFieldType, a simple type enumerating the set of <SubjectField> values used in a
lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of
information for a subset of lexicons grouped by language, by referencing a named language data
schema (e.g. ArabicLanguageData.xsd) (see B.1).
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5).
Lexical information inside the <Entry> element should be encoded through the following child
elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form
class is serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
Attributes used for the <Entry> element can include:
— @entryID, of datatype "xs:ID", as a unique identifier for an entry; as a best practice, the ID
should be a URI and be unique within a language resource; @xml:ID can be used in place of
@entryID when there is a design intent to make the entry accessible on the web;
— @lexiconID, of datatype "xs:ID", as a unique identifier for the parent lexicon; as a best
practice, the ID should be a URI and be unique within a language resource; @xml:ID can be used
in place of @lexiconID when there is a design intent to make the entry accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon
collection when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class         LBX construct
/LexicalEntry/    <Entry>
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7);
  anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>

  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes
      antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
      très appréciée.</DefRep>
    </Def>
  </Sense>

  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>

NOTE 1 The style of the above example can be used in a lexical resource that contains a
collection of bilingual lexicons in a variety of source languages (e.g. French, Spanish, Russian
and Chinese). A simpler style can be used for a collection of monolingual French lexicons. For
example, <Orth> and <Pron> can be used in place of the equivalent <FormRep> elements, and the
<Def> element can directly contain the text rather than using a <DefRep> child element to manage
<b>...</b>
SLOVENIAN STANDARD
oSIST ISO/DIS 24613-5:2021
1 March 2021
Language resource management — Lexical markup framework (LMF) — Part 5: Lexical
base exchange (LBX) serialization
This Slovenian standard is identical to: ISO/DIS 24613-5
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24613-5:2021 en,fr
Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
DRAFT INTERNATIONAL STANDARD
ISO/DIS 24613-5
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: 2020-12-21
Voting terminates on: 2021-03-15
Language resource management — Lexical markup framework (LMF) —
Part 5: Lexical base exchange (LBX) serialization
ICS: 01.020
This document is circulated as received from the committee secretariat.
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO
CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL
AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE
LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL
REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY
RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Reference number ISO/DIS 24613-5:2020(E)
© ISO 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation for MRD
6.2 Implementing Form representations for the Form subclasses
6.3 Classes derived from the Form class
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 Referencing forms in an etymon
8.2.2 Representing the meaning of an etymon
8.2.3 Representing the language of an etymon
8.2.4 Dating an etymon
8.2.5 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1 to -4, cancels and replaces ISO 24613:2008,
which has been technically revised.
The main changes compared to the previous edition are as follows:
— entire revision of the content and its subdivisions.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
DRAFT INTERNATIONAL STANDARD ISO/DIS 24613-5:2020(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the LMF model defined as an XML model derived from
the LBX schema and compliant with the W3C XML schema. This serialization covers the classes, data
categories, and mechanisms of ISO 24613-1 (Core model), ISO 24613-2 (Machine-readable dictionary
(MRD) model), and ISO 24613-3 (Etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
BCP 47, Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current
Practice. URL: https://tools.ietf.org/html/bcp47
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and in
ISO 24613-3 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (Core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (Etymological extension). It shall be compliant with
ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in the
respective parts. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses, and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF in order to meet specific design objectives. Still, this
document does not elaborate on the metadata aspects of LMF, since the LBX schema is inherently
much richer in representing all aspects of the creation, content, versioning and
database implementation of lexical content at large. Occasionally, constructs that are roughly
equivalent to explicit requirements of the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.lbx.org/2020/schema"
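As a hedged illustration of the namespace convention above, the following Python sketch parses a small document that declares the LBX namespace as its default namespace and looks up elements by qualified name. Only the namespace URI and the element and attribute names come from this document; the sample title and identifier values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in clause 4; the sample content below is illustrative only.
LBX_NS = "http://www.lbx.org/2020/schema"

SAMPLE = f"""<?xml version="1.0" encoding="UTF-8"?>
<Lexicon xmlns="{LBX_NS}" lexiconID="lex-fr-001">
  <Title>Sample French lexicon</Title>
  <Entry entryID="e1"/>
</Lexicon>"""


def qname(local_name: str) -> str:
    """Return the ElementTree-qualified name of an element in the LBX namespace."""
    return f"{{{LBX_NS}}}{local_name}"


# Parse from bytes so that the encoding declaration is honoured by the parser.
root = ET.fromstring(SAMPLE.encode("utf-8"))
title = root.find(qname("Title"))
entries = root.findall(qname("Entry"))
print(root.tag == qname("Lexicon"), title.text, len(entries))
```

Unqualified lookups such as `root.find("Title")` would return nothing here, which is why examples that omit the namespace declaration should be read as implicitly carrying it.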
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one or more lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon, so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class            LBX construct
/LexicalResource/    <LexicalResource>
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2), either by referencing a GlobalInformation.xsd schema using an <xsd:include> element or
as a direct child of a <LexicalResource> element. <GlobalInformation> allows the encoding of a variety
of administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical resource.
Table 2 — GlobalInformation class
LMF class              LBX construct
/GlobalInformation/    <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the @xml:lang
attribute to indicate language information corresponding to the content of specific elements.
According to the W3C recommendation, @xml:lang content shall be compliant with BCP 47. There is
no need for a specific implementation of the /language coding/ data category or the /script coding/
data category in order to ensure compliance of this document with ISO 24613-1. LBX does allow the
inclusion of these data categories in the <GlobalInformation> element in order to support the validation
of equivalent metadata found in the <LexiconInformation> elements of one or more lexicons (see 5.4).
When included, the /script coding/ data category shall use the codes from ISO 15924. The /character
encoding/ data category is implemented in the XML declaration of an LBX-conformant document using
the @encoding attribute. For instance, an XML-LBX document encoded as UTF-8 according to the Unicode
standard shall begin with the following declaration:
<?xml version="1.0" encoding="UTF-8"?>
A non-exhaustive list of sub-elements, simple types indexed by value, follows:
— “ISO639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO15924”, a simple type enumerating the set of scripts used across all lexicons;
— globalNotationType, a simple type enumerating the set of notations used across all lexicons;
— globalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used across all lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
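The following Python sketch illustrates, under stated assumptions, how the @xml:lang values used in an LBX document could be collected and compared with a globally declared set of language codes. The declared set is hard-coded here as a stand-in for the enumeration held in <GlobalInformation>; the actual layout of that enumeration is defined by the LBX reference schema and is not reproduced. Namespaces are omitted, as in the examples of this document, and no BCP 47 syntax check is attempted.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical stand-in for the language codes enumerated in <GlobalInformation>.
DECLARED_LANGUAGES = {"fr", "en"}

# Illustrative document; it begins with the UTF-8 declaration described above.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<Lexicon lexiconID="lex1">
  <Entry xml:lang="fr">
    <Lemma><FormRep xml:lang="fr" notation="French">langouste</FormRep></Lemma>
    <Sense><Def><DefRep xml:lang="de">Languste</DefRep></Def></Sense>
  </Entry>
</Lexicon>"""


def undeclared_languages(xml_bytes: bytes, declared: set) -> set:
    """Return the xml:lang values used in the document but missing from `declared`."""
    root = ET.fromstring(xml_bytes)
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG) is not None}
    return used - declared


print(undeclared_languages(DOCUMENT, DECLARED_LANGUAGES))  # prints {'de'}
```

A non-empty result would indicate that a lexicon uses a language tag that the global information does not declare.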
5.3 Implementing the Lexicon class
The Lexicon class is implemented in LBX by means of the <Lexicon> element (see Table 3), which is a
direct child of the <LexicalResource> element when <LexicalResource> is used. If the <LexicalResource>
element is not used, <Lexicon> becomes the root element. In cases where a lexical resource contains
a large number of lexicons or several very large lexicons, the lexicon (XML document) can reference
a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see 5.1). In the
case of a virtual lexical resource, where the <LexicalResource> element is not part of the same XML
document as the <Lexicon> element, the lexicon can use an include statement to reference a relevant
<GlobalInformation> element. Other information within the <Lexicon> element should be qualified
through the following child element(s) and attributes as direct children of the <Lexicon> element or,
optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID, as a unique identifier for the lexicon; as a best practice, the ID should
be a URI and be unique within a language resource; @xml:ID can be used in place of @lexiconID
when there is a design intent to make the entry accessible on the web;
— @lexicalResourceID, of datatype xs:ID, as a unique identifier for the lexical resource; as a best
practice, the ID should be a URI for global scope; in addition, @xml:ID can be used in place of
@lexicalResourceID when there is a design intent to make the entry accessible on the web;
— @lexiconType, of datatype xs:string; the type of lexicon, e.g. bilingual dictionary, monolingual
dictionary;
— @sourceLanguage, of datatype xs:string; the language of the <Lemma> element or its
inflected forms;
— @targetLanguage, of datatype xs:string; the language the lemma is translated to, principally
represented in the <Translation> element.
Table 3 — Lexicon class
LMF class    LBX construct
/Lexicon/    <Lexicon>
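The idea of a virtual lexical resource can be sketched as follows: several stand-alone lexicon documents carry the same @lexicalResourceID, and a tool groups them by that value. This Python example is a minimal sketch, not a normative procedure; the identifier values, titles and the helper name `group_by_resource` are invented for illustration, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two stand-alone lexicon documents; all identifier values are invented.
LEXICON_DOCS = [
    """<Lexicon lexiconID="lex-fr-en" lexicalResourceID="res-1"
                lexiconType="bilingual dictionary"
                sourceLanguage="fr" targetLanguage="en">
         <Title>French-English sample</Title>
       </Lexicon>""",
    """<Lexicon lexiconID="lex-fr" lexicalResourceID="res-1"
                lexiconType="monolingual dictionary" sourceLanguage="fr">
         <Title>French monolingual sample</Title>
       </Lexicon>""",
]


def group_by_resource(docs):
    """Map each @lexicalResourceID to the @lexiconID values that reference it."""
    groups = defaultdict(list)
    seen = set()
    for doc in docs:
        lexicon = ET.fromstring(doc)
        lexicon_id = lexicon.get("lexiconID")
        # @lexiconID is meant to be unique within a language resource.
        if lexicon_id in seen:
            raise ValueError(f"duplicate lexiconID: {lexicon_id}")
        seen.add(lexicon_id)
        groups[lexicon.get("lexicalResourceID")].append(lexicon_id)
    return dict(groups)


print(group_by_resource(LEXICON_DOCS))
```

Grouping by attribute value keeps each lexicon in its own XML document, which is the motivation given above for very large resources.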
5.4 Implementing the LexiconInformation class
The LexiconInformation class is implemented by means of the LBX <LexiconInformation> element
(see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include> element
or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a variety of
administrative, technical, documentary, and bibliographic information attached to the corresponding
lexical entry.
Table 4 — LexiconInformation class
LMF class               LBX construct
/LexiconInformation/    <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon should be included as
elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
The <LexiconInformation> element can also include elements and data categories that further qualify
information in the lexicon and can be used to support the validation of the XML document (lexicon).
These elements and data categories should also be included in the global set of elements and data
categories found in the <GlobalInformation> element (see 5.2), and a comparison of the corresponding
values in <GlobalInformation> and <LexiconInformation> should be part of the validation process.
A non-exhaustive list of these sub-elements, simple types indexed by value, follows:
— notationType, a simple type enumerating the set of notations used in a lexicon;
— partOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
Examples can be found in the LBX reference schema, LexiconInformation document (see B.1).
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon
information for a subset of lexicons grouped by language by referencing a named language data schema (e.g.
ArabicLanguageData.xsd) (see B.1).
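The comparison between global and lexicon-level value sets described in this subclause can be sketched in Python as below. This is an assumption-laden illustration: the two value sets are hard-coded stand-ins for the globalPartOfSpeechType and partOfSpeechType enumerations, the function name `validate_pos` is hypothetical, and the part-of-speech value is read from the <POS> element used in the example entry of 5.5.

```python
import xml.etree.ElementTree as ET

# Hypothetical value sets standing in for the simple types described above.
GLOBAL_POS = {"noun", "verb", "adjective"}   # stand-in for globalPartOfSpeechType
LEXICON_POS = {"noun", "verb", "adverb"}     # stand-in for one lexicon's partOfSpeechType

LEXICON = """<Lexicon lexiconID="lex1">
  <Entry entryID="e1"><Lemma><GramFeats><POS>noun</POS></GramFeats></Lemma></Entry>
  <Entry entryID="e2"><Lemma><GramFeats><POS>particle</POS></GramFeats></Lemma></Entry>
</Lexicon>"""


def validate_pos(lexicon_xml, lexicon_values, global_values):
    """Return (lexicon values missing from the global set,
    IDs of entries that use a value the lexicon does not declare)."""
    missing_globally = lexicon_values - global_values
    root = ET.fromstring(lexicon_xml)
    bad_entries = [
        entry.get("entryID")
        for entry in root.iter("Entry")
        if any(pos.text not in lexicon_values for pos in entry.iter("POS"))
    ]
    return missing_globally, bad_entries


print(validate_pos(LEXICON, LEXICON_POS, GLOBAL_POS))
```

Here the first component reports a lexicon-level value that the global set lacks, and the second reports entries whose values the lexicon itself does not declare; a schema-based validation would normally perform the second check automatically.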
5.5 Implementing the LexicalEntry class
The LexicalEntry class should be implemented by means of the <Entry> element in LBX (see Table 5).
Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is
serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
Attributes used for the <Entry> element can include:
— @entryID, of datatype xs:ID, as a unique identifier for an entry; as a best practice, the ID should be
a URI and be unique within a language resource; @xml:ID can be used in place of @entryID when
there is a design intent to make the entry accessible on the web;
— @lexiconID, of datatype xs:ID, as a unique identifier for the parent lexicon; as a best practice, the
ID should be a URI and be unique within a language resource; @xml:ID can be used in place of
@lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection
when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class         LBX construct
/LexicalEntry/    <Entry>
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7);
  anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lägust</FormRep>
  </Lemma>

  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes
      antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
      très appréciée.</DefRep>
    </Def>
  </Sense>

  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>
NOTE 1 The style in the above example would be appropriate for use in a lexical resource that contains a
collection of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler
style could be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> could be
used in place of the equivalent <FormRep> elements and the <Def> element could directly contain the text content
rather than employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of
simplification using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
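To show how an entry encoded in the style of the example can be consumed, the following Python sketch extracts the headword, the part of speech and the numbered senses from a shortened copy of that entry. It is an illustration only: the function name `summarize` is hypothetical, the definition of the first sense is abbreviated, and namespaces are omitted as elsewhere in this document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example entry (etymology and IPA form left out).
ENTRY = """<Entry xml:lang="fr">
  <Lemma>
    <GramFeats><POS>noun</POS><Gender>fem</Gender></GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
  </Lemma>
  <Sense senseNR="1">
    <Def><DefRep xml:lang="fr">Grand crustacé marin</DefRep></Def>
  </Sense>
  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def><DefRep xml:lang="fr">Femme, maîtresse</DefRep></Def>
  </Sense>
</Entry>"""


def summarize(entry_xml):
    """Return the headword, part of speech and (sense number, definition) pairs."""
    entry = ET.fromstring(entry_xml)
    # Select the canonical written form by its @notation value.
    headword = entry.findtext("Lemma/FormRep[@notation='French']")
    pos = entry.findtext("Lemma/GramFeats/POS")
    senses = [
        (sense.get("senseNR"), sense.findtext("Def/DefRep").strip())
        for sense in entry.findall("Sense")
    ]
    return headword, pos, senses


print(summarize(ENTRY))
```

With the simpler monolingual style mentioned in NOTE 1, the lookup paths would change (for instance to <Orth> and to the text of <Def> itself), so a consumer needs to know which style a lexicon follows.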
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition
classes. LBX typically implements orthographic representations by means of elements corresponding
to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (Machine-readable
dictionary (MRD) model). Some of those elements are introduced in 5.7.2 and 5.10 in association with
classes introduced in ISO 24613-1 (Core model). Those classes (and classes introduced in ISO 24613-2
(MRD)) are
<b>...</b>
INTERNATIONAL STANDARD
ISO 24613-5
First edition
PROOF/ÉPREUVE
Language resource management — Lexical markup framework (LMF) —
Part 5: Lexical base exchange (LBX) serialization
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) —
Partie 5: Sérialisation de l’échange de bases lexicales (LBX)
Reference number ISO 24613-5:2021(E)
© ISO 2021
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 General requirements
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
5.2 Implementing the GlobalInformation class
5.3 Implementing the Lexicon class
5.4 Implementing the LexiconInformation class
5.5 Implementing the LexicalEntry class
5.6 Implementing the OrthographicRepresentation class
5.7 Implementing the Form class
5.7.1 Form class
5.7.2 Lemma class
5.8 Implementing the GrammaticalInformation class
5.9 Implementing the Sense class
5.10 Implementing the Definition class
5.11 Implementing the CrossREF class
6 Serialization of the MRD extension (ISO 24613-2)
6.1 Implementing OrthographicRepresentation for MRD
6.2 Implementing Form representations for the Form subclasses
6.3 Classes derived from the Form class
6.3.1 General principles
6.3.2 Implementing the WordForm class
6.3.3 Implementing the Stem class
6.3.4 Implementing the WordPart class
6.3.5 Implementing the RelatedForm class
6.3.6 Implementing the TextRepresentation class
6.3.7 Implementing the Translation class
6.3.8 Implementing the Example class
6.4 Implementing the SubjectField class
6.5 Implementing the Bibliography class
7 Implementing the CrossREF mechanism to refer to external media files
8 Implementing the classes from the etymological extension (ISO 24613-3)
8.1 Implementing the Etymology class
8.2 Implementing the Etymon class
8.2.1 Referencing forms in an etymon
8.2.2 Representing the meaning of an etymon
8.2.3 Representing the language of an etymon
8.2.4 Dating an etymon
8.2.5 Providing sources associated with an etymon
8.3 Implementing the EtyLink class
8.4 Implementing the CognateSet class
8.5 Implementing the Cognate class
9 Additional mechanisms
9.1 Overview
9.2 XML feature structure implementation
9.3 Representing various labels with
9.4 Providing rendering information with the @rend attribute
Annex A (informative) LBX data category selection
Annex B (informative) LBX feature structure implementation
Annex C (informative) LBX examples for applying LBX serialization
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
This first edition of ISO 24613-5, together with ISO 24613-1:2019, ISO 24613-2:2020, ISO 24613-3:2021
and ISO 24613-4:2021, cancels and replaces ISO 24613:2008, which has been technically revised.
The main change compared to the previous edition is as follows:
— entire revision of the content and its subdivisions into several parts.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
INTERNATIONAL STANDARD ISO 24613-5:2021(E)
Language resource management — Lexical markup
framework (LMF) —
Part 5:
Lexical base exchange (LBX) serialization
1 Scope
This document describes the serialization of the lexical markup framework (LMF) model defined as
an extensible markup language (XML) model derived from the language base exchange (LBX) schema
and compliant with the W3C XML schema. This serialization covers the classes, data categories, and
mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model),
and ISO 24613-3 (etymological extension).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 24613-1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model
ISO 24613-2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-
readable dictionary (MRD) model
ISO 24613-3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological
extension
IETF BCP 47. Tags for Identifying Languages. Phillips, A., Davis, M. (eds.), September 2009. Best Current
Practice. Available from: https://tools.ietf.org/html/bcp47
W3C. Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation 16 August
2006, edited in place 29 September 2006. Available from:
https://www.w3.org/TR/2006/REC-xml11-20060816/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24613-1 and ISO 24613-3
apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
4 General requirements
This document aims at providing constructs for each LMF class from ISO 24613-1 (core model),
ISO 24613-2 (MRD extension), and ISO 24613-3 (etymological extension). It requires compliance
with ISO 24613-1, ISO 24613-2, and ISO 24613-3 when implementing data categories referred to in
the respective parts, and compliance with the W3C XML Schema 1.1 for representing structured
information in XML. LBX extends the original models by means of data category selections and precise
value lists, the creation of new subclasses and the definition of new constraints. In addition, this
document complies with the cardinalities expressed in ISO 24613-1, ISO 24613-2, and ISO 24613-3.
The LBX serialization is richer in detail than LMF, in order to meet specific design objectives. Still, this
document does not elaborate on the metadata aspects from LMF, since the LBX schema is by nature
much richer for the representation of all the aspects related to the creation, content, versioning and
database implementation of lexical content at large. Occasionally, constructs that are approximately
equivalent to explicit requirements of the LMF standard are mentioned.
The XML examples in this document are simplified by omitting namespaces. Except where otherwise
stated, it is assumed that XML elements belong to the LBX namespace and that the examples lie within
the scope of the following XML namespace declaration:
xmlns="http://www.LexicalBaseExchange.org/2021/schema"
In addition, datatypes in this document are defined in compliance with the XML Schema Part 2
recommendation. The “xs:” prefix corresponds to the namespace
http://www.w3.org/2001/XMLSchema.
5 Serialization of the LMF core model (ISO 24613-1)
5.1 Implementing the LexicalResource class
The LexicalResource class shall be implemented in LBX by means of the <LexicalResource> element
(see Table 1), which groups together one to many lexicons in a single collection. This level may be
omitted in cases where the lexical resource contains only one lexicon so that the resource starts
directly with the lexicon level. In cases where a lexical resource contains a large number of lexicons or
several very large lexicons, the lexicon (XML document) can reference a virtual lexical resource using a
@lexicalResourceID in the <Lexicon> element and optionally the <Entry> element (see 5.5).
Table 1 — LexicalResource class
LMF class LBX construct
/LexicalResource/
5.2 Implementing the GlobalInformation class
The GlobalInformation class shall be implemented in LBX by means of the <GlobalInformation> element
(see Table 2) either by referencing a GlobalInformation.xsd schema using an <xsd:include> element, or
as a direct child of the lexical resource element (see 5.1). <GlobalInformation> allows the encoding of
a variety of administrative, technical, documentary, and bibliographic information attached to the
corresponding lexical resource.
Table 2 — GlobalInformation class
LMF class LBX construct
/GlobalInformation/ <GlobalInformation>
Since the LBX serialization is based on the W3C recommendation for XML, it implements the @xml:lang
attribute to indicate language information corresponding to the content of specific elements.
According to the W3C recommendation, @xml:lang content shall be compliant with BCP 47. There is
no need for a specific implementation of the /language coding/ data category or the /script coding/
data category in order to ensure compliance of this document with ISO 24613-1. LBX does allow the
inclusion of these data categories in the <GlobalInformation> element in order to support the validation
of equivalent metadata found in the <LexiconInformation> elements of one or more lexicons (see 5.4).
When included, the /script coding/ shall use the codes from ISO 15924. The /character encoding/ data
category is implemented in the XML declaration of an LBX conformant document using the @encoding
attribute. For instance, an XML-LBX document encoded as UTF-8 according to the Unicode standard
shall begin with the following declaration:
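As a minimal sketch, a UTF-8 encoded LBX document could open with a declaration of the following form. Only the @encoding value is taken from the text above; the version number shown is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- version="1.0" is an assumed value; encoding="UTF-8" carries the /character encoding/ data category -->
```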
A non-exclusive list of sub-elements, simple types indexed by value, follows:
— “ISO 639-3”, a simple type enumerating the set of language codes used across all lexicons;
— “ISO 15924”, a simple type enumerating the set of scripts used across all lexicons;
— globalNotationType, a simple type enumerating the set of notations used across all lexicons;
— globalPartOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used across
all lexicons;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used across all
lexicons.
Examples can be found in the LBX reference schema, GlobalInformation document (see Annex B).
5.3 Implementing the Lexicon class
The Lexicon class shall be implemented in LBX by means of the <Lexicon> element (see Table 3),
which is a direct child of the lexical resource element (see 5.1) when that element is used. If the
lexical resource element is not used, <Lexicon> becomes the root element. In cases where a lexical
resource contains a large number of lexicons or several very large lexicons, the lexicon (XML document)
can reference a virtual lexical resource using a @lexicalResourceID in the <Lexicon> element (see
5.1). In the case of a virtual lexical resource, where the <GlobalInformation> element is not part of the
same XML document as the <Lexicon> element, the lexicon can use an include statement to reference
a relevant <GlobalInformation> element. Other information within the <Lexicon> element should be
qualified through the following child element(s) and attributes as direct children of the <Lexicon>
element or, optimally, as children of the <LexiconInformation> element (see 5.4):
— <Title>, the title of the lexicon;
— @lexiconID, of datatype xs:ID, as a unique identifier for the lexicon; as a best practice, the ID should
be a URI and be unique within a language resource; @xml:id can be used in place of @lexiconID
when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, of datatype xs:ID, as a unique identifier for the lexical resource; as a best
practice, the ID should be a URI for global scope; in addition, @xml:id can be used in place of
@lexicalResourceID when there is a design intent to make the lexical resource accessible on the web;
— @lexiconType, of datatype xs:string; the type of lexicon, e.g. bilingual dictionary, monolingual
dictionary;
— @sourceLanguage, of datatype xs:string; the language of the <Lemma> element or its inflected
forms;
— @targetLanguage, of datatype xs:string; the language the lemma is translated to, principally
represented in the <Translation> element.
Table 3 — Lexicon class
LMF class LBX construct
/Lexicon/ <Lexicon>
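As an illustration of the child element and attributes listed above, a <Lexicon> used as the root element might be qualified as follows. This is a sketch only: the identifier values, the title text, and the language tags are hypothetical, and the namespace is the one declared in Clause 4.

```xml
<!-- Hypothetical bilingual lexicon; every attribute value below is illustrative -->
<Lexicon xmlns="http://www.LexicalBaseExchange.org/2021/schema"
         lexiconID="lex-fr-en-001"
         lexicalResourceID="lr-001"
         lexiconType="bilingual dictionary"
         sourceLanguage="fr"
         targetLanguage="en">
  <Title>French-English sample lexicon</Title>
  <!-- Entry elements follow here (see 5.5) -->
</Lexicon>
```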
5.4 Implementing the LexiconInformation class
The LexiconInformation class shall be implemented in LBX by means of the <LexiconInformation>
element (see Table 4) either by referencing a LexiconInformation.xsd schema using an <xsd:include>
element or as a direct child of the <Entry> element. <LexiconInformation> allows the encoding of a
variety of administrative, technical, documentary, and bibliographic information attached to the
corresponding lexical entry.
Table 4 — LexiconInformation class
LMF class LBX construct
/LexiconInformation/ <LexiconInformation>
When not included in the <Lexicon> element, information qualifying the lexicon shall be included as
elements and attributes in the <LexiconInformation> element. These include (see 5.3):
— <Title>;
— @lexiconID;
— @lexicalResourceID;
— @lexiconType;
— @sourceLanguage;
— @targetLanguage.
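The items above could be carried by <LexiconInformation> along the following lines. This is a hedged sketch: all values are hypothetical, and the parent element is deliberately omitted because its placement depends on how the schema is referenced.

```xml
<!-- Hypothetical values; the enclosing element is not shown -->
<LexiconInformation lexiconID="lex-fr-en-001"
                    lexicalResourceID="lr-001"
                    lexiconType="bilingual dictionary"
                    sourceLanguage="fr"
                    targetLanguage="en">
  <Title>French-English sample lexicon</Title>
</LexiconInformation>
```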
The <LexiconInformation> can also include elements and data categories that further qualify
information in the lexicon and can be used to support the validation of the XML document (lexicon).
These elements and data categories should also be included in the global set of elements and data
categories found in the <GlobalInformation> element (see 5.2) and a comparison of the corresponding
values in <GlobalInformation> and <LexiconInformation> should be part of the validation process.
A non-exclusive list of these sub-elements, simple types indexed by value, follows:
— notationType, a simple type enumerating the set of notations used in a lexicon;
— partOfSpeechType, a simple type enumerating the set of <partOfSpeech> values used in a lexicon;
— subjectFieldType, a simple type enumerating the set of <SubjectField> values used in a lexicon.
NOTE In addition to the <LexiconInformation> construct, LBX allows the concatenation of lexicon
information for a subset of lexicons grouped by language by referencing a named language data schema (e.g.
ArabicLanguageData.xsd) (see Clause B.1).
5.5 Implementing the LexicalEntry class
The LexicalEntry class shall be implemented in LBX by means of the <Entry> element (see Table 5).
Lexical information inside <Entry> elements should be encoded through the following child elements:
— <GramFeats> for grammatical information related to the whole entry;
— <Form> for containing the text literal and attributes qualifying the text literal (the Form class is
serialized through subclasses in LBX);
— <Etymology> for etymological aspects;
— <Sense> for semantic information;
— <Xref> for referencing internal or external elements.
Attributes used for the <Entry> element can include:
— @entryID, of datatype xs:ID, as a unique identifier for an entry; as a best practice, the ID should be
a URI and be unique within a language resource; @xml:id can be used in place of @entryID when
there is a design intent to make the entry accessible on the web;
— @lexiconID, of datatype xs:ID, as a unique identifier for the parent lexicon; as a best practice, the
ID should be a URI and be unique within a language resource; @xml:id can be used in place of
@lexiconID when there is a design intent to make the lexicon accessible on the web;
— @lexicalResourceID, a reference to the @lexicalResourceID of the associated lexicon collection
when there is more than one lexicon.
Table 5 — LexicalEntry class
LMF class LBX construct
/LexicalEntry/ <Entry>
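A skeleton entry carrying the identifying attributes above could look like this. The identifier values are hypothetical, and the child elements are shown empty purely to indicate where each kind of information goes; neither their order nor their emptiness is prescribed by the text.

```xml
<!-- Hypothetical identifiers; child order is illustrative only -->
<Entry entryID="e-langouste" lexiconID="lex-fr-en-001" lexicalResourceID="lr-001" xml:lang="fr">
  <GramFeats/>  <!-- grammatical information for the whole entry -->
  <Lemma/>      <!-- a Form subclass (see 5.7.2) -->
  <Etymology/>  <!-- etymological aspects -->
  <Sense/>      <!-- semantic information -->
  <Xref/>       <!-- reference to internal or external elements -->
</Entry>
```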
The following example in French illustrates the encoding of a simple dictionary entry with two senses.
EXAMPLE
<Entry xml:lang="fr">
  <Etymology>XIIIe; languste, v. 1120, «sauterelle»; encore dans Corneille (Hymnes, 7);
  anc. provençal langosta, altér. du lat. class. locusta «sauterelle».</Etymology>
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <FormRep xml:lang="fr" notation="French">langouste</FormRep>
    <FormRep xml:lang="fr" notation="IPA">lɑ̃gust</FormRep>
  </Lemma>

  <Sense senseNR="1">
    <Def>
      <DefRep xml:lang="fr">Grand crustacé marin (Décapodes macroures) aux pattes
      antérieures dépourvues de pinces, aux antennes longues et fortes, et dont la chair est
      très appréciée.</DefRep>
    </Def>
  </Sense>

  <Sense senseNR="2">
    <Note type="socioCultural">Fig. et fam. (vulg.).</Note>
    <Def>
      <DefRep xml:lang="fr">Femme, maîtresse</DefRep>
    </Def>
  </Sense>
</Entry>

NOTE 1 The style in the above example is appropriate for use in a lexical resource that contains a collection
of bilingual lexicons in a variety of source languages, e.g. French, Spanish, Russian, Chinese. A simpler style can
be used for a collection of monolingual French lexicons. For example, <Orth> and <Pron> can be used in place
of the equivalent <FormRep> elements and the <Def> element can directly contain the text content rather than
employing a <DefRep> child element for managing text content (see 5.10). See 6.2 for an example of simplification
using the <Orth> and <Pron> elements.
NOTE 2 The @notation value “French” is short for “Canonical French”.
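The simpler monolingual style described in NOTE 1 could be sketched as follows, with <Orth> and <Pron> replacing the <FormRep> elements and <Def> holding its text directly. The exact content model of these elements is an assumption here, and the definition text is abridged; 5.10 and 6.2 give the actual definitions.

```xml
<Entry xml:lang="fr">
  <Lemma>
    <GramFeats>
      <POS>noun</POS>
      <Gender>fem</Gender>
    </GramFeats>
    <!-- Orth and Pron stand in for the two FormRep elements of the fuller style -->
    <Orth>langouste</Orth>
    <Pron>lɑ̃gust</Pron>
  </Lemma>
  <Sense senseNR="1">
    <!-- Text placed directly in Def, without a DefRep child -->
    <Def>Grand crustacé marin (Décapodes macroures) aux pattes antérieures dépourvues de pinces.</Def>
  </Sense>
</Entry>
```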
5.6 Implementing the OrthographicRepresentation class
Classes containing an OrthographicRepresentation class include the Form, Lemma, and Definition
classes. Orthographic representations shall be implemented in LBX by means of elements corresponding
to OrthographicRepresentation subclasses that are introduced in ISO 24613-2 (machine-readable
dictionary (MRD) model), or possible new OrthographicRepresentation subclasses derived through the
principles for LMF extensions described in ISO 24613-1 (core model). ISO 24613-1:2019, 5.6.1, describes
some of the representation types that can serve as a basis for extending the OrthographicRepresentation
class. ISO 24613-4 (TEI extension), 6.1, lists a number of representation elements that are valid for use
with the Form class. Elements implemented in this part are described in 5.7.2, 5.10, and successive
subclauses from 6.3.2 to 6.3.8.
5.7 Implementing the Form class
5.7.1 Form class
The Form class shall be implemented in LBX by means of Form subclasses (see Table 6, 6.2 and 6.3).
Table 6 — Form class
LMF class LBX construct
/Form/ <Form>
5.7.2 Lemma class
The Lemma class, a subclass of the Form class, shall be implemented in LBX by means of the <Lemma>
element (see Table 7).
Table 7 — Lemma class
LMF class LBX construct
/Lemma/ <Lemma>
Orthographic representations in the <Lemma> element shall be implemented in LBX by means of the
<FormRep> element, or by elements that instantiate <Form> subclasses, including <Orth> and <Pron>.
NOTE 1 The <FormRep>, <Orth>, and <Pron> elements are introduced in 6.2.
NOTE 2 <Orth> and <Pron> can be allowed when justified by design goals.
5.8 Implementing the GrammaticalInformation class
The GrammaticalInformation class groups grammatical features associated with the LexicalEntry
class, Form class, or other classes (e.g. Translation, Sense) in case of specific grammatical restrictions.
The GrammaticalInforma
...