SIST ISO 24617-11:2021
Language resource management -- Semantic annotation framework (SemAF) - Part 11: Measurable Quantitative information (MQI)
SLOVENIAN STANDARD
SIST ISO 24617-11:2021
01 October 2021
Language resource management -- Semantic annotation framework (SemAF) - Part 11: Measurable Quantitative information (MQI)
This Slovenian standard is identical to: ISO 24617-11:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24617-11:2021 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24617-11
First edition
2021-08
Language resource management — Semantic annotation framework (SemAF) — Part 11: Measurable quantitative information (MQI)
Reference number
ISO 24617-11:2021(E)
© ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abstract specification of QML
4.1 Overview
4.2 Characteristics of QML
4.3 Metamodel
4.4 Abstract syntax of QML (QML_as)
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
5 XML-based concrete syntax of QML (QML_csx)
5.1 General
5.2 Tag names with ID prefixes
5.3 Attribute specification of the root
5.4 Attribute specification of the basic element types
5.5 Attribute specification of the link types
5.6 Illustrations of QML_csx
5.6.1 General
5.6.2 Sample data
5.6.3 Procedure of annotation
6 TEI-based concrete syntax of QML (QML_cst)
6.1 Concrete syntaxes of QML (QML_cst)
6.1.1 Overall
6.1.2 Tag names with ID prefixes
6.1.3 Attribute specification of the basic element types
6.1.4 Attribute specification of the two link types
6.2 Illustrations of QML_cst
6.2.1 Overall
6.2.2 Sample data
6.2.3 Illustrations of TEI-based concrete syntax
Annex A (informative) Illustrations of QML_csx with more samples
Annex B (informative) Informal statements of MQI
Annex C (informative) The representation of units
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Measurable quantitative information (MQI), such as ‘165 cm’ or ‘60 kg’ applying to the height or weight of a person named ‘John’, is very common in ordinary language. MQI describes one of the basic properties associated with the magnitude aspect of quantity. The main characteristic of MQI is that quantitative information is presented as measures expressed in terms of a pair ⟨n, u⟩, consisting of a numerically expressed quantity n and a unit u, which is either base or derived, and either normalized or conventionally used. Such information is even more abundant in scientific publications and technical reports, to the extent that it constitutes an essential part of the communicative segments of language in general. The processing of such information is thus required for any successful language resource management.
In the current big-data era, demands from industry and academic communities for the precise acquisition of measurable quantitative information have increased. For example, investment companies frequently need to aggregate various sorts of information, covering net sales, gross profit, operating expenses, operating profit, interest expense, net profit before taxes, net income, etc., of target companies from their annual reports. The fast-growing field of medical informatics also needs to process large amounts of medical text to analyse medicine doses, the eligibility criteria of clinical trials, the phenotypic characteristics of patients, laboratory tests in clinical records, etc.[8]. All these demands, whether in industry or in medical research, require an accurate and consistent representation of measurable quantitative information for automated processing, computation and exchange.
However, no standardized way of representing measurable quantitative information is currently available in the IR and NLP areas. Each application system developed in industrial sectors has hitherto used its own format to annotate such information. A flexible, interoperable and standardized representation format for measurable quantitative information is therefore called for, so that IR and NLP tasks can work with many different application systems.
This document aims to formulate a general annotation scheme that follows the principles of semantic annotation laid down in ISO 24617-6 and the basic requirements of ISO 24611, that facilitates the processing of MQI in scientific and technical language, and that is interoperable with the other parts of ISO 24617. It also utilizes various ISO standards on lexical resources and morpho-syntactic annotation frameworks, and aims to be compatible with other existing relevant standards.
NOTE ISO 24617-1 and ISO 24617-7, for instance, propose ways of annotating measures of time (durations or time amounts) and of space (distances), respectively. ISO 24612 provides a pivot format (the graph annotation format, GrAF) that can make the annotations of temporal or spatial measures in these two annotation schemes interoperable.
QML is normalized at the abstract level, which allows various serialization formats, such as an XML-based representation, to represent annotated measurable quantitative information. In other words, the annotation of QI (quantitative information) is normalized at the abstract level, and the standoff annotation format is adopted at the concrete level of serialization.
Focusing on measurements in scientific and technical language, this document is expected to contribute to information retrieval (IR)[9], question answering (QA), text summarization (TS) and other natural language processing (NLP) applications[10].
INTERNATIONAL STANDARD ISO 24617-11:2021(E)
Language resource management — Semantic annotation framework (SemAF) — Part 11: Measurable quantitative information (MQI)
1 Scope
This document covers the measurable, or magnitudinal, aspect of quantity in order to focus on the technical and practical use of measurements in IR (information retrieval), QA (question answering), TS (text summarization) and other NLP (natural language processing) applications. It is applicable to technological domains in which practical applications are more relevant than some of the theoretical issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative
information.
This document also treats temporal durations, which are discussed in ISO 24617-1, and spatial measures such as distances, which are treated in ISO 24617-7, while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts introduced in ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
quantity
property of a measurable object referring to its magnitude or multitude
[SOURCE: ISO/IEC Guide 99:2007, 1.1, modified — Definition substantially redrafted, and Notes
removed.]
3.2
base quantity
quantity (3.1) in a conventionally chosen subset of a given system of quantities, where no quantity in
the subset can be expressed in terms of the other quantities within that subset
Note 1 to entry: Kinds of quantities include seven base quantities defined by the International System of
Quantities (ISQ).
[SOURCE: ISO/IEC Guide 99:2007, 1.4, modified — "no subset quantity" replaced with "no quantity in
the subset", "the others" replaced with "the other quantities within that subset", and Notes and Example
removed.]
3.3
derived quantity
quantity (3.1), in a system of quantities, defined in terms of the base quantities (3.2) of that system
EXAMPLE Speed is a derived quantity defined by length (distance) over time (LT⁻¹), where length (L) and time (T) are base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.5, modified — Example replaced.]
3.4
quantitative information
QI
measurement associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numeric terms
3.6
measurable quantitative information markup language
markup language of measurable quantitative information
quantitative markup language
QML
specification language for the annotation of measurable quantitative information (3.5) extractable from
text or other medium types of language
3.7
measurement unit
unit of measurement
unit
scalar basis, defined and adopted by convention, of measuring objects by multiplying their quantitative
values expressed in real numbers
Note 1 to entry: The expressions that are used in measurement, such as “metre”, “litre” and “µmol/kg”, are units by the definition given above. Multitude expressions such as “bottles”, “boxes” or “two”, as in “two bottles of milk”, “a box of apples” and “two coffees”, are sometimes not regarded as units, but they can be if they are accepted as units by convention or agreement in some communities. ISO 24617-12 (SemAF Part 12: Quantification) treats such multitude expressions as genuine units.
[SOURCE: ISO/IEC Guide 99:2007, 1.9, modified — Definition substantially redrafted, original Notes removed, new Note 1 to entry added.]
3.8
base unit
measurement unit (3.7) that is adopted by convention for a base quantity (3.2)
Note 1 to entry: There are seven base units chosen by the International System of Units (SI) associated with
seven ISQ base quantities to measure quantities, as shown in Table 1.
Table 1 — Base units

| SI base unit (unit symbol) | Associated ISQ base quantity (base quantity symbol) |
|---|---|
| metre (m) | length (L) |
| kilogram (kg) | mass (M) |
| second (s) | time (T) |
| ampere (A) | electric current (I) |
| kelvin (K) | thermodynamic temperature (Θ) |
| mole (mol) | amount of substance (N) |
| candela (cd) | luminous intensity (J) |
[SOURCE: ISO/IEC Guide 99:2007, 1.10, modified — Notes and Examples removed, new Note 1 to entry and Table 1 added.]
3.9
derived unit
measurement unit (3.7) for a derived quantity (3.3)
EXAMPLE The unit “newton” (N) is a derived unit for a derived quantity “force” (F), which is defined to be “mass times acceleration” (MLT⁻²), where the quantity “acceleration” is a derived quantity defined by “velocity divided by time” (VT⁻¹) and “velocity” defined by “length (distance) divided by time” (LT⁻¹).
Note 1 to entry: Table 2 illustrates some of the derived units.
[SOURCE: ISO/IEC Guide 99:2007, 1.11, modified — Examples removed, new Example and Note 1 to
entry added.]
Table 2 — Derived units

| Derived unit (unit symbol) | Associated derived quantity |
|---|---|
| kilometre per minute (km/min) | speed = length (L)/time (T) |
| gram per cubic metre (g/m³) | density = mass (M)/volume (L³) |
| kilogram metre per square second (kg·m/s²) | force = mass (M) × length (L)/time (T²) |
| lumen per square metre (lm/m²) | illuminance = luminous intensity (J)/area (L²) |
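The dimensional relations stated in 3.3, 3.9 and Table 2 can be checked mechanically. The Python sketch below is an illustration only and not part of QML: it assumes that a dimension is represented as a mapping from the base-quantity symbols of Table 1 to integer exponents, and the helper names mul, div and show are hypothetical. It derives force from velocity and acceleration, as in the newton example, and shows that area has dimension L², so illuminance comes out as J L⁻².

```python
# Sketch: dimensions as {base-quantity symbol: exponent} maps. This representation
# is an assumption of the example, not a QML data structure. Symbols follow Table 1.
from typing import Dict

Dimension = Dict[str, int]

ORDER = ("L", "M", "T", "I", "Θ", "N", "J")  # Table 1 order of base quantities


def mul(a: Dimension, b: Dimension) -> Dimension:
    """Dimension of a product: exponents add; zero exponents are dropped."""
    out = dict(a)
    for sym, exp in b.items():
        out[sym] = out.get(sym, 0) + exp
    return {sym: exp for sym, exp in out.items() if exp != 0}


def div(a: Dimension, b: Dimension) -> Dimension:
    """Dimension of a quotient: the divisor's exponents are subtracted."""
    return mul(a, {sym: -exp for sym, exp in b.items()})


def show(d: Dimension) -> str:
    """Render a dimension in Table 1 order, e.g. 'L M T^-2'; '1' if dimensionless."""
    parts = []
    for sym in ORDER:
        exp = d.get(sym, 0)
        if exp == 1:
            parts.append(sym)
        elif exp != 0:
            parts.append(f"{sym}^{exp}")
    return " ".join(parts) or "1"


# Base quantities (Table 1).
LENGTH, MASS, TIME, LUMINOUS_INTENSITY = {"L": 1}, {"M": 1}, {"T": 1}, {"J": 1}

# Derived quantities (3.3, 3.9, Table 2).
velocity = div(LENGTH, TIME)                 # L T^-1
acceleration = div(velocity, TIME)           # L T^-2
force = mul(MASS, acceleration)              # L M T^-2, unit newton
area = mul(LENGTH, LENGTH)                   # L^2
density = div(MASS, mul(area, LENGTH))       # L^-3 M
illuminance = div(LUMINOUS_INTENSITY, area)  # L^-2 J

for name, d in [("velocity", velocity), ("force", force),
                ("density", density), ("illuminance", illuminance)]:
    print(f"{name}: {show(d)}")
```

The sketch prints dimensions in Table 1 order (L before M), so force appears as L M T^-2 rather than MLT⁻²; both denote the same dimension.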
4 Abstract specification of QML
4.1 Overview
The quantitative markup language (QML) (3.6) is specified at two levels, abstract and concrete. Some
characteristics of QML are listed in 4.2. The overall structure of QML is represented by a metamodel, as
introduced in 4.3. The abstract syntax of QML, named QML_as, shall be a set-theoretic specification of QML in conceptual terms that are independent of the ways of representing the annotation (content) of measurable quantitative information. The concrete syntax of QML, named QML_cs, shall be a specification of a set of representation formats, based on QML_as, for the annotation of measurable quantitative information in a computationally tractable way. QML_as is introduced in 4.4, while QML_cs is presented in 4.5. Equivalent concrete syntaxes, including an XML-based concrete syntax QML_csx and a TEI-based concrete syntax QML_cst, are described in Clause 5 and Clause 6, respectively.
NOTE There can be many equivalent concrete syntaxes defined on a single abstract syntax.
4.2 Characteristics of QML
QML shall have the following characteristics.
a) QML shall focus on the annotation of the measurable attributes of entities. For example, “BMI between 10-20 kg/m²”.
b) QML shall provide a way to annotate the relations of measures. For example, “age 40 or older” and
“fpg>=100 mg/dl or a1c not less than 5,8 %”.
c) QML shall cover complex uses of unitized numeric quantities. For example, “14,0 × 10⁹” and “glycosylated haemoglobin (hba1c) <1,15 times the upper limit of normal”.
d) QML shall facilitate the identification of normalized numerics and units as the measurable attributes of an associated entity.
NOTE QML does not specify ways of annotating the normalization (e.g. “millimoles per litre” is normalized to “mmol/l”) or complete specification (e.g. “kg/m” is “kg/m²” for BMI) of units, which will be dealt with in another part of ISO 24617 addressing automated implementation of MQI.
4.3 Metamodel
The overall structure of measurable quantitative information is represented by the metamodel in
Figure 1.
Figure 1 — Metamodel of measurable quantitative information
This metamodel shall consist of seven class components, represented as square boxes in Figure 1:
a) source data as input to the annotation of MQI,
b) markables extracted from data sources,
c) three types of basic elements: entity, measure, and relator,
d) two types of links: measure link and comparison link.
The element “entity” shall be any object that has a measurable quantity, represented by “@quantity”, as one of its properties. As used in this document, “entity” is a very general term that refers to any object: not just to individual entities, but also to their properties, such as
“height” of a building or “speed” of a car, and also to any kind of eventuality, such as states, processes or transitions.
EXAMPLE 1 We drove at more than 200 kilometres per hour on a German autobahn.
The speed expressed by “more than 200 kilometres per hour” is a quantitative property of a motion, namely the driving mentioned in the example.
The element “measure” represents a measurable quantity of an entity in terms of three attributes:
quantity, unit, and type.
EXAMPLE 2 The height of Mt. Hall is 1 950 metres.
The measure consists of a quantity referred to by the numeric expression “1 950” and the unit “metre”.
It applies to the “height” quantity of the geographical object, named “Mt. Hall”.
The element “relator”, which is associated with markables such as “equal to”, “greater than”, “<=”, “between” or “at least”, serves only to relate two or more measures.
EXAMPLE 3 One pound equals 16 ounces.
Here “equals” is a relator of identity between two measures, “one pound” and “16 ounces”.
EXAMPLE 4 1 foot is less than 1 metre, for it is exactly equal to 30,48 cm.
This example illustrates two types of relations between measures: being “less than” and being identical.
A link of the type “measure” shall relate a measure to the quantitative property of an entity. Such a link is triggered by a measure element.
A link of the type “comparison” shall relate a measure to one or more other measures. Such a link is often triggered by a relator element, such as “less than”.
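The metamodel can be mirrored by simple record types: three basic element types, each anchored to a markable in the source data, and two link types that refer to elements by ID. The Python sketch below annotates EXAMPLE 2 (a measure link) and EXAMPLE 4 (a comparison link). It is illustrative only: anchoring by character offsets, the link attribute names measure, entity, relator and measures, the relator type value lessThan, and all ID prefixes other than x are assumptions of this example, not the normative QML_csx names.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a markable


def span_of(text: str, markable: str) -> Span:
    """Locate the first occurrence of a markable in the source text."""
    start = text.index(markable)
    return (start, start + len(markable))


@dataclass(frozen=True)
class Entity:
    id: str
    span: Span
    type: str        # e.g. "geographical", "person"
    quantity: str    # measurable property, e.g. "height"


@dataclass(frozen=True)
class Measure:
    id: str
    span: Span
    numeric: float
    unit: str
    type: str        # ISQ quantity, e.g. "length"


@dataclass(frozen=True)
class Relator:
    id: str
    span: Span
    type: str        # hypothetical value such as "lessThan"


@dataclass(frozen=True)
class MeasureLink:
    id: str
    measure: str     # ID of a Measure
    entity: str      # ID of the Entity whose quantity it measures


@dataclass(frozen=True)
class ComparisonLink:
    id: str
    relator: str                 # ID of the triggering Relator
    measures: Tuple[str, ...]    # IDs of the related measures, in text order


# EXAMPLE 2: a measure link.
text2 = "The height of Mt. Hall is 1 950 metres."
x1 = Entity("x1", span_of(text2, "Mt. Hall"), "geographical", "height")
me1 = Measure("me1", span_of(text2, "1 950 metres"), 1950.0, "m", "length")
mL1 = MeasureLink("mL1", measure=me1.id, entity=x1.id)

# EXAMPLE 4: a comparison link between "1 foot" and "1 metre".
text4 = "1 foot is less than 1 metre, for it is exactly equal to 30,48 cm."
me2 = Measure("me2", span_of(text4, "1 foot"), 1.0, "ft", "length")
me3 = Measure("me3", span_of(text4, "1 metre"), 1.0, "m", "length")
r1 = Relator("r1", span_of(text4, "less than"), "lessThan")
cL1 = ComparisonLink("cL1", relator=r1.id, measures=(me2.id, me3.id))

print(x1, me1, mL1, sep="\n")
print(r1, cL1, sep="\n")
```

Keeping links as pure ID references, rather than nesting elements inside them, mirrors the separation between basic elements and links in the metamodel. The identity relation in EXAMPLE 4 (“exactly equal to 30,48 cm”) would be a second comparison link of the same shape.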
4.4 Abstract syntax of QML (QML_as)
A markup language QML shall be a specification language for the annotation of MQI. The abstract syntax
of QML shall specify an annotation scheme in set-theoretic terms based on a conceptual understanding of MQI. The abstract syntax QML_as is understood to be structured as a triple ⟨B, R, @⟩ such that
a) B is a set of three basic element types: entity, measure, and relator;
b) R is a set of two link types: measure and comparison types;
c) @ is a set of assignments that specify the list of attributes and their value types associated with
each of the basic element types in B and each of the link types in R.
Every element in B shall have at least one attribute, @type, and so shall every link. The values of @type are CDATA associated with each of the elements. For instance, the entity “mountain” is of the “geographical” type, and the entity named “John” is of the “person” type.
The values of @quantity for an entity are CDATA, such as height, width or weight.
The assignment of measure shall have three attributes: @numeric, @unit, and @type. A possible value
of the attribute @numeric is a real number. A possible value of @unit is one of the units in a system
conventionally accepted such as one of the SI base units or derived units. A possible value of @type is
one of the quantities listed as ISQ base quantities or derived quantities, such as length, mass, voltage,
and so on.
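The assignments in @ can be read as a checklist for any concrete annotation. The Python sketch below encodes B, R and the attributes named in this subclause, and reports missing or mistyped attributes. It is a hypothetical helper, not part of QML: representing the real-number value of @numeric as a Python float and treating @quantity as optional on entities are assumptions, and attributes defined in other clauses (for example the arguments of links) are deliberately not checked.

```python
from typing import Any, Dict, List, Tuple

# B and R of the abstract syntax (4.4 a) and b)).
B = {"entity", "measure", "relator"}
R = {"measure", "comparison"}

# @: attribute -> (Python value type, required?) for each element or link type.
# Only attributes named in 4.4 are listed; value types are simplified.
ASSIGNMENTS: Dict[Tuple[str, str], Dict[str, Tuple[type, bool]]] = {
    ("element", "entity"): {"type": (str, True), "quantity": (str, False)},
    ("element", "measure"): {"type": (str, True), "numeric": (float, True),
                             "unit": (str, True)},
    ("element", "relator"): {"type": (str, True)},
    ("link", "measure"): {"type": (str, True)},
    ("link", "comparison"): {"type": (str, True)},
}


def validate(kind: str, name: str, attrs: Dict[str, Any]) -> List[str]:
    """Return the problems found; an empty list means the attributes conform."""
    allowed = {"element": B, "link": R}.get(kind, set())
    if name not in allowed:
        return [f"unknown {kind} type {name!r}"]
    problems = []
    for attr, (value_type, required) in ASSIGNMENTS[(kind, name)].items():
        if attr not in attrs:
            if required:
                problems.append(f"missing @{attr}")
        elif not isinstance(attrs[attr], value_type):
            problems.append(f"@{attr} should be {value_type.__name__}")
    return problems


print(validate("element", "measure", {"type": "length", "numeric": 1950.0, "unit": "m"}))
print(validate("element", "measure", {"type": "length", "numeric": "1 950"}))
```

The second call flags the unparsed string “1 950” and the missing unit, the two problems that separate a raw markable from a conforming measure.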
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
An abstract syntax shall allow several semantically equivalent concrete syntaxes. QML_as likewise allows a set of equivalent concrete syntaxes of QML (QML_cs). This document introduces two kinds of concrete syntaxes, QML_csx and QML_cst, in Clause 5 and Clause 6, respectively.
The two concrete syntaxes, QML_csx and QML_cst, are both based on the abstract syntax QML_as, while adopting XML as their representation language. They shall comply with the requirement of standoff annotation in ISO 24612.
These two concrete syntaxes do, however, differ from each other in at least two aspects. First, like the other parts of ISO 24617 on semantic annotation, such as ISO 24617-1, ISO 24617-7 and ISO 24617-6, QML_csx does not separate annotation content structures from their anchoring (referencing) structures, although LAF requires this separation for linguistic annotation.
In contrast, QML_cst is feature-structure-based. It shall follow LAF in separating the two structures, anchoring and content, and shall represent measurement information in feature structures. Second, QML_cst, as specified in this document, shall adopt the names of XML elements and attributes, with their value type specifications, from the TEI P5 Guidelines of the Text Encoding Initiative Consortium for the representation of MQI.
5 XML-based concrete syntax of QML (QML_csx)
5.1 General
The XML-based concrete syntax QML_csx is introduced in two steps. The first step is to list the tag
names and ID prefixes of QML_csx in 5.2. The second step is to specify the attribute assignments for the
XML root in 5.3, for each of the basic element types listed in 5.4, and for each of the link types listed in
5.5.
NOTE The root tag is introduced in XML to embed a list of XML elements into a single structure.
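To illustrate the role of the root and the standoff requirement of 4.5, the Python sketch below uses the standard xml.etree.ElementTree module to collect an annotation of EXAMPLE 1 in 4.3 under a single root element; the relator “more than” is omitted for brevity. The anchors are character offsets into the source text, so the text itself is not copied into the annotation. All tag and attribute names here (MQI, entity, measure, measureLink, start, end) and the values motion and km/h are placeholders chosen for the example; the normative names are those given in Table 3 and in 5.3 to 5.5.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def anchor(text: str, markable: str) -> Dict[str, str]:
    """Standoff anchor: [start, end) offsets of a markable instead of its text."""
    start = text.index(markable)
    return {"start": str(start), "end": str(start + len(markable))}


def annotate_example1(text: str) -> ET.Element:
    """Embed all elements and links of one annotation under a single root."""
    root = ET.Element("MQI")  # placeholder root tag
    ET.SubElement(root, "entity", {"id": "x1", **anchor(text, "drove"),
                                   "type": "motion", "quantity": "speed"})
    ET.SubElement(root, "measure", {"id": "me1",
                                    **anchor(text, "200 kilometres per hour"),
                                    "numeric": "200", "unit": "km/h",
                                    "type": "speed"})
    ET.SubElement(root, "measureLink", {"id": "mL1", "measure": "me1",
                                        "entity": "x1"})
    return root


def resolve(text: str, element: ET.Element) -> str:
    """Recover the markable that a standoff element points to."""
    return text[int(element.get("start")):int(element.get("end"))]


source = "We drove at more than 200 kilometres per hour on a German autobahn."
root = annotate_example1(source)
print(ET.tostring(root, encoding="unicode"))
print(resolve(source, root.find("measure")))
```

Because the annotation holds only offsets, the same source text can carry several independent annotation layers, which is the point of the standoff approach.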
5.2 Tag names with ID prefixes
Corresponding to each of the basic element types and the link types for QML_csx, there is a unique tag
and a unique ID prefix, as shown in Table 3.
Table 3 — List of tags and ID prefixes of QML_csx
Tags ID prefixes Comment
Root mqi XML root tag
Basic element types
Entity x object to which
...
INTERNATIONAL ISO
STANDARD 24617-11
First edition
2021-08
Language resource management —
Semantic annotation framework
(SemAF) —
Part 11:
Measurable quantitative information
(MQI)
Gestion des ressources linguistiques — Cadre d'annotation
sémantique (SemAF) —
Partie 11: Informations quantitatives mesurables (MQI)
Reference number
ISO 24617-11:2021(E)
©
ISO 2021
---------------------- Page: 1 ----------------------
ISO 24617-11:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved
---------------------- Page: 2 ----------------------
ISO 24617-11:2021(E)
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abstract specification of QML . 3
4.1 Overview . 3
4.2 Characteristics of QML . 4
4.3 Metamodel . 4
4.4 Abstract syntax of QML (QML_as) . 5
4.5 Concrete syntaxes of QML (QML_cs) and its subsets . 6
5 XML-based concrete syntax of QML (QML_csx) . 6
5.1 General . 6
5.2 Tag names with ID prefixes . 6
5.3 Attribute specification of the root . 7
5.4 Attribute specification of the basic element types . 7
5.5 Attribute specification of the link types . 8
5.6 Illustrations of QML_csx . 8
5.6.1 General. 8
5.6.2 Sample data . 8
5.6.3 Procedure of annotation . 9
6 TEI-based concrete syntax of QML (QML_cst) .11
6.1 Concrete syntaxes of QML (QML_cst) .11
6.1.1 Overall .11
6.1.2 Tag names with ID prefixes .11
6.1.3 Attribute specification of the basic element types .11
6.1.4 Attribute specification of the two link types .12
6.2 Illustrations of QML_cst .12
6.2.1 Overall .12
6.2.2 Sample data .12
6.2.3 Illustrations of TEI-based Concrete Syntax.13
Annex A (informative) Illustrations of QML_csx with more samples .16
Annex B (informative) Informal statements of MQI .19
Annex C (informative) The representation of units .20
Bibliography .21
© ISO 2021 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO 24617-11:2021(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/ patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso .org/
iso/ foreword .html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/ members .html.
iv © ISO 2021 – All rights reserved
---------------------- Page: 4 ----------------------
ISO 24617-11:2021(E)
Introduction
Measurable quantitative information (MQI), such as ‘165 cm’ or ‘60 kg’ applying to the height or
weight of a person named ‘John’, is very common in ordinary language. MQI describes one of the basic
properties associated with the magnitude aspect of quantity. The main characteristic of MQI is that
quantitative information is presented as measures expressed in terms of a pair <n, u>, consisting of
a numerically expressed quantity n and a unit u, which is either basic or derived, and either normalized
or conventionally used. Such information is even more abundant in scientific publications and technical
reports, to the extent that it constitutes an essential part of the communicative segments of language in
general. The processing of such information is thus required for any successful language resource
management.
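As an illustration only, the pair <n, u> can be modelled as a small immutable record in Python. The class `Measure`, the helper `parse_measure` and its regular expression are hypothetical and are not part of QML. The pattern assumes the conventions used in this document: a decimal comma, as in “30,48 cm”, and spaces between groups of three digits, as in “1 950 metres”. Anything more complex, such as ranges or powers of ten, is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """A measure as the pair <n, u>: a numeric quantity n and a unit u."""
    n: float
    u: str


# Hypothetical pattern: an integer part (optionally with space-separated
# groups of three digits), an optional decimal comma or point, then a unit.
_MEASURE_RE = re.compile(r"^\s*(\d+(?: \d{3})*)(?:[,.](\d+))?\s*(\S.*?)\s*$")


def parse_measure(text: str) -> Measure:
    """Split a simple surface string such as '165 cm' into <n, u>."""
    match = _MEASURE_RE.match(text)
    if match is None:
        raise ValueError(f"not a simple measure expression: {text!r}")
    integer, fraction, unit = match.groups()
    digits = integer.replace(" ", "")  # drop digit-group spaces
    if fraction:
        digits += "." + fraction       # decimal comma -> decimal point
    return Measure(float(digits), unit)


print(parse_measure("165 cm"))        # Measure(n=165.0, u='cm')
print(parse_measure("1 950 metres"))  # Measure(n=1950.0, u='metres')
```

Keeping n as a number and u as an uninterpreted string mirrors the separation of quantity and unit. Unit normalization is left to a separate step.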
In the era of big data, demands from industry and academic communities for a precise acquisition of
measurable quantitative information have increased. For example, business investment companies
frequently need to aggregate various sorts of information covering net sales, gross profit, operating
expenses, operating profit, interest expense, net profit before taxes, net income, etc., of the target
companies from their annual reports. The fast-growing field of medical informatics research also needs
to process large amounts of medical text to analyze the dose of medicine, the eligibility criteria of
clinical trials, the phenotype characters of patients, the lab tests in clinical records, etc.[8]. All these
demands, whether in industry or in medical research, require the accurate and consistent representation
of measurable quantitative information for automated processing, computation, and exchange.
However, no standardized way of representing measurable quantitative information is currently
available in the IR and NLP areas. Each application system developed in industrial sectors has hitherto
used its own format to annotate measurable quantitative information. What is called for is a flexible,
interoperable and standardized representation format for measurable quantitative information that
allows IR and NLP tasks to work with many different application systems.
This document aims to formulate a general annotation scheme that follows the principles of semantic
annotation laid down in ISO 24617-6 and the basic requirements of ISO 24611, that facilitates the
processing of MQI in scientific and technical language, and that is interoperable with other semantic
annotation schemes, in particular the other parts of ISO 24617. It also utilizes various ISO standards
on lexical resources and morpho-syntactic annotation frameworks, and aims to be compatible with other
existing relevant standards.
NOTE ISO 24617-1 and ISO 24617-7, for instance, have proposed a way of annotating measures on time
(durations or time amounts) and space (distances), respectively. ISO 24612 provides a pivot format (graph
annotation framework) that makes all the annotations of temporal or spatial measures in these two
annotation schemes interoperable.
QML is standardized at the abstract level, which allows various serialization formats for representing
annotated measurable quantitative information, such as an XML-based representation. The standardization
of QI (quantitative information) annotation is stated at the abstract level of annotation, while the standoff
annotation format is adopted at the concrete level of serialization.
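To illustrate what standoff means at the concrete level, the following Python sketch leaves the primary text untouched. Each annotation points into the text by character offsets. The helper `span_of` and the dictionary field names are hypothetical and purely illustrative. Only the ID prefixes `x` (entity) and `me` (measure) are taken from Table 3 of this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A markable identified by character offsets [start, end) in the text."""
    start: int
    end: int

    def text(self, source: str) -> str:
        return source[self.start:self.end]


def span_of(source: str, markable: str) -> Span:
    """Locate the first occurrence of a markable (hypothetical helper)."""
    start = source.index(markable)  # raises ValueError if absent
    return Span(start, start + len(markable))


source = "The height of Mt. Hall is 1 950 metres."

# Standoff: the annotation is stored apart from the text and only refers
# to it by offsets; no markup is inserted into `source`.
annotation = [
    {"id": "x1", "span": span_of(source, "Mt. Hall"),
     "type": "geographical", "quantity": "height"},
    {"id": "me1", "span": span_of(source, "1 950 metres"),
     "numeric": 1950.0, "unit": "m", "type": "length"},
]

for item in annotation:
    print(item["id"], item["span"], repr(item["span"].text(source)))
```

Because the source string is never modified, several independent annotation layers could refer to the same text. This is the usual motivation for standoff annotation.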
Focusing on measurements in scientific and technological language, this document is expected to
contribute to information extraction (IR)[9], question answering (QA), text summarization (TS), and
other natural language processing (NLP) applications[10].
INTERNATIONAL STANDARD ISO 24617-11:2021(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 11:
Measurable quantitative information (MQI)
1 Scope
This document covers the measurable or magnitudinal aspect of quantity so that it can focus on the
technical or practical use of measurements in IR (information retrieval), QA (question answering), TS
(text summarization), and other NLP (natural language processing) applications. It is applicable to
technological domains in which practical applications are of greater relevance than some of the
theoretical issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative
information.
This document also treats temporal durations, which are discussed in ISO 24617-1, and spatial
measures such as distances, which are treated in ISO 24617-7, while making them interoperable with other
measure types. It also accommodates the treatment of measures or amounts that are introduced in
ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
quantity
property of a measurable object referring to its magnitude or multitude
[SOURCE: ISO/IEC Guide 99:2007, 1.1, modified — Definition substantially redrafted, and Notes
removed.]
3.2
base quantity
quantity (3.1) in a conventionally chosen subset of a given system of quantities, where no quantity in
the subset can be expressed in terms of the other quantities within that subset
Note 1 to entry: The International System of Quantities (ISQ) defines seven base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.4, modified — "no subset quantity" replaced with "no quantity in
the subset", "the others" replaced with "the other quantities within that subset", and Notes and Example
removed.]
3.3
derived quantity
quantity (3.1), in a system of quantities, defined in terms of the base quantities (3.2) of that system
EXAMPLE Speed is a derived quantity defined by length (distance) over time ($LT^{-1}$), where length (L) and
time (T) are base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.5, modified — Example replaced.]
3.4
quantitative information
QI
measurement associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numeric terms
3.6
measurable quantitative information markup language
markup language of measurable quantitative information
quantitative markup language
QML
specification language for the annotation of measurable quantitative information (3.5) extractable from
text or other language media
3.7
measurement unit
unit of measurement
unit
scalar basis, defined and adopted by convention, of measuring objects by multiplying their quantitative
values expressed in real numbers
Note 1 to entry: Expressions used in measurement, such as “metre”, “litre” and “µmol/kg”, are units
by the definition given above. Multitude expressions such as “bottles”, “boxes” or “two”, as in “two bottles of
milk”, “a box of apples” and “two coffees”, are sometimes not regarded as units, but they can be if they are
accepted as units by convention or agreement in some communities. ISO 24617-12 (SemAF Part 12: Quantification)
treats such multitude expressions as genuine units.
[SOURCE: ISO/IEC Guide 99:2007, 1.9, modified — Definition substantially redrafted, original Notes
removed, new Note 1 to entry added.]
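Under this definition, a quantity value is a number multiplied by a unit. Converting between units of the same kind therefore amounts to rescaling the number. The following Python sketch illustrates this with a small conversion table. The names `UNITS` and `convert` are hypothetical, and the table is not an exhaustive or normative list. The factors used are the exact definitions of the international foot (0,3048 m) and the avoirdupois pound (0,453 592 37 kg). They reproduce EXAMPLE 3 and EXAMPLE 4 in 4.3.

```python
import math

# Hypothetical table: unit symbol -> (kind of quantity, factor to the SI unit).
UNITS = {
    "m": ("length", 1.0),
    "cm": ("length", 0.01),
    "km": ("length", 1000.0),
    "ft": ("length", 0.3048),         # international foot, exact
    "kg": ("mass", 1.0),
    "g": ("mass", 0.001),
    "lb": ("mass", 0.45359237),       # avoirdupois pound, exact
    "oz": ("mass", 0.45359237 / 16),  # 1 lb = 16 oz
}


def convert(n: float, from_unit: str, to_unit: str) -> float:
    """Rescale n so that n * from_unit == result * to_unit."""
    kind_from, factor_from = UNITS[from_unit]
    kind_to, factor_to = UNITS[to_unit]
    if kind_from != kind_to:
        raise ValueError(f"cannot convert {kind_from} to {kind_to}")
    return n * factor_from / factor_to


# "One pound equals 16 ounces."
print(math.isclose(convert(1, "lb", "oz"), 16))     # True
# "1 foot is less than 1 metre, for it is exactly equal to 30,48 cm."
print(convert(1, "ft", "m") < 1)                    # True
print(math.isclose(convert(1, "ft", "cm"), 30.48))  # True
```

Floating-point factors such as 0.01 are not exact in binary, so the sketch compares converted values with `math.isclose` rather than `==`.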
3.8
base unit
measurement unit (3.7) that is adopted by convention for a base quantity (3.2)
Note 1 to entry: The International System of Units (SI) defines seven base units, one for each of the
seven ISQ base quantities, as shown in Table 1.
Table 1 — Base units
SI base unit (unit symbol) | Associated ISQ base quantity (base quantity symbol)
metre (m) | length (L)
kilogram (kg) | mass (M)
second (s) | time (T)
ampere (A) | electric current (I)
kelvin (K) | thermodynamic temperature (Θ)
mole (mol) | amount of substance (N)
candela (cd) | luminous intensity (J)
[SOURCE: ISO/IEC Guide 99:2007, 1.10, modified — Notes and Examples removed, new Note 1 to entry
and Table 1 added.]
3.9
derived unit
measurement unit (3.7) for a derived quantity (3.3)
EXAMPLE The unit “newton” (N) is a derived unit for the derived quantity “force” (F), which is defined to be
“mass times acceleration” ($MLT^{-2}$), where the quantity “acceleration” is a derived quantity defined by “velocity
divided by time” ($VT^{-1}$) and “velocity” is defined by “length (distance) divided by time” ($LT^{-1}$).
Note 1 to entry: Table 2 illustrates some of the derived units.
[SOURCE: ISO/IEC Guide 99:2007, 1.11, modified — Examples removed, new Example and Note 1 to
entry added.]
Table 2 — Derived units
Derived unit (unit symbol) | Associated derived quantity
kilometre per minute (km/min) | speed = length (L)/time (T)
gram per cubic metre (g/m$^{3}$) | density = mass (M)/volume (L$^{3}$)
kilogram metre per square second (kg·m/s$^{2}$) | force = mass (M) × length (L)/time squared (T$^{2}$)
lumen per square metre (lm/m$^{2}$) | illuminance = luminous flux (J)/area (L$^{2}$)
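The dimensional analysis behind Table 2 and the “newton” example in 3.9 can be checked mechanically. In the following Python sketch, which is illustrative and not part of QML, a dimension is represented as a mapping from the ISQ base-quantity symbols of Table 1 to integer exponents. The helper names `dim`, `mul` and `div` are hypothetical.

```python
# A dimension maps ISQ base-quantity symbols (Table 1) to integer exponents,
# e.g. {"L": 1, "T": -1} for LT^-1. Zero exponents are dropped.
Dimension = dict[str, int]


def dim(**exponents: int) -> Dimension:
    return {symbol: e for symbol, e in exponents.items() if e != 0}


def mul(a: Dimension, b: Dimension) -> Dimension:
    """Product of two quantities: exponents add."""
    out = dict(a)
    for symbol, e in b.items():
        out[symbol] = out.get(symbol, 0) + e
    return {symbol: e for symbol, e in out.items() if e != 0}


def div(a: Dimension, b: Dimension) -> Dimension:
    """Quotient of two quantities: the divisor's exponents are negated."""
    return mul(a, {symbol: -e for symbol, e in b.items()})


L, M, T, J = dim(L=1), dim(M=1), dim(T=1), dim(J=1)

speed = div(L, T)                    # km/min
density = div(M, mul(L, mul(L, L)))  # g/m^3
acceleration = div(speed, T)         # velocity divided by time
force = mul(M, acceleration)         # newton: mass times acceleration
illuminance = div(J, mul(L, L))      # lm/m^2

print(speed)  # {'L': 1, 'T': -1}
print(force)  # {'M': 1, 'L': 1, 'T': -2}
```

Deriving force through acceleration yields the same exponents as the Table 2 row “mass × length / time squared”. This confirms $MLT^{-2}$ for the newton.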
4 Abstract specification of QML
4.1 Overview
The quantitative markup language (QML) (3.6) is specified at two levels, abstract and concrete. Some
characteristics of QML are listed in 4.2. The overall structure of QML is represented by a metamodel, as
introduced in 4.3. The abstract syntax of QML, QML_as, shall be a set-theoretic specification of QML in
conceptual terms that are independent of ways of representing the annotation (content) of measurable
quantitative information. The concrete syntax of QML, QML_cs, shall be a specification of a set of
representation formats, based on QML_as, for the annotation of measurable quantitative information
in a computationally tractable way. The QML_as is introduced in 4.4, while QML_cs is presented in
4.5. Equivalent concrete syntaxes, including an XML-based concrete syntax QML_csx and a TEI-based
concrete syntax QML_cst, are described in Clause 5 and Clause 6, respectively.
NOTE There can be many equivalent concrete syntaxes defined on a single abstract syntax.
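To illustrate the NOTE, the following Python sketch holds one measure as an abstract object with the attributes @numeric, @unit and @type of 4.4. It renders that object in two equivalent concrete forms: an XML element and a JSON object. The sketch is not a normative serialization. The element name `measure`, the class `AbstractMeasure` and the JSON layout are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id


@dataclass(frozen=True)
class AbstractMeasure:
    """Abstract content of a measure, independent of any serialization."""
    id: str
    numeric: float
    unit: str
    type: str


def to_xml(m: AbstractMeasure) -> str:
    el = ET.Element("measure", {XML_ID: m.id, "numeric": str(m.numeric),
                                "unit": m.unit, "type": m.type})
    return ET.tostring(el, encoding="unicode")


def from_xml(text: str) -> AbstractMeasure:
    el = ET.fromstring(text)
    return AbstractMeasure(el.get(XML_ID), float(el.get("numeric")),
                           el.get("unit"), el.get("type"))


def to_json(m: AbstractMeasure) -> str:
    return json.dumps(asdict(m))


def from_json(text: str) -> AbstractMeasure:
    return AbstractMeasure(**json.loads(text))


height = AbstractMeasure("me1", 1950.0, "m", "length")
print(to_xml(height))
print(to_json(height))
```

Both serializations map back to an equal `AbstractMeasure`, which is what makes them equivalent concrete syntaxes of one abstract object.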
4.2 Characteristics of QML
QML shall have the following characteristics.
a) QML shall focus on the annotation of the measurable attributes of entities. For example, “BMI
between 10-20 kg/m$^{2}$”.
b) QML shall provide a way to annotate the relations of measures. For example, “age 40 or older” and
“fpg>=100 mg/dl or a1c not less than 5,8 %”.
c) QML shall cover the complex uses of unitized numeric quantities. For example, “14,0 × 10$^{9}$”,
“glycosylated haemoglobin (hba1c) <1,15 times the upper limit of normal”.
d) QML shall facilitate the identification of normalized numerics and units as the measurable attribute
of an associated entity.
NOTE QML does not specify ways of annotating the normalization of units (e.g. “millimoles per litre” is
normalized to “mmol/l”) or their complete specification (e.g. “kg/m” is “kg/m$^{2}$” for BMI); these will be
dealt with in another part of ISO 24617 addressing the automated implementation of MQI.
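Characteristic b) can be approximated for the simplest symbolic cases. The following Python sketch is a hypothetical heuristic, not part of QML. It splits a single clause such as “fpg>=100 mg/dl” into an entity markable, a relator and a measure. The relator names in `RELATORS` are assumptions. Verbal forms such as “not less than” or “or older” are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical relator names for symbolic comparison operators.
RELATORS = {"<=": "lessThanOrEqual", ">=": "greaterThanOrEqual",
            "<": "lessThan", ">": "greaterThan", "=": "equal"}

# Two-character operators come first so that ">=" is not read as ">".
_COMPARISON_RE = re.compile(
    r"^\s*(?P<entity>[A-Za-z][\w ]*?)\s*(?P<op><=|>=|<|>|=)\s*"
    r"(?P<num>\d+(?:[,.]\d+)?)\s*(?P<unit>\S+)?\s*$")


class Comparison(NamedTuple):
    entity: str
    relator: str
    numeric: float
    unit: Optional[str]


def parse_comparison(text: str) -> Optional[Comparison]:
    """Return None when the clause is not a simple symbolic comparison."""
    match = _COMPARISON_RE.match(text)
    if match is None:
        return None
    return Comparison(match["entity"], RELATORS[match["op"]],
                      float(match["num"].replace(",", ".")), match["unit"])


print(parse_comparison("fpg>=100 mg/dl"))
print(parse_comparison("a1c < 5,8 %"))
```

A real annotator would also need to split coordinated clauses joined by “or” and to map verbal comparatives onto the same relators.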
4.3 Metamodel
The overall structure of measurable quantitative information is represented by the metamodel in
Figure 1.
Figure 1 — Metamodel of measurable quantitative information
This metamodel shall consist of seven class components, represented as square boxes in Figure 1:
a) source data as input to the annotation of MQI,
b) markables extracted from data sources,
c) three types of basic elements: entity, measure, and relator,
d) two types of links: measure link and comparison link.
The element “entity” shall be any object that has the property of a measurable quantity, represented by
“@quantity”, as one of its properties. As used in this document, “entity” shall be a very general term
that refers not only to individual objects but also to their properties, such as the “height” of a building
or the “speed” of a car, and to any kind of eventuality, such as states, processes or transitions.
EXAMPLE 1 We drove at more than 200 kilometres per hour on a German autobahn.
The speed expressed by “more than 200 kilometres per hour” applies to a quantitative property of a
motion: in this example, the measure applies to the motion of driving.
The element “measure” represents a measurable quantity of an entity in terms of three attributes:
quantity, unit, and type.
EXAMPLE 2 The height of Mt. Hall is 1 950 metres.
The measure shall consist of a quantity referred to by a numeric expression “1 950” and a unit “metre”.
It applies to the “height” quantity of the geographical object, named “Mt. Hall”.
The element “relator”, which is associated with markables such as “equal to”, “greater than”, “<=”,
“between” or “at least”, serves only the function of relating two or more measures.
EXAMPLE 3 One pound equals 16 ounces.
Here, “equals” is a relator of identity between two measures, “one pound” and “16 ounces”.
EXAMPLE 4 1 foot is less than 1 metre, for it is exactly equal to 30,48 cm.
This example illustrates two types of links between measures: the relation of being “less than”, and that
of being an identity.
A link of the type “measure” shall relate a measure to the quantitative property of an entity. Such a link
is triggered by a measure element.
A link of the type “comparison” shall relate a measure to one or more other measures. Such a link is
often triggered by a relator element.
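The element and link types of the metamodel can be sketched as plain Python data classes. This is an illustrative model under stated assumptions, not the normative QML_as. The class and field names are hypothetical, and the IDs follow the prefixes listed in Table 3. The sketch encodes EXAMPLE 2 as a measure link and EXAMPLE 3 as a comparison link triggered by the relator “equals”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    markable: str
    type: str
    quantity: str  # @quantity, e.g. "height"


@dataclass(frozen=True)
class Measure:
    id: str
    markable: str
    numeric: float
    unit: str
    type: str


@dataclass(frozen=True)
class Relator:
    id: str
    markable: str
    type: str


@dataclass(frozen=True)
class MeasureLink:
    """Relates a measure to the quantitative property of an entity."""
    id: str
    measure: Measure
    entity: Entity


@dataclass(frozen=True)
class ComparisonLink:
    """Relates a measure to one or more other measures via a relator."""
    id: str
    relator: Relator
    measure: Measure
    others: tuple[Measure, ...]


# EXAMPLE 2: The height of Mt. Hall is 1 950 metres.
hall = Entity("x1", "Mt. Hall", "geographical", "height")
me1 = Measure("me1", "1 950 metres", 1950.0, "m", "length")
ml1 = MeasureLink("mL1", me1, hall)

# EXAMPLE 3: One pound equals 16 ounces.
pound = Measure("me2", "One pound", 1.0, "lb", "mass")
ounces = Measure("me3", "16 ounces", 16.0, "oz", "mass")
equals = Relator("c1", "equals", "identity")
cl1 = ComparisonLink("cL1", equals, pound, (ounces,))

print(ml1.entity.quantity, ml1.measure.numeric, ml1.measure.unit)  # height 1950.0 m
```

Holding references to whole element objects, rather than bare IDs, keeps the sketch short. A serialization would replace them with IDREFs.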
4.4 Abstract syntax of QML (QML_as)
A markup language QML shall be a specification language for the annotation of MQI. The abstract syntax
of QML shall specify an annotation scheme in set-theoretic terms based on a conceptual understanding
of MQI. The abstract syntax QML_as is understood to be structured as a triple <B, R, @> such that
a) B is a set of three basic element types: entity, measure, and relator;
b) R is a set of two link types: measure and comparison types;
c) @ is a set of assignments that specify the list of attributes and their value types associated with
each of the basic element types in B and each of the link types in R.
Every element in B shall have at least one attribute, @type, and so shall every link. The values of
@type are CDATA associated with each of the elements. For instance, the entity “mountain” is of the
“geographical” type, and the entity named “John” is of the “person” type.
The values of @quantity for an entity are CDATA that may include values such as height, width, or
weight, and so on.
The assignment of measure shall have three attributes: @numeric, @unit, and @type. A possible value
of the attribute @numeric is a real number. A possible value of @unit is one of the units in a system
conventionally accepted such as one of the SI base units or derived units. A possible value of @type is
one of the quantities listed as ISQ base quantities or derived quantities, such as length, mass, voltage,
and so on.
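Read set-theoretically, QML_as is the triple <B, R, @>. The following Python sketch encodes it with frozensets and a mapping of attribute names, then checks a measure against its assignment. Only the attributes named in 4.4 are listed. The small sets of accepted units and quantity types are hypothetical placeholders, not the normative value lists.

```python
from numbers import Real

# B: basic element types; R: link types (4.4 a and b).
B = frozenset({"entity", "measure", "relator"})
R = frozenset({"measure", "comparison"})

# @: attribute assignments (4.4 c). Every element and every link carries
# at least @type; only attributes named in 4.4 are listed here.
ASSIGNMENTS = {
    ("element", "entity"): ("type", "quantity"),
    ("element", "measure"): ("numeric", "unit", "type"),
    ("element", "relator"): ("type",),
    ("link", "measure"): ("type",),
    ("link", "comparison"): ("type",),
}

# Hypothetical, deliberately tiny value lists for illustration only.
KNOWN_UNITS = {"m", "kg", "s", "A", "K", "mol", "cd", "km/min"}
KNOWN_TYPES = {"length", "mass", "time", "speed", "density", "force"}


def validate_measure(attrs: dict) -> list[str]:
    """Return a list of problems found in a measure's attribute values."""
    problems = [f"missing @{name}"
                for name in ASSIGNMENTS[("element", "measure")]
                if name not in attrs]
    numeric = attrs.get("numeric")
    if "numeric" in attrs and (isinstance(numeric, bool)
                               or not isinstance(numeric, Real)):
        problems.append("@numeric must be a real number")
    if "unit" in attrs and attrs["unit"] not in KNOWN_UNITS:
        problems.append(f"unknown @unit {attrs['unit']!r}")
    if "type" in attrs and attrs["type"] not in KNOWN_TYPES:
        problems.append(f"unknown @type {attrs['type']!r}")
    return problems


# Every type in B and R has an assignment in @.
assert all(("element", b) in ASSIGNMENTS for b in B)
assert all(("link", r) in ASSIGNMENTS for r in R)
print(validate_measure({"numeric": 1950, "unit": "m", "type": "length"}))  # []
```

Returning a list of problems, rather than raising on the first one, lets an annotation tool report all issues with an element at once.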
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
An abstract syntax shall allow several semantically equivalent concrete syntaxes. QML_as likewise
allows a set of equivalent concrete syntaxes of QML (QML_cs). This document introduces two kinds of
concrete syntaxes, QML_csx and QML_cst, in Clause 5 and Clause 6, respectively.
The two concrete syntaxes, QML_csx and QML_cst, are both based on the abstract syntax QML_as, while
adopting XML as their representation language. They shall comply with the requirement of standoff
annotation in ISO 24612.
These two concrete syntaxes do, however, differ from each other in at least two aspects. Just like the
other parts of ISO 24617 on semantic annotation, such as ISO 24617-1, ISO 24617-7, and ISO 24617-6,
QML_csx does not separate annotation content structures from their anchoring (referencing)
structures, although this separation is required by LAF for linguistic annotation.
In contrast, QML_cst is feature-structure-based. It shall follow LAF in separating the two kinds of
structures, anchoring structures and content structures, and represents measurement information in
feature structures. Furthermore, QML_cst, as specified in this document, shall adopt the names of XML
elements and attributes, with value type specifications, from the TEI P5 Guidelines of the Text Encoding
Initiative Consortium for the representation of MQI.
5 XML-based concrete syntax of QML (QML_csx)
5.1 General
The XML-based concrete syntax QML_csx is introduced in two steps. The first step is to list the tag
names and ID prefixes of QML_csx in 5.2. The second step is to specify the attribute assignments for the
XML root in 5.3, for each of the basic element types listed in 5.4, and for each of the link types listed in
5.5.
NOTE The root tag is introduced in XML to embed a list of XML elements into a single structure.
5.2 Tag names with ID prefixes
Corresponding to each of the basic element types and the link types for QML_csx, there is a unique tag
and a unique ID prefix, as shown in Table 3.
Table 3 — List of tags and ID prefixes of QML_csx
Tags | ID prefixes | Comment
Root | mqi | XML root tag
Basic element types:
Entity | x | object to which a measure applies
Measure | me | unitized numeric quantities only
Relator | c | triggers a link relating measures
Link types:
Measure link | mL | relates a measure to an entity and is triggered by a measure
Comparison link | cL | relates a measure to one or more other measures
NOTE The attribute name for each ID in XML is xml:id and each of its values is an ID prefix followed by a
positive integer, e.g. .
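A minimal Python sketch of the ID convention in the NOTE follows: each value is an ID prefix from Table 3 followed by a positive integer. The class `IdFactory`, the keys of `ID_PREFIXES` and the validation pattern are hypothetical helpers, not part of QML_csx.

```python
import re
from collections import defaultdict

# ID prefixes from Table 3, keyed by hypothetical element/link names.
ID_PREFIXES = {"root": "mqi", "entity": "x", "measure": "me",
               "relator": "c", "measureLink": "mL", "comparisonLink": "cL"}

# A valid ID is a known prefix followed by a positive integer (no leading 0).
_ID_RE = re.compile(r"(?:mqi|x|me|c|mL|cL)[1-9][0-9]*")


class IdFactory:
    """Hands out xml:id values, numbering each prefix independently."""

    def __init__(self) -> None:
        self._counters = defaultdict(int)

    def next_id(self, kind: str) -> str:
        prefix = ID_PREFIXES[kind]
        self._counters[prefix] += 1
        return f"{prefix}{self._counters[prefix]}"


def is_valid_id(value: str) -> bool:
    return _ID_RE.fullmatch(value) is not None


ids = IdFactory()
print([ids.next_id(k) for k in ("entity", "measure", "measure", "comparisonLink")])
# ['x1', 'me1', 'me2', 'cL1']
```

Numbering each prefix separately keeps IDs such as me1 and x1 readable. Uniqueness still holds because no two prefixes in Table 3 are identical.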
5.3 Attribute specification of the root
List 1: A list of attributes for the root element in extended BNF (Backus-Naur form)
attributes = identifier, target, [lang], [mediumType], [source]; {* Attributes in square brackets are
optional; otherwise, they are required. *}
identifier = mqi + positive integer;
target = IDREF | CDATA; {* refers to the ID of a sequence of communicative segments in data sources
or such a segment sequence directly quoted *}
lang = CDATA; {* refers to ISO 639 standard on language codes*}
mediumType = CDATA; {* text, video, image, etc.*}
source = CDATA. {* refers to the source of the data*}
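The root attributes of List 1 can be produced with Python's standard `xml.etree.ElementTree`. This sketch rests on two assumptions. First, the root tag is written `mqi`, matching the root ID prefix in Table 3. Second, the identifier is emitted as `xml:id`, following the NOTE to Table 3. The function name `make_root` and the target value `s1` are hypothetical. Optional attributes are simply omitted when not supplied.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ROOT_TAG = "mqi"  # assumed tag name, matching the root ID prefix in Table 3


def make_root(n: int, target: str, lang: Optional[str] = None,
              medium_type: Optional[str] = None,
              source: Optional[str] = None) -> ET.Element:
    """Build the root with required identifier/target and optional attributes."""
    if n < 1:
        raise ValueError("identifier = mqi + positive integer")
    attrs = {XML_ID: f"mqi{n}", "target": target}
    optional = {"lang": lang, "mediumType": medium_type, "source": source}
    attrs.update({k: v for k, v in optional.items() if v is not None})
    return ET.Element(ROOT_TAG, attrs)


# "s1" is a hypothetical ID of a segment sequence in the data source.
root = make_root(1, "s1", lang="en", medium_type="text")
print(ET.tostring(root, encoding="unicode"))
```

Rejecting a non-positive `n` enforces the “mqi + positive integer” rule of List 1 at construction time.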
5.4 Attribute specification of the basic element types
List 2: A list of attributes for the entity element in extended BNF
attributes = identifier, target, type, [comment];
identifier = x + positiv
...
INTERNATIONAL STANDARD ISO 24617-11
First edition
2021-08
Gestion des ressources linguistiques — Cadre d'annotation sémantique (SemAF) — Partie 11: Informations quantitatives mesurables (MQI)
Language resource management — Semantic annotation framework (SemAF) — Part 11: Measurable quantitative information (MQI)
Reference number
ISO 24617-11:2021(F)
© ISO 2021
ISO 24617-11:2021(F)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents Page
Foreword iv
Introduction v
1 Scope 1
2 Normative references 1
3 Terms and definitions 1
4 Abstract specification of QML 3
4.1 Overview 3
4.2 Characteristics of QML 4
4.3 Metamodel 4
4.4 Abstract syntax of QML (QML_as) 5
4.5 Concrete syntaxes of QML (QML_cs) and its subsets 6
5 XML-based concrete syntax of QML (QML_csx) 6
5.1 General 6
5.2 Tag names with ID prefixes 6
5.3 Attribute specification of the root 7
5.4 Attribute specification of the basic element types 7
5.5 Attribute specification of the link types 8
5.6 Illustrations of QML_csx 8
5.6.1 General 8
5.6.2 Data samples 8
5.6.3 Annotation procedure 9
6 TEI-based concrete syntax of QML (QML_cst) 11
6.1 Concrete syntaxes of QML (QML_cst) 11
6.1.1 General 11
6.1.2 Tag names with ID prefixes 11
6.1.3 Attribute specification of the basic element types 11
6.1.4 Attribute specification of the two link types 12
6.2 Illustrations of QML_cst 12
6.2.1 General 12
6.2.2 Data samples 13
6.2.3 Illustrations of the TEI-based concrete syntax 13
Annex A (informative) Illustrations of QML_csx with more samples 16
Annex B (informative) Informal statements of MQI 19
Annex C (informative) Representation of units 20
Bibliography 21
3.2
base quantity
quantity (3.1) in a conventionally chosen subset of a given system of quantities, where no quantity in the subset can be expressed in terms of the other quantities within that subset
Note 1 to entry: Kinds of quantities include seven base quantities defined by the International System of Quantities (ISQ).
[SOURCE: ISO/IEC Guide 99:2007, 1.4, modified — The expression "of the others" has been replaced by "of the other quantities within that subset", and the notes and the example have been deleted.]
3.3
derived quantity
quantity (3.1), in a system of quantities, defined in terms of the base quantities (3.2) of that system
EXAMPLE Speed is a derived quantity defined by length (distance) over time ($LT^{-1}$), where length (L) and time (T) are base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.5, modified — The example has been replaced.]
3.4
quantitative information
QI
measure associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numeric terms
3.6
measurable quantitative information markup language
quantitative markup language
QML
specification language for the annotation of measurable quantitative information (3.5) extractable from text or other media types of language
3.7
unit of measurement
unit
scalar basis, defined and adopted by convention, for measuring objects by multiplying their quantitative values expressed in real numbers
Note 1 to entry: Expressions used in measurement such as “meter”, “liter” and “µmol/kg” are units according to the definition given above. Multitude expressions such as “bottles”, “boxes” or “two”, as in “two bottles of milk”, “a box of apples” and “two coffees”, are sometimes not regarded as units, but they can be if they are accepted as units by convention or agreement in certain communities. ISO 24617 SemAF Part 12: Quantification treats such multitude expressions as genuine units.
[SOURCE: ISO/IEC Guide 99:2007, 1.9, modified — The definition has been substantially revised, the original notes have been deleted and a new Note 1 to entry has been added.]
3.8
base unit
unit of measurement (3.7) adopted by convention for a base quantity (3.2)
Note 1 to entry: There are seven base units chosen by the International System of Units (SI), associated with the seven ISQ base quantities, for measuring quantities, as shown in Table 1.
Table 1 — Base units
SI base unit (unit symbol) | Associated ISQ base quantity (base quantity symbol)
meter (m) | length (L)
kilogram (kg) | mass (M)
second (s) | time (T)
ampere (A) | electric current (I)
kelvin (K) | thermodynamic temperature (Θ)
mole (mol) | amount of substance (N)
candela (cd) | luminous intensity (J)
[SOURCE: ISO/IEC Guide 99:2007, 1.10, modified — The notes and examples have been deleted, and a new Note 1 to entry and Table 1 have been added.]
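For annotation tools, Table 1 can be held as a simple lookup structure. The following Python sketch is illustrative only: the dictionary `SI_BASE_UNITS` and the function `base_quantity_of` are hypothetical names, not part of QML, and only the seven base-unit symbols of Table 1 are recognized.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table mirroring Table 1:
# SI base unit symbol -> (unit name, ISQ base quantity, base quantity symbol).
SI_BASE_UNITS: Dict[str, Tuple[str, str, str]] = {
    "m": ("meter", "length", "L"),
    "kg": ("kilogram", "mass", "M"),
    "s": ("second", "time", "T"),
    "A": ("ampere", "electric current", "I"),
    "K": ("kelvin", "thermodynamic temperature", "Θ"),
    "mol": ("mole", "amount of substance", "N"),
    "cd": ("candela", "luminous intensity", "J"),
}


def base_quantity_of(unit_symbol: str) -> Optional[str]:
    """Return the ISQ base quantity measured by an SI base unit, or None
    if the symbol is not one of the seven base-unit symbols."""
    entry = SI_BASE_UNITS.get(unit_symbol)
    return entry[1] if entry is not None else None


print(base_quantity_of("mol"))  # amount of substance
print(base_quantity_of("km"))   # None: km is a prefixed multiple, not a base-unit symbol
```

Prefixed multiples such as km are deliberately not resolved here; handling them would require additional, assumed rules.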
3.9
derived unit
unit of measurement (3.7) of a derived quantity (3.3)
EXAMPLE The unit “newton” (N) is a derived unit for the derived quantity “force” (F), which is defined as “mass times acceleration” ($MLT^{-2}$), where the quantity “acceleration” is a derived quantity defined as “velocity divided by time” ($VT^{-1}$) and “velocity” is defined as “length (distance) divided by time” ($LT^{-1}$).
Note 1 to entry: Table 2 illustrates some derived units.
[SOURCE: ISO/IEC Guide 99:2007, 1.11, modified — The examples have been deleted, and a new example and Note 1 to entry have been added.]
Table 2 — Derived units
Derived unit (unit symbol) | Associated derived quantity
kilometer per minute (km/min) | speed = length (L)/time (T)
gram per cubic meter ($\mathrm{g/m^{3}}$) | density = mass (M)/volume ($L^{3}$)
kilogram meter per second squared ($\mathrm{kg \times m/s^{2}}$) | force = mass (M) × length (L)/time ($T^{2}$)
lumen per square meter ($\mathrm{lm/m^{2}}$) | illuminance = luminous intensity (J)/area ($L^{2}$)
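The dimensional relations in the derived-unit example and in the table above can be checked mechanically by treating a dimension as a map from base quantity symbols to integer exponents. The sketch below assumes that representation; the helpers `mul`, `div` and `fmt` are hypothetical, and QML itself does not require derived units to be decomposed this way.

```python
from typing import Dict

# A dimension is modeled (as an assumption for illustration) as a map from
# ISQ base quantity symbol to its exponent, e.g. {"L": 1, "T": -1} for LT^-1.
Dim = Dict[str, int]


def mul(a: Dim, b: Dim) -> Dim:
    """Product of two dimensions: add exponents and drop those that cancel."""
    out = dict(a)
    for sym, exp in b.items():
        out[sym] = out.get(sym, 0) + exp
    return {sym: exp for sym, exp in out.items() if exp != 0}


def div(a: Dim, b: Dim) -> Dim:
    """Quotient of two dimensions: a * b^-1."""
    return mul(a, {sym: -exp for sym, exp in b.items()})


def fmt(d: Dim) -> str:
    """Render a dimension in the document's style, e.g. MLT^-2."""
    return "".join(sym if exp == 1 else f"{sym}^{exp}" for sym, exp in d.items())


L, M, T, J = {"L": 1}, {"M": 1}, {"T": 1}, {"J": 1}

velocity = div(L, T)                 # LT^-1
acceleration = div(velocity, T)      # (LT^-1)T^-1 = LT^-2
force = mul(M, acceleration)         # MLT^-2, the dimension of the newton
density = div(M, mul(L, mul(L, L)))  # M/L^3 = ML^-3 (g/m^3)
illuminance = div(J, mul(L, L))      # J/L^2 = JL^-2 (lm/m^2; the steradian is dimensionless)

print(fmt(force))  # MLT^-2
```

Because exponents simply add, the chain velocity → acceleration → force reproduces $MLT^{-2}$ term by term.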
4 Abstract specification of QML
4.1 Overview
The quantitative markup language (QML) (3.6) is specified at two levels, abstract and concrete. Some characteristics of QML are listed in 4.2. The overall structure of QML is represented by a metamodel, as presented in 4.3. The abstract syntax of QML, named QML_as, shall be a set-theoretic specification of QML in conceptual terms that are independent of the ways of representing the annotation (content) of measurable quantitative information. The concrete syntax of QML, named QML_cs, shall be a specification of a set of representation formats, based on QML_as, for annotating measurable quantitative information in a computationally tractable way. QML_as is presented in 4.4 and QML_cs in 4.5. Equivalent concrete syntaxes, including an XML-based concrete syntax QML_csx and a TEI-based concrete syntax QML_cst, are described in Clause 5 and Clause 6 respectively.
NOTE Many equivalent concrete syntaxes can be defined on a single abstract syntax.
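To illustrate the abstract/concrete distinction, the sketch below takes one abstract measure (a value, a unit and a type) and serializes it in two hypothetical concrete forms: an XML element and a feature-structure-like JSON object. Neither form is QML_csx or QML_cst; the element and key names are assumptions. The point is only that both decode to the same abstract value.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Measure:
    """Abstract view of a measure: attribute values only, no serialization."""
    num: float
    unit: str
    type: str


def to_xml(m: Measure, ident: str) -> str:
    """Hypothetical concrete syntax 1: one XML element, values as attributes."""
    el = ET.Element("measure", {"id": ident, "num": str(m.num), "unit": m.unit, "type": m.type})
    return ET.tostring(el, encoding="unicode")


def from_xml(text: str) -> Measure:
    el = ET.fromstring(text)
    return Measure(float(el.get("num")), el.get("unit"), el.get("type"))


def to_fs(m: Measure) -> str:
    """Hypothetical concrete syntax 2: a feature-structure-like JSON object."""
    return json.dumps({"fs": {"type": "measure", "f": asdict(m)}})


def from_fs(text: str) -> Measure:
    return Measure(**json.loads(text)["fs"]["f"])


m = Measure(1950.0, "m", "length")  # "1 950 meters", as in the Mount Hall example
print(to_xml(m, "me1"))
print(to_fs(m))
# Both concrete forms decode to the same abstract value.
assert from_xml(to_xml(m, "me1")) == from_fs(to_fs(m)) == m
```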
4.2 Characteristics of QML
QML shall have the following characteristics:
a) QML shall focus on the annotation of the measurable attributes of entities, for example “BMI between 10-20 $\mathrm{kg/m^{2}}$”;
b) QML shall allow the relations of measures to be annotated, for example “age 40 or more” and “fpg >= 100 mg/dl or a1c not less than 5.8 %”;
c) QML shall cover complex uses of unitized numeric quantities, for example “$14.0 \times 10^{9}$” and “glycated hemoglobin (hba1c) < 1.15 times the upper limit of normal”;
d) QML shall facilitate the identification of normalized units and numerics as a measurable attribute of an associated entity.
NOTE QML does not specify how to annotate the normalization of units (for example, “millimoles per liter” is normalized as “mmol/l”) or their full specification (for example, “kg/m” is written “$\mathrm{kg/m^{2}}$” for BMI); these will be addressed in another part of ISO 24617 dealing with the automated implementation of MQI.
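Characteristic b) can be illustrated with a minimal extraction sketch that turns relations such as “fpg >= 100 mg/dl or a1c not less than 5.8 %” into (attribute, relator, number, unit) tuples. The regular expression and the relator list are assumptions made for this example; QML does not prescribe any extraction or normalization procedure. The pattern only handles the order attribute, relator, number, unit, so an expression like “age 40 or more” is not covered.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small relator lexicon. Longer alternatives are
# listed before their prefixes (">=" before ">", "not less than" before
# "less than") so that the regex alternation prefers them.
RELATORS = [">=", "<=", ">", "<", "=", "not less than", "not more than",
            "less than", "more than", "at least"]

PATTERN = re.compile(
    r"(?P<attr>[a-z0-9]+)\s+"                                   # measured attribute, e.g. fpg
    r"(?P<rel>" + "|".join(map(re.escape, RELATORS)) + r")\s+"  # relator
    r"(?P<num>\d+(?:[.,]\d+)?)\s*"                              # number, '.' or ',' decimal mark
    r"(?P<unit>%|[A-Za-z]+(?:/[A-Za-z0-9]+)*)"                  # unit such as mg/dl or %
)


def extract(text: str) -> List[Tuple[str, str, float, str]]:
    """Return (attribute, relator, numeric value, unit) for each match in text."""
    return [
        (m["attr"], m["rel"], float(m["num"].replace(",", ".")), m["unit"])
        for m in PATTERN.finditer(text)
    ]


print(extract("fpg >= 100 mg/dl or a1c not less than 5.8 %"))
# [('fpg', '>=', 100.0, 'mg/dl'), ('a1c', 'not less than', 5.8, '%')]
```

Accepting both “.” and “,” as decimal marks lets the same sketch read “5,8 %” as well as “5.8 %”.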
4.3 Metamodel
The overall structure of measurable quantitative information is represented by the metamodel in Figure 1.
Figure 1 — Metamodel of measurable quantitative information
This metamodel shall consist of seven class components, represented by square boxes in Figure 1:
a) source data as input for the annotation of MQI;
b) markables extracted from the source data;
c) three types of basic elements: entity, measure and relator;
d) two types of links: measure link and comparison link.
The “entity” element shall be any object that has the property of a measurable quantity, represented by “@quantity”, as one of its properties. “Entity”, as used in this document, shall be a very general term that refers to any object: not only individual entities, but also their properties, such as the “height” of a building or the “speed” of a car, as well as all kinds of eventualities such as states, processes or transitions.
EXAMPLE 1 We drove at more than 200 kilometers per hour on a German autobahn.
The speed expressed by “more than 200 kilometers per hour” applies to the quantitative property of a motion: the measure “more than 200 kilometers per hour” applies to the driving mentioned in the example.
The “measure” element represents a measurable quantity of an entity in terms of three attributes: quantity, unit and type.
EXAMPLE 2 The height of Mount Hall is 1 950 meters.
The measure shall consist of a quantity denoted by the numeric expression “1 950” and the unit “meter”. It applies to the quantity “height” of the geographic object named “Mount Hall”.
The “relator” element, which is associated with markables such as “equal to”, “greater than”, “<=”, “between” or “at least”, only has the functional status of relating two or more measures.
EXAMPLE 3 One pound is equal to 16 ounces.
This is an identity relator between two measures, “one pound” and “16 ounces”.
EXAMPLE 4 1 ft is less than 1 meter, because it is exactly equal to 30.48 cm.
This example illustrates two kinds of relation between measures: being “less than” and being identical.
A link of type “measure” shall relate a measure to the quantitative property of an entity. Such a link is triggered by a measure element.
A link of type “comparison” shall relate a measure to one or more other measures. Such a link is often triggered by a relator element.
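The basic element types and link types of the metamodel can be sketched as Python dataclasses. The class and field names (for example `MeasureLink`, `targets`) and the identifiers e1, me1 and r1 are illustrative assumptions rather than QML tag names, and the source-data and markable components are not modeled. The sketch encodes the Mount Hall example and the pound/ounce example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    id: str
    type: str       # e.g. "geographic"
    quantity: str   # the measurable property, e.g. "height"


@dataclass
class Measure:
    id: str
    num: float
    unit: str
    type: str       # kind of quantity, e.g. "length"


@dataclass
class Relator:
    id: str
    type: str       # e.g. "equal", "less than"


@dataclass
class MeasureLink:
    """Relates a measure to the quantitative property of an entity."""
    measure: Measure
    entity: Entity


@dataclass
class ComparisonLink:
    """Relates a measure to one or more other measures through a relator."""
    relator: Relator
    source: Measure
    targets: List[Measure]


# "The height of Mount Hall is 1 950 meters."
mount_hall = Entity("e1", "geographic", "height")
height = Measure("me1", 1950.0, "m", "length")
ml1 = MeasureLink(height, mount_hall)

# "One pound is equal to 16 ounces."
pound = Measure("me2", 1.0, "lb", "mass")
ounces = Measure("me3", 16.0, "oz", "mass")
cl1 = ComparisonLink(Relator("r1", "equal"), pound, [ounces])


def same_kind(link: ComparisonLink) -> bool:
    """Hypothetical sanity check: all compared measures share a quantity type."""
    return all(t.type == link.source.type for t in link.targets)


print(ml1.entity.quantity, ml1.measure.num, ml1.measure.unit)  # height 1950.0 m
print(same_kind(cl1))  # True
```

The `same_kind` check is an added assumption, not a requirement stated for comparison links; it merely flags comparisons between measures of different quantity types.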
4.4 Abstract syntax of QML (QML_as)
QML shall be a specification language for the annotation of MQI. The abstract syntax of QML shall specify an annotation scheme in set-theoretic terms based on a conceptual understanding of MQI. The abstract syntax QML_as is considered to be a triple < B, R, @ > such that:
a) B is a set of three types of basic elements: entity, measure and relator;
b) R is a set of two types of links: the measure and comparison types;
c) @ is a set of assignments that specify the list of attributes and their associated value types for each of the basic element types in B and each of the link types in R.
Each element of B shall have at least one attribute, @type, as shall each link. The values of @type are CDATA associated with each element. For example, the entity “mountain” is of type “geographic” and the named entity “John” is of type “person”.
The values of @quantity for an entity are CDATA and can include values such as height, width or weight.
The assignment for measure shall have three attributes: @numeric, @unit and @type. A possible value of the @numeric attribute is a real number. A possible value of @unit is one of the units of a conventionally accepted system, such as one of the SI base units or derived units. A possible value of @type is one of the quantities listed as ISQ base quantities or derived quantities, such as length, mass or voltage.
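The triple < B, R, @ > can be encoded directly as sets plus an assignment table, which makes it easy to check whether an annotation element carries the attributes assigned to its type. This is a sketch under stated assumptions: the attribute names follow the @type, @quantity, @numeric and @unit attributes described above, the link-type names `measureLink` and `comparisonLink` are hypothetical, and every listed attribute is treated as required, which this clause states explicitly only for @type and for the three measure attributes.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical encoding of the triple <B, R, @>.
B: FrozenSet[str] = frozenset({"entity", "measure", "relator"})
R: FrozenSet[str] = frozenset({"measureLink", "comparisonLink"})  # names assumed

# @: attributes assigned to each element or link type (all treated as required here).
ASSIGNMENTS: Dict[str, FrozenSet[str]] = {
    "entity": frozenset({"type", "quantity"}),
    "measure": frozenset({"numeric", "unit", "type"}),
    "relator": frozenset({"type"}),
    "measureLink": frozenset({"type"}),
    "comparisonLink": frozenset({"type"}),
}


def missing_attributes(kind: str, attrs: Dict[str, str]) -> Set[str]:
    """Return the attributes that @ assigns to `kind` but `attrs` lacks."""
    if kind not in B | R:
        raise ValueError(f"not an element or link type of QML_as: {kind}")
    return set(ASSIGNMENTS[kind].difference(attrs))


print(missing_attributes("measure", {"numeric": "1950", "unit": "m"}))  # {'type'}
```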
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
An abstract syntax shall allow several semantically equivalent concrete syntaxes. QML_as thus allows a set of equivalent concrete syntaxes of QML (QML_cs). This document presents two kinds of concrete syntax, QML_csx and QML_csf, in Clause 5 and Clause 6 respectively.
Both concrete syntaxes, QML_csx and QML_csf, are based on the abstract syntax QML_as, while adopting XML as their representation language. They shall conform to the stand-off annotation requirement of ISO 24612.
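Stand-off annotation leaves the primary text untouched and points into it from separate annotation elements. The sketch below, using Python's standard `xml.etree.ElementTree`, anchors the Mount Hall example by character offsets; anchoring and content share one element, as in QML_csx. The root, element and attribute names (`qml`, `target`, `measureLink` and so on) and the offset convention are assumptions, not the QML_csx or ISO 24612 formats.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

TEXT = "The height of Mount Hall is 1 950 meters."


def span(text: str, sub: str) -> Tuple[int, int]:
    """Character offsets [start, end) of the first occurrence of sub in text."""
    start = text.index(sub)
    return start, start + len(sub)


def target(text: str, sub: str) -> str:
    return "%d,%d" % span(text, sub)


# Stand-off annotation: TEXT is never modified; every anchored element points
# into it through a (hypothetical) "target" attribute holding offsets.
root = ET.Element("qml")  # element and attribute names are assumptions
ET.SubElement(root, "entity", {"id": "e1", "target": target(TEXT, "Mount Hall"),
                               "type": "geographic", "quantity": "height"})
ET.SubElement(root, "measure", {"id": "me1", "target": target(TEXT, "1 950 meters"),
                                "numeric": "1950", "unit": "m", "type": "length"})
ET.SubElement(root, "measureLink", {"id": "ml1", "type": "measure",
                                    "measure": "me1", "entity": "e1"})


def resolve(annotation: ET.Element, text: str) -> Dict[str, str]:
    """Map each anchored annotation id to the text span it points at."""
    out = {}
    for el in annotation.iter():
        tgt = el.get("target")
        if tgt:
            start, end = map(int, tgt.split(","))
            out[el.get("id")] = text[start:end]
    return out


print(ET.tostring(root, encoding="unicode"))
print(resolve(root, TEXT))  # {'e1': 'Mount Hall', 'me1': '1 950 meters'}
```

The link element carries no offsets of its own; it refers to the anchored elements by id.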
These two concrete syntaxes nevertheless differ from each other in at least two respects. Like the other semantic annotation parts of ISO 24617, such as ISO 24617-1, ISO 24617-7 and ISO 24617-6, QML_csx does not separate the content structures of annotations from their anchoring (referencing) structures, although LAF requires this separation for linguistic annotation.
By contrast, QML_csf is based on feature structures. It shall follow LAF in separating the two structures, anchoring structures and content structures, when representing measurement information in feature structures. In addition, QML_cst, as specified in this document, shall adopt the XML element and attribute names, together with the value type specifications, of the TEI P5 guidelines
...
SLOVENIAN STANDARD
oSIST ISO/DIS 24617-11:2021
01-March-2021
Language resource management -- Semantic annotation framework (SemAF) - Part 11:
Measurable Quantitative information (MQI)
Gestion des ressources linguistiques -- Cadre d'annotation sémantique - Partie 11:
Mesurer l'information quantitative (MQI)
This Slovenian standard is identical to: ISO/DIS 24617-11
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24617-11:2021 en
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
DRAFT INTERNATIONAL STANDARD
ISO/DIS 24617-11
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: Voting terminates on:
2020-03-16 2020-06-08
Language resource management — Semantic annotation
framework (SemAF) —
Part 11:
Measurable Quantitative information (MQI)
Gestion des ressources linguistiques — Cadre d'annotation sémantique —
Partie 11: Mesurer l'information quantitative (MQI)
ICS: 01.020
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number
ISO/DIS 24617-11:2020(E)
© ISO 2020
ISO/DIS 24617-11:2020(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO 2020 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Background and Motivations
5 Purposes and Requirements
6 Abstract Specification of SemAF-MQI
6.1 Overview
6.2 Characteristics of SemAF-MQI
6.3 Metamodel
6.4 Abstract syntax of QML (QML_as)
6.5 Concrete Syntaxes of QML (QML_cs)
7 XML-based Concrete Syntax of QML (QML_csx)
7.1 Overall
7.2 Tag names with ID prefixes
7.3 Attribute specification of the root
7.4 Attribute specification of the basic element types
7.5 Attribute specification of the link types and
7.6 Illustrations of QML_csx
7.6.1 Overall
7.6.2 Sample data
7.6.3 Procedure of annotation
8 TEI-based Concrete Syntax of QML (QML_cst)
8.1 Concrete syntaxes of QML (QML_cst)
8.1.1 Overall
8.1.2 Tag names with ID prefixes
8.1.3 Attribute specification of the basic element types
8.1.4 Attribute specification of the two link types
8.2 Illustrations of QML_cst
8.2.1 Overall
8.2.2 Sample data
8.2.3 Illustrations of TEI-based Concrete Syntax
Annex A (informative) Illustrations of QML_csx with more samples
Annex B (informative) Informal statements of Measurable Quantitative Information
Annex C (informative) The representation of units
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.
ISO 24617 consists of the following parts under the general title Language resource management —
Semantic annotation framework (SemAF):
— Part 1: Time and events (TimeML)
— Part 2: Dialogue acts (DA)
— Part 3: Named entity
— Part 4: Semantic roles (SR)
— Part 5: Discourse structures (DS)
— Part 6: Principles of semantic annotation (SemAF Principles)
— Part 7: Spatial information
— Part 8: Semantic relations in discourse, core annotation schema (DR-core)
— Part 9: Reference annotation framework (RAF)
— Part 10: Visual information (VoxML)
— Part 11: Measurable quantitative information (MQI)
— Part 12: Quantification
— Part 13: Gestures
Introduction
Measurable quantitative information (MQI), such as ‘165 cm’ or ‘60 kg’ applied to the height or weight of a person named ‘John’, is very common in ordinary language. MQI describes one of the basic properties associated with the magnitude aspect of quantity. Such information is even more abundant in scientific publications and technical reports, where it constitutes an essential part of communication. The processing of such information is therefore required for any successful language resource management.
This document, named ‘SemAF-MQI’, therefore aims to specify a general annotation scheme that follows the principles of semantic annotation laid down in ISO 24617-6 and the basic requirements of ISO 24612 Linguistic annotation framework (LAF). The scheme facilitates the processing of MQI in scientific and technical language and makes it interoperable with other semantic annotation schemes, such as those in the other parts of ISO 24617.
NOTE 1 ISO 24617-1:2012 (TimeML) and ISO 24617-7:2014 (Spatial information), for instance, have proposed ways of annotating measures of time (durations or time amounts) and of space (distances), respectively. Serious discussion of annotating measures as part of ISO 24617 was initiated at the 11th joint ACL-ISO/TC 37/SC 4/WG 2 Workshop on Interoperable Semantic Annotation (ISA-11)[1] and was continued at the ISA-13[2], ISA-14[3] and ISA-15[4] workshops. ISO 24612:2012 (LAF) provides a pivot format (GrAF, graphic annotation framework) that makes all the annotations of temporal or spatial measures in these two annotation schemes interchangeable with the measure annotations in SemAF-MQI.
Focusing on measurements in scientifico-technological language, SemAF-MQI as an ISO standard is expected to contribute to information retrieval (IR)[5], question answering (QA), text summarization (TS), and other natural language processing (NLP) applications[6].
NOTE 2 To enhance the readability of this document and to correct some obvious editorial errors, some editorial changes were made to the earlier version, CD 24617-11 MQI, which had passed the CD ballot (2019-09-11 to 2019-11-06) with 100 % approval and no comments.
• Each item in the Bibliography and in Clause 2 Normative references is now referred to in the main part of the current version of the document.
• Three of the illustrative examples in 7.6 Illustrations of QML_csx were moved, without any change of content, to a newly created Annex A (informative) to lighten the burden of reading 7.6.
• Incorrect wordings and obvious typos were corrected.
• The black-and-white coloring of Figure 1 — Metamodel of QML was changed to multiple colors to bring out each of the different components of the metamodel.
DRAFT INTERNATIONAL STANDARD ISO/DIS 24617-11:2020(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 11:
Measurable Quantitative information (MQI)
1 Scope
As one of the basic physical properties, quantity is associated with multitude (how many) and magnitude (how much). Focusing on the magnitudinal aspect of quantity, this document, henceforth named “SemAF-MQI”, aims at formulating a specification language for the construction of an annotation scheme for measurable quantitative information (MQI) in scientifico-technological language. The main characteristic of SemAF-MQI is that quantitative information is presented as measures expressed in terms of a pair $\langle n, u \rangle$, consisting of a numerically expressed quantity n and a unit u, which is either basic or derived, and either normalized or conventionally used.
NOTE 1 MQI stands for “measurable quantitative information”, whereas SemAF-MQI refers to this part (Part 11) of ISO 24617. [See 3.5 for the definition of MQI.]
The scope of SemAF-MQI is restricted to the measurable or magnitudinal aspect of quantity so that it can focus on the technical or practical use of measurements in IR (information retrieval), QA (question answering), TS (text summarization), and other NLP (natural language processing) applications. The scope is restricted to technological domains in which practical applications matter more than the theoretical issues found in the ordinary use of language. The subsequent part of ISO 24617 (Part 12) deals with more general and theoretical issues of quantification and quantitative information.
NOTE 2 The scope of this document is intentionally restricted to the measurable or magnitudinal aspect of quantity so that SemAF-MQI focuses on the technical or practical use of measurements in IR, QA, TS, and other NLP applications. The scope is restricted to technological domains in which practical applications matter more than the theoretical issues found in the ordinary use of language. For instance, fruit as well as meat is sold at markets by weight rather than by the piece. Furthermore, the subsequent part of ISO 24617 (Part 12) deals with more general and theoretical issues of quantification and plurals (e.g. “three apples”), including quantitative information with multitudinal aspects.
SemAF-MQI also treats temporal durations, which are discussed in ISO 24617-1 SemAF-Time (ISO-TimeML), and spatial measures such as distances, which are treated in ISO 24617-7 Spatial information (ISO-Space), while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts introduced in ISO 24617-6 SemAF Principles (8.3).
NOTE 3 The scope of this document (Part 11) also covers temporal durations, which are discussed in ISO 24617-1 SemAF-Time (TimeML), and spatial measures such as distances, which are treated in ISO 24617-7 Spatial information, while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts introduced in ISO 24617-6 SemAF Principles. Its scope thus covers temporal durations as treated in XML Schema and the TEI Guidelines.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612:2012, Language resource management — Linguistic annotation framework (LAF)
ISO 24617-1:2012, Language resource management — Semantic annotation framework (SemAF) — Part 1:
Time and events (SemAF-Time, ISO-TimeML)
ISO 24617-6:2016, Language resource management — Semantic annotation framework — Part 6:
Principles of semantic annotation (SemAF Principles)
ISO 24617-7:2014, Language resource management — Semantic annotation framework — Part 7: Spatial
information (ISOspace)
ISO/IEC 14977:1996, Information technology - Syntactic metalanguage - Extended BNF
ISO 80000-1:2009, Quantities and units — Part 1: General
NOTE 1 The following two documents are de facto standards to be followed by SemAF-MQI:
TEI P5: Guidelines for Electronic Text Encoding and Interchange, The TEI Consortium, 2019[7].
XML Schema Part 2: Datatypes, 2nd edition, W3C Recommendation, 28 October 2004[8].
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at https://www.iso.org/obp
3.1
quantity
property of a measurable object referring to its magnitude (how much) or multitude (how many)
Note 1 to entry: Compare with ISO 80000-1:2009, 3 Terms and Definitions, 3.1: property of a phenomenon, body,
or substance, where the property has a magnitude that can be expressed by means of a number and a reference.
3.2
base quantity
quantity in a conventionally chosen subset of a given system of quantities, where no quantity in the
subset can be expressed in terms of the other quantities within that subset
Note 1 to entry: Kinds of quantities include seven base quantities defined by the International System of Quantities (ISQ), as listed in Table 1.
Table 1 — ISQ base quantities
Base quantity | Base quantity symbol
length | L
mass | M
time | T
electric current | I
thermodynamic temperature | Θ
amount of substance | N
luminous intensity | J
Note 2 to entry: In ISO 80000-1:2009, 3 Terms and definitions, the symbols such as L and M, which are called base quantity symbols in this document, are called dimension symbols of quantities.
3.3
derived quantity
quantity, in a system of quantities, defined in terms of the base quantities of that system
EXAMPLE Speed is a derived quantity defined by length (distance) over time ($LT^{-1}$), where length (L) and time (T) are base quantities.
[SOURCE: ISO 80000-1:2009, 3 Terms and Definition, 3.5 derived quantity]
3.4
quantitative information
QI
measure associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numeric terms
3.6
measurable quantitative information markup language
markup language of measurable quantitative information
QML
specification language for the annotation of measurable quantitative information (3.5) extractable from text or other media types of language
3.7
unit
unit of measurement
measurement unit
scalar basis, defined and adopted by convention, of measuring objects by multiplying their quantitative
values expressed in real numbers
Note 1 to entry: The expressions that are used in measurement, such as “meter”, “liter” and “µmol/kg”, are units by the definition given above. Multitude expressions such as “bottles”, “boxes” or “two”, as in “two bottles of milk”, “a box of apples” and “two coffees”, are sometimes not regarded as units, but they can be if they are accepted as units by convention or agreement in some communities. ISO 24617 SemAF Part 12: Quantification treats such multitude expressions as genuine units.
Note 2 to entry: There are two major types of units: base units and derived units.
[Refer to ISO 80000-1:2009, 3 Terms and Definitions, 3.9 Unit, 3.10 Base unit, and 3.11 Derived unit.]
[SOURCE: Refer to: ISO 80000-1:2009, 3 Terms and Definitions, 3.9, real scalar quantity, defined and
adopted by convention, with which any other quantity of the same kind can be compared to express the
ratio of the second quantity to the first one as a number.]
3.8
base unit
measurement unit that is adopted by convention for a base quantity (3.2)
Note 1 to entry: There are seven base units chosen by the International System of Units (SI) associated with
seven ISQ base quantities to measure quantities, as shown in Table 2.
Table 2 — base units
SI base unit (unit symbol) | Associated ISQ base quantity (base quantity dimension symbol)
meter (m) | length (L)
kilogram (kg) | mass (M)
second (s) | time (T)
ampere (A) | electric current (I)
kelvin (K) | thermodynamic temperature (Θ)
mole (mol) | amount of substance (N)
candela (cd) | luminous intensity (J)
[SOURCE: ISO 80000-1:2009, 3 Terms and Definitions, 3.9 Unit, 3.10 Base unit, and 3.11 Derived unit.]
3.9
derived unit
measurement unit for a derived quantity
EXAMPLE The unit “newton” (N) is a derived unit for a derived quantity “force” (F), which is defined to be “mass times acceleration” ($MLT^{-2}$), where the quantity “acceleration” is a derived quantity defined by “velocity divided by time” ($VT^{-1}$) and “velocity” is defined by “length (distance) divided by time” ($LT^{-1}$).
Note 1 to entry: Table 3 illustrates some of the derived units.
[Refer to ISO 80000-1:2009, 3 Terms and Definitions, 3.9 Unit, 3.10 Base unit, and 3.11 Derived unit.]
Table 3 — Derived units
Derived unit (unit symbol)                    Associated derived quantity
kilometre per minute (km/min)                 speed = length (L)/time (T)
gram per cubic metre (g/m³)                   density = mass (M)/volume ($L^3$)
kilogram metre per square second (kg·m/s²)    force = mass (M) × length (L)/time squared ($T^2$)
lumen per square metre (lm/m²)                illuminance = luminous flux (J)/area ($L^2$)
4 Background and Motivations
Quantity exists as a multitude (e.g., “two watermelons”) or as a magnitude (“one kilogram of watermelon”).
The two basic divisions of quantity imply the principal distinction between continuity (continuum)
and discontinuity, which are two ways of determining quantity. SemAF-MQI focuses only on the
measurement information in scientific and technical texts. Therefore, quantity is regarded as a
magnitude property in this document, which is consistent with ISO 80000-1:2009 Quantities and units.
As in ISO 80000-1:2009, the term “unit” is defined in relation to quantity and is used for real scalar
quantity, defined and adopted by convention, with which any other quantity of the same kind can be
compared to express the ratio of the second quantity to the first one as a number. There are two types
of units: base unit and derived unit.
This document treats complex derived units as unanalyzed wholes. It does not annotate their internal
structures and components unless a special use case requires it. Nor does it require a specification
of ways of converting one unit to another, for the following reasons:
1) Complex derived units such as speed “km/h” ($LT^{-1}$) or acceleration “m/s²” ($LT^{-2}$) are understood
as they are in ordinary situations.
2) Certain domain-specific units cannot be decomposed during their conversion to other equivalent
units. For example, the estimated glomerular filtration rate (eGFR) frequently uses the unit
“mL/min/1.73 m²” in the medical domain. Thus, kidney function can be classified into various stages
depending on eGFR, where stage 1 is defined as “normal eGFR greater than or equal to
90 mL/min/1.73 m²”. In some cases, the unit is written as “mL/min/((173/100).m²)”. In all these cases,
“1.73” or “173/100” in the unit cannot be annotated separately for automatic conversion, since it is
combined with the other parts to form a complete unit.
3) Units can be converted automatically and effectively, for example with a conversion table. By
directly using the equivalence “1 mmol/l = 18 mg/dl” (as used for blood glucose), a computer can convert
a value from one unit into the other with one single computation, rather than converting each part of
the unit and then computing the total value.
4) Incomplete units exist. During language processing, there are incomplete units which need to
be detected by using different methods such as by formulating some specific rules or guidelines.
Such rules could be designed to extend a unit into a more complete representation or to complete
missing parts of a derived unit according to some clues such as contextual information or variable-
specific default unit information.
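The table-lookup approach of item 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the alias table, the conversion table, the analyte key, and the functions `normalize_unit` and `convert` are hypothetical names, not part of QML. In line with item 2), units stay opaque strings, so the eGFR unit is only mapped between spelling variants and never decomposed.

```python
# Hypothetical tables for illustration; units are opaque strings that are
# never decomposed into their parts (e.g. the "1.73" in the eGFR unit).
UNIT_ALIASES = {
    "mmol/l": "mmol/L",
    "mg/dl": "mg/dL",
    "mL/min/((173/100).m²)": "mL/min/1.73 m²",
}

# One factor per (source unit, target unit, analyte). The glucose factor
# reflects the rule of thumb 1 mmol/L ≈ 18 mg/dL.
CONVERSION_TABLE = {
    ("mmol/L", "mg/dL", "glucose"): 18.0,
}


def normalize_unit(unit: str) -> str:
    """Map a spelling variant to its canonical form; unknown units pass through."""
    return UNIT_ALIASES.get(unit, unit)


def convert(value: float, unit: str, target: str, analyte: str) -> float:
    """Convert with a single table lookup, using the inverse entry if needed."""
    src, dst = normalize_unit(unit), normalize_unit(target)
    if src == dst:
        return value
    if (src, dst, analyte) in CONVERSION_TABLE:
        return value * CONVERSION_TABLE[(src, dst, analyte)]
    if (dst, src, analyte) in CONVERSION_TABLE:
        return value / CONVERSION_TABLE[(dst, src, analyte)]
    raise KeyError(f"no table entry for {src} -> {dst} ({analyte})")


print(convert(5.5, "mmol/l", "mg/dl", "glucose"))  # 99.0
print(convert(90, "mg/dL", "mmol/L", "glucose"))   # 5.0
print(normalize_unit("mL/min/((173/100).m²)"))     # mL/min/1.73 m²
```

Because each table entry is a whole unit pair, a conversion costs one multiplication or division, which is the point made in item 3).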
With the recent advent of artificial intelligence technologies, many applications in IR and NLP have been
developed to acquire meta information from unstructured texts as a core module, such as question
answering systems, automatic speech translation systems, and intelligent assistant systems. In the
process of running such systems, texts are usually found containing a large amount of measurable
quantitative information, constituting an essential portion of meta information for information
extraction, text understanding, and data analysis.
In the current big-data era in particular, demands from industry and academic communities for the precise
acquisition of measurable quantitative information have increased. For example, business investment
companies frequently need to aggregate various sorts of information covering net sales, gross profit,
operating expenses, operating profit, interest expense, net profit before taxes, net income, etc., of the
target companies from their annual reports. The fast-growing field of medical informatics research also needs
to process a large amount of medical text
...
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24617-11
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2021-05-10
Voting terminates on: 2021-07-05
Language resource management — Semantic annotation framework (SemAF) — Part 11: Measurable quantitative information (MQI)
Gestion des ressources linguistiques — Cadre d'annotation sémantique (SemAF) — Partie 11: Mesurer l'information quantitative (MQI)
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number
ISO/FDIS 24617-11:2021(E)
© ISO 2021
ISO/FDIS 24617-11:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2021 – All rights reserved
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abstract specification of QML . 3
4.1 Overview . 3
4.2 Characteristics of QML . 4
4.3 Metamodel . 4
4.4 Abstract syntax of QML (QML_as) . 5
4.5 Concrete syntaxes of QML (QML_cs) and its subsets . 6
5 XML-based concrete syntax of QML (QML_csx) . 6
5.1 General . 6
5.2 Tag names with ID prefixes . 6
5.3 Attribute specification of the root . 7
5.4 Attribute specification of the basic element types . 7
5.5 Attribute specification of the link types . 8
5.6 Illustrations of QML_csx . 8
5.6.1 General. 8
5.6.2 Sample data . 8
5.6.3 Procedure of annotation . 9
6 TEI-based concrete syntax of QML (QML_cst) .11
6.1 Concrete syntaxes of QML (QML_cst) .11
6.1.1 Overall .11
6.1.2 Tag names with ID prefixes .11
6.1.3 Attribute specification of the basic element types .11
6.1.4 Attribute specification of the two link types .12
6.2 Illustrations of QML_cst .12
6.2.1 Overall .12
6.2.2 Sample data .12
6.2.3 Illustrations of TEI-based Concrete Syntax.13
Annex A (informative) Illustrations of QML_csx with more samples .16
Annex B (informative) Informal statements of MQI .19
Annex C (informative) The representation of units .20
Bibliography .21
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Measurable quantitative information (MQI), such as ‘165 cm’ or ‘60 kg’ describing the height or weight
of a person named ‘John’, is very common in ordinary language. MQI describes one of the basic properties
associated with the magnitude aspect of quantity. The main characteristic of MQI is that
quantitative information is presented as measures expressed in terms of a pair ⟨n, u⟩, consisting of
a numerically expressed quantity n and a unit u, which is either a base or a derived unit, and either normalized
or conventionally used. Such information is even more abundant in scientific publications and technical
reports, to the extent that it constitutes an essential part of the communicative segments of language in
general. The processing of such information is thus required for any successful language resource
management.
In the current big-data era, demands from industry and academic communities for the precise acquisition of
measurable quantitative information have increased. For example, business investment companies
frequently need to aggregate various sorts of information covering net sales, gross profit, operating
expenses, operating profit, interest expense, net profit before taxes, net income, etc., of the target
companies from their annual reports. The fast-growing field of medical informatics research also needs
to process a large amount of medical text to analyse medicine doses, the eligibility criteria of
clinical trials, the phenotypic characteristics of patients, the lab tests in clinical records, etc.[8]. All these
demands, whether in industry or in medical research, require an accurate and consistent representation
of measurable quantitative information for automated processing, computation, and exchange.
However, in the IR and NLP areas, there is no standardized way of representing measurable quantitative
information currently available. Each application system developed in industrial sectors has hitherto
used its own format to annotate measurable quantitative information. A flexible, interoperable and
standardized measurable quantitative information representation format for IR and NLP tasks to work
with many different application systems is called for.
This document aims to formulate a general annotation scheme that follows the principles of
semantic annotation laid down in ISO 24617-6 and the basic requirements of ISO 24611,
that facilitates the processing of MQI in scientific and technical language, and that is interoperable
with the other parts of ISO 24617. It also utilizes various ISO standards on lexical resources
and morpho-syntactic annotation frameworks, and aims to be compatible with other existing relevant
standards.
NOTE ISO 24617-1 and ISO 24617-7, for instance, have proposed ways of annotating measures of time
(durations or amounts of time) and of space (distances), respectively. ISO 24612 provides a pivot format (a graph-based
annotation framework) that makes the annotations of temporal or spatial measures in these two annotation
schemes interoperable.
QML is specified at an abstract level, which allows various serialization formats for annotated
measurable quantitative information, such as an XML-based representation. The annotation of QI
(quantitative information) is normalized at the abstract level, and the standoff
annotation format is adopted at the concrete level of serialization.
Focusing on measurements in scientific and technological language, this document is expected to
contribute to information retrieval (IR)[9], question answering (QA), text summarization (TS), and
other natural language processing (NLP) applications[10].
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24617-11:2021(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 11:
Measurable quantitative information (MQI)
1 Scope
This document covers the measurable or magnitudinal aspect of quantity, so that it can focus on the
technical or practical use of measurements in IR (information retrieval), QA (question answering), TS
(text summarization), and other NLP (natural language processing) applications. It is applicable to
technical domains, where practical applications are of greater relevance than some of the theoretical
issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative
information.
This document also treats temporal durations, which are discussed in ISO 24617-1, and spatial
measures such as distances, which are treated in ISO 24617-7, while making them interoperable with other
measure types. It also accommodates the treatment of measures or amounts that are introduced in
ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
quantity
property of a measurable object referring to its magnitude or multitude
[SOURCE: ISO/IEC Guide 99:2007, 1.1, modified — Definition substantially redrafted, and Notes
removed.]
3.2
base quantity
quantity (3.1) in a conventionally chosen subset of a given system of quantities, where no quantity in
the subset can be expressed in terms of the other quantities within that subset
Note 1 to entry: The International System of Quantities (ISQ) defines seven base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.4, modified — "no subset quantity" replaced with "no quantity in
the subset", "the others" replaced with "the other quantities within that subset", and Notes and Example
removed.]
3.3
derived quantity
quantity (3.1), in a system of quantities, defined in terms of the base quantities (3.2) of that system
EXAMPLE Speed is a derived quantity defined by length (distance) over time ($LT^{-1}$), where length (L) and
time (T) are base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.5, modified — Example replaced.]
3.4
quantitative information
QI
measurement associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numeric terms
3.6
measurable quantitative information markup language
markup language of measurable quantitative information
quantitative markup language
QML
specification language for the annotation of measurable quantitative information (3.5) extractable from
text or from language in other media
3.7
measurement unit
unit of measurement
unit
scalar basis, defined and adopted by convention, of measuring objects by multiplying their quantitative
values expressed in real numbers
Note 1 to entry: The expressions that are used in measurement such as “metre”, “litre”, and “µmol/kg” are units
by the definition given above. Multitude expressions such as “bottles”, “boxes”, or “two”, as in “two bottles of
milk”, “a box of apples”, and “two coffees”, are sometimes not regarded as units, but they can be if they are
accepted as units by convention or agreement in some communities. ISO 24617-12 (SemAF Part 12: Quantification)
treats such multitude expressions as genuine units.
[SOURCE: ISO/IEC Guide 99:2007, 1.9, modified — Definition substantially redrafted, original Notes
removed, new Note 1 to entry added.]
3.8
base unit
measurement unit (3.7) that is adopted by convention for a base quantity (3.2)
Note 1 to entry: There are seven base units chosen by the International System of Units (SI) associated with
seven ISQ base quantities to measure quantities, as shown in Table 1.
Table 1 — Base units
SI base unit (unit symbol)    Associated ISQ base quantity (base quantity symbol)
metre (m)                     length (L)
kilogram (kg)                 mass (M)
second (s)                    time (T)
ampere (A)                    electric current (I)
kelvin (K)                    thermodynamic temperature (Θ)
mole (mol)                    amount of substance (N)
candela (cd)                  luminous intensity (J)
[SOURCE: ISO/IEC Guide 99:2007, 1.10, modified — Notes and Examples removed, new Note 1 to entry
and Table 1 added.]
3.9
derived unit
measurement unit (3.7) for a derived quantity (3.3)
EXAMPLE The unit “newton” (N) is a derived unit for a derived quantity “force” (F), which is defined to be
“mass times acceleration” ($MLT^{-2}$), where the quantity “acceleration” is a derived quantity defined by “velocity
divided by time” ($VT^{-1}$) and “velocity” is defined by “length (distance) divided by time” ($LT^{-1}$).
Note 1 to entry: Table 2 illustrates some of the derived units.
[SOURCE: ISO/IEC Guide 99:2007, 1.11, modified — Examples removed, new Example and Note 1 to
entry added.]
Table 2 — Derived units
Derived unit (unit symbol)                    Associated derived quantity
kilometre per minute (km/min)                 speed = length (L)/time (T)
gram per cubic metre (g/m³)                   density = mass (M)/volume ($L^3$)
kilogram metre per square second (kg·m/s²)    force = mass (M) × length (L)/time squared ($T^2$)
lumen per square metre (lm/m²)                illuminance = luminous flux (J)/area ($L^2$)
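The dimension formulas in 3.9 and Table 2 can be checked mechanically by representing each dimension as a map from ISQ base-quantity symbols to integer exponents. The Python sketch below assumes that representation; the helper names `mul`, `div`, and `fmt` are illustrative only, and prefixes such as "kilo" are ignored because they do not change the dimension.

```python
from typing import Dict

# A dimension maps ISQ base-quantity symbols (L, M, T, I, Θ, N, J) to integer
# exponents; symbols that are absent have exponent 0.
Dim = Dict[str, int]


def mul(a: Dim, b: Dim) -> Dim:
    """Product of two quantities: exponents add; zero exponents are dropped."""
    out = dict(a)
    for symbol, exp in b.items():
        out[symbol] = out.get(symbol, 0) + exp
    return {s: e for s, e in out.items() if e != 0}


def div(a: Dim, b: Dim) -> Dim:
    """Quotient of two quantities: subtract the divisor's exponents."""
    return mul(a, {s: -e for s, e in b.items()})


def fmt(d: Dim) -> str:
    """Render a dimension in M, L, T, I, Θ, N, J order, e.g. "MLT^-2"."""
    parts = [s if d[s] == 1 else f"{s}^{d[s]}" for s in "MLTIΘNJ" if s in d]
    return "".join(parts) or "1"


LENGTH, MASS, TIME, LUMINOUS = {"L": 1}, {"M": 1}, {"T": 1}, {"J": 1}

velocity = div(LENGTH, TIME)                           # LT^-1 (km/min)
acceleration = div(velocity, TIME)                     # VT^-1 = LT^-2
force = mul(MASS, acceleration)                        # MLT^-2, the newton
density = div(MASS, mul(LENGTH, mul(LENGTH, LENGTH)))  # ML^-3 (g/m³)
illuminance = div(LUMINOUS, mul(LENGTH, LENGTH))       # L^-2J (lm/m²)

for name, d in [("speed", velocity), ("force", force),
                ("density", density), ("illuminance", illuminance)]:
    print(name, fmt(d))
```

Composing `force` from `velocity` and `acceleration` mirrors the chain in the EXAMPLE for “newton”.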
4 Abstract specification of QML
4.1 Overview
The quantitative markup language (QML) (3.6) is specified at two levels, abstract and concrete. Some
characteristics of QML are listed in 4.2. The overall structure of QML is represented by a metamodel, as
introduced in 4.3. The abstract syntax of QML as QML_as shall be a set-theoretic specification of QML in
conceptual terms that are independent of ways of representing the annotation (content) of measurable
quantitative information. The concrete syntax of QML as QML_cs shall be a specification of a set of
representation formats, based on QML_as, for the annotation of measurable quantitative information
in a computationally tractable way. The QML_as is introduced in 4.4, while QML_cs is presented in
4.5. Equivalent concrete syntaxes, including an XML-based concrete syntax QML_csx and a TEI-based
concrete syntax QML_cst, are described in Clause 5 and Clause 6, respectively.
NOTE There can be many equivalent concrete syntaxes defined on a single abstract syntax.
4.2 Characteristics of QML
QML shall have the following characteristics.
a) QML shall focus on the annotation of the measurable attributes of entities. For example, “BMI
between 10-20 kg/m²”.
b) QML shall provide a way to annotate the relations of measures. For example, “age 40 or older” and
“fpg>=100 mg/dl or a1c not less than 5,8 %”.
c) QML shall cover the complex uses of unitized numeric quantities. For example, “14,0 × 10⁹” and
“glycosylated haemoglobin (hba1c) <1,15 times the upper limit of normal”.
d) QML shall facilitate the identification of normalized numeric values and units as the measurable
attributes of an associated entity.
NOTE QML does not specify ways of annotating the normalization (e.g. “millimoles per litre” is normalized
to “mmol/L”) or the complete specification (e.g. “kg/m” is completed as “kg/m²” for BMI) of units, which will be
dealt with in another part of ISO 24617 addressing the automated implementation of MQI.
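As a rough illustration of characteristic b), the Python sketch below pulls (relator, numeric, unit) triples out of a criterion string. It is a naive regular-expression approach built on assumed, hypothetical relator and unit lexicons, not a procedure defined by QML. It handles the decimal comma, but it does not handle ranges such as “between 10-20”, and it skips measures whose unit is missing from the lexicon (for example “age 40 or older”).

```python
import re
from typing import List, NamedTuple, Optional


class RelatedMeasure(NamedTuple):
    relator: Optional[str]  # e.g. ">=" or "not less than"; None if absent
    numeric: float          # numeric value, decimal comma normalized
    unit: str               # unit kept as an unanalyzed string


# Hypothetical lexicons. Longer entries are tried first so that ">=" wins over
# ">" and "not less than" wins over "less than".
RELATORS = [">=", "<=", "=", ">", "<", "not less than", "not more than",
            "greater than", "less than", "equal to", "at least", "at most"]
UNITS = ["mmol/L", "mg/dl", "kg/m²", "cm", "kg", "%"]


def alternation(words: List[str]) -> str:
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


PATTERN = re.compile(
    rf"(?:(?P<rel>{alternation(RELATORS)})\s*)?"  # optional relator
    r"(?<![\w.,])(?P<num>\d+(?:[.,]\d+)?)\s*"     # number not glued to a word
    rf"(?P<unit>{alternation(UNITS)})(?!\w)",     # unit from the lexicon
    re.IGNORECASE,
)


def extract(text: str) -> List[RelatedMeasure]:
    return [RelatedMeasure(m["rel"], float(m["num"].replace(",", ".")), m["unit"])
            for m in PATTERN.finditer(text)]


print(extract("fpg>=100 mg/dl or a1c not less than 5,8 %"))
```

The negative lookbehind keeps digits inside identifiers such as “a1c” from being read as measures.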
4.3 Metamodel
The overall structure of measurable quantitative information is represented by the metamodel in
Figure 1.
Figure 1 — Metamodel of measurable quantitative information
This metamodel shall consist of seven class components, represented as square boxes in Figure 1:
a) source data as input to the annotation of MQI,
b) markables extracted from data sources,
c) three types of basic elements: entity, measure, and relator,
d) two types of links: measure link and comparison link.
The element “entity” shall be any object that has the property of a measurable quantity, represented by
“@quantity”, as one of its properties. The “entity”, as is used in this document, shall be a very general
term that refers to any object, not just to individual entities, but also to their properties, such as
“height” of a building or “speed” of a car, and also to any kinds of eventualities such as states, processes
or transitions.
EXAMPLE 1 We drove at more than 200 kilometres per hour on a German autobahn.
The measure “more than 200 kilometres per hour” applies to the speed, a quantitative property of a
motion: here, the motion of driving mentioned in the example.
The element “measure” represents a measurable quantity of an entity in terms of three attributes:
quantity, unit, and type.
EXAMPLE 2 The height of Mt. Hall is 1 950 metres.
The measure shall consist of a quantity referred to by a numeric expression “1 950” and a unit “metre”.
It applies to the “height” quantity of the geographical object, named “Mt. Hall”.
The element “relator”, which is associated with markables such as “equal to”, “greater than”, “<=”,
“between”, or “at least”, only has the functional role of relating two or more measures.
EXAMPLE 3 One pound equals 16 ounces.
The verb “equals” is a relator of identity between the two measures “one pound” and “16 ounces”.
EXAMPLE 4 1 foot is less than 1 metre, for it is exactly equal to 30,48 cm.
This example illustrates two types of links between measures: the relation of being “less than”, and that
of being an identity.
A link of the type “measure” shall relate a measure to the quantitative property of an entity. Such a link
is triggered by a measure element.
A link of the type “comparison” shall relate a measure to one or more other measures. Such a link is
often triggered by a relator element.
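The metamodel's basic elements and links can be sketched as Python data classes, here instantiated for EXAMPLE 2 and EXAMPLE 4. This is a hypothetical modelling sketch. The class and field names (including `figure` and `ground` for the two measures of a comparison link), the relator type values, and the unit-to-metre factor table are assumptions, not QML_csx attribute names. The `holds` method shows that, once units are converted, both comparisons in EXAMPLE 4 can be verified.

```python
from dataclasses import dataclass
from fractions import Fraction

# Exact factors to metres (1 ft = 0,3048 m by definition). The unit keys,
# relator type values and figure/ground names are hypothetical choices.
TO_METRE = {"ft": Fraction("0.3048"), "m": Fraction(1), "cm": Fraction(1, 100)}


@dataclass
class Entity:
    xml_id: str
    type: str       # e.g. "geographical"
    quantity: str   # e.g. "height"


@dataclass
class Measure:
    xml_id: str
    numeric: Fraction
    unit: str
    type: str = "length"

    def in_metres(self) -> Fraction:
        return self.numeric * TO_METRE[self.unit]


@dataclass
class Relator:
    xml_id: str
    type: str  # "lessThan" or "equal"


@dataclass
class MeasureLink:
    entity: Entity
    measure: Measure


@dataclass
class ComparisonLink:
    figure: Measure
    ground: Measure
    relator: Relator

    def holds(self) -> bool:
        a, b = self.figure.in_metres(), self.ground.in_metres()
        if self.relator.type == "lessThan":
            return a < b
        if self.relator.type == "equal":
            return a == b
        raise ValueError(f"unsupported relator type {self.relator.type!r}")


# EXAMPLE 2: "The height of Mt. Hall is 1 950 metres."
hall = Entity("x1", "geographical", "height")
m_link = MeasureLink(hall, Measure("me1", Fraction(1950), "m"))

# EXAMPLE 4: "1 foot is less than 1 metre, for it is exactly equal to 30,48 cm."
foot = Measure("me2", Fraction(1), "ft")
metre = Measure("me3", Fraction(1), "m")
cm = Measure("me4", Fraction("30.48"), "cm")
less = ComparisonLink(foot, metre, Relator("c1", "lessThan"))
same = ComparisonLink(foot, cm, Relator("c2", "equal"))
print(less.holds(), same.holds())  # True True
```

Exact fractions are used so that the identity “1 foot = 30,48 cm” is checked without floating-point rounding.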
4.4 Abstract syntax of QML (QML_as)
A markup language QML shall be a specification language for the annotation of MQI. The abstract syntax
of QML shall specify an annotation scheme in set-theoretic terms based on a conceptual understanding
of MQI. The abstract syntax QML_as is understood to be structured as a triple ⟨B, R, @⟩ such that
a) B is a set of three basic element types: entity, measure, and relator;
b) R is a set of two link types: measure and comparison;
c) @ is a set of assignments that specify the list of attributes and their value types associated with
each of the basic element types in B and each of the link types in R.
Every element in B shall have at least one attribute, @type, and so shall every link. The values of @type
are CDATA associated with each of the elements. For instance, the entity “mountain” is of the
“geographical” type, and the entity named “John” is of the “person” type.
The values of @quantity for an entity are CDATA that may include values such as height, width, or
weight.
The assignment of measure shall have three attributes: @numeric, @unit, and @type. A possible value
of the attribute @numeric is a real number. A possible value of @unit is one of the units in a system
conventionally accepted such as one of the SI base units or derived units. A possible value of @type is
one of the quantities listed as ISQ base quantities or derived quantities, such as length, mass, voltage,
and so on.
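One way to make the triple ⟨B, R, @⟩ concrete is to encode B and R as sets and @ as a table of required attributes per type, and then to check candidate annotations against it. The Python sketch below does this under assumptions: which attributes are treated as required (beyond @type), the `kind` labels, and the function name `validate` are illustrative choices, not taken from a normative schema.

```python
from typing import Dict, List

# Hypothetical encoding of the QML_as triple <B, R, @>.
B = {"entity", "measure", "relator"}  # basic element types
R = {"measure", "comparison"}         # link types

# @: attributes assumed to be required for each element or link type. Only
# attributes named in 4.4 are listed; anything else is left out of the sketch.
ASSIGNMENTS = {
    ("element", "entity"): {"type", "quantity"},
    ("element", "measure"): {"numeric", "unit", "type"},
    ("element", "relator"): {"type"},
    ("link", "measure"): {"type"},
    ("link", "comparison"): {"type"},
}


def validate(kind: str, name: str, attrs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the annotation conforms."""
    universe = B if kind == "element" else R
    if name not in universe:
        return [f"unknown {kind} type: {name}"]
    required = ASSIGNMENTS[(kind, name)]
    problems = [f"missing @{a}" for a in sorted(required - set(attrs))]
    if (kind, name) == ("element", "measure") and "numeric" in attrs:
        try:
            float(attrs["numeric"])  # @numeric must be a real number
        except ValueError:
            problems.append("@numeric is not a real number")
    return problems


print(validate("element", "measure", {"numeric": "1950", "unit": "m", "type": "length"}))  # []
print(validate("element", "measure", {"numeric": "abc", "unit": "m"}))
print(validate("link", "temporal", {}))  # ['unknown link type: temporal']
```

The second call reports both the missing @type and the non-numeric @numeric value.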
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
An abstract syntax shall allow several semantically equivalent concrete syntaxes. QML_as likewise
allows a set of equivalent concrete syntaxes of QML (QML_cs). This document introduces two of these
concrete syntaxes, QML_csx and QML_cst, in Clause 5 and Clause 6, respectively.
The two concrete syntaxes, QML_csx and QML_cst, are both based on the abstract syntax QML_as, and both
adopt XML as their representation language. They shall comply with the requirement of standoff
annotation in ISO 24612.
These two concrete syntaxes do, however, differ from each other in at least two aspects. Like the
other parts of ISO 24617 on semantic annotation, such as ISO 24617-1, ISO 24617-6, and ISO 24617-7,
QML_csx does not separate annotation content structures from their anchoring (referencing)
structures, although this separation is required by LAF for linguistic annotation.
In contrast, QML_cst is feature-structure-based. It shall follow LAF in separating the two
structures, anchoring and content, when representing measurement information in feature
structures. Furthermore, QML_cst, as specified in this document, shall adopt the names of XML
elements and attributes, with their value type specifications, from the TEI P5 Guidelines of the Text Encoding
Initiative Consortium for the representation of MQI.
5 XML-based concrete syntax of QML (QML_csx)
5.1 General
The XML-based concrete syntax QML_csx is introduced in two steps. The first step is to list the tag
names and ID prefixes of QML_csx in 5.2. The second step is to specify the attribute assignments for the
XML root in 5.3, for each of the basic element types listed in 5.4, and for each of the link types listed in
5.5.
NOTE The root tag is introduced in XML to embed a list of XML elements into a single structure.
5.2 Tag names with ID prefixes
Corresponding to each of the basic element types and the link types for QML_csx, there is a unique tag
and a unique ID prefix, as shown in Table 3.
Table 3 — List of tags and ID prefixes of QML_csx
Tag                     ID prefix   Comment
Root                    mqi         XML root tag
Basic element types
  Entity                x           object to which a measure applies
  Measure               me          unitized numeric quantities only
  Relator               c           triggers a link relating measures
Link types
  Measure link          mL          relates a measure to an entity; triggered by a measure
  Comparison link       cL          relates a measure to one or more other measures
NOTE The attribute name for each ID in XML is xml:id, and each of its values is an ID prefix followed by a
positive integer.
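The ID convention in the NOTE above can be sketched with Python's standard `xml.etree.ElementTree`, which serializes the `{http://www.w3.org/XML/1998/namespace}id` attribute as `xml:id`. The ID prefixes come from Table 3. The tag names used as dictionary keys (`entity`, `measure`, `relator`, `mLink`, `cLink`) and the attribute names on the sample element are hypothetical placeholders, because the normative QML_csx tag names are not reproduced here.

```python
import itertools
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# ID prefixes from Table 3. The element tag names used as keys here are
# hypothetical placeholders, not the normative QML_csx tag names.
PREFIXES = {"entity": "x", "measure": "me", "relator": "c",
            "mLink": "mL", "cLink": "cL"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ID_PATTERN = re.compile(r"(?:x|me|c|mL|cL)[1-9][0-9]*")


class IdAllocator:
    """Hands out IDs as an ID prefix followed by a positive integer."""

    def __init__(self) -> None:
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_id(self, tag: str) -> str:
        prefix = PREFIXES[tag]
        return f"{prefix}{next(self._counters[prefix])}"


def make_element(ids: IdAllocator, tag: str, **attrs: str) -> ET.Element:
    """Create an element whose xml:id comes first, followed by its attributes."""
    return ET.Element(tag, {XML_ID: ids.next_id(tag), **attrs})


def is_valid_id(value: str) -> bool:
    return ID_PATTERN.fullmatch(value) is not None


ids = IdAllocator()
measure = make_element(ids, "measure", numeric="1950", unit="m", type="length")
print(ET.tostring(measure, encoding="unicode"))
print(ids.next_id("measure"), ids.next_id("entity"))  # me2 x1
print(is_valid_id("me1"), is_valid_id("me0"))          # True False
```

Keeping one counter per prefix means each element or link type is numbered independently, so “me1” and “x1” can coexist in one document.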
5.3 Attribute specification of the root
List 1: A list of attributes for the root element in extended BNF (Backus–Naur form)
attributes = identifier, target, [lang], [mediumTyp
...
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24617-11
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2021-05-10
Voting terminates on: 2021-07-05
Gestion des ressources linguistiques — Cadre d'annotation sémantique (SemAF) — Partie 11: Informations quantitatives mesurables (MQI)
Language resource management — Semantic annotation framework (SemAF) — Part 11: Measurable quantitative information (MQI)
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY PROPRIETARY RIGHTS OF WHICH THEY MAY BE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO BEING EXAMINED TO ESTABLISH WHETHER THEY ARE ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL AND COMMERCIAL PURPOSES, AS WELL AS FROM THE USERS' POINT OF VIEW, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number
ISO/FDIS 24617-11:2021(F)
© ISO 2021
ISO/FDIS 24617-11:2021(F)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from ISO at the address below or from ISO's member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abstract specification of QML . 3
4.1 Overview . 3
4.2 Characteristics of QML . 4
4.3 Metamodel . 4
4.4 Abstract syntax of QML (QML_as) . 5
4.5 Concrete syntaxes of QML (QML_cs) and its subsets . 6
5 XML-based concrete syntax of QML (QML_csx) . 6
5.1 General . 6
5.2 Tag names with ID prefixes . 6
5.3 Attribute specification of the root . 7
5.4 Attribute specification of the basic element types . 7
5.5 Attribute specification of the link types . 8
5.6 Illustrations of QML_csx . 9
5.6.1 General . 9
5.6.2 Sample data . 9
5.6.3 Procedure of annotation . 9
6 TEI-based concrete syntax of QML (QML_cst) .11
6.1 Concrete syntaxes of QML (QML_cst) .11
6.1.1 General .11
6.1.2 Tag names with ID prefixes .11
6.1.3 Attribute specification of the basic element types .11
6.1.4 Attribute specification of the two link types .12
6.2 Illustrations of QML_cst .13
6.2.1 General .13
6.2.2 Sample data .13
6.2.3 Illustrations of the TEI-based concrete syntax .13
Annex A (informative) Illustrations of QML_csx with more samples .17
Annex B (informative) Informal statements of MQI .20
Annex C (informative) Representation of units .21
Bibliography .22
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights or similar rights. ISO shall not be held responsible for identifying any or all such rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/fr/avant-propos.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user's national standards body. A complete listing of these bodies can be found at www.iso.org/fr/members.html.
Introduction
Measurable quantitative information (MQI), such as "165 cm" or "60 kg" said of "John" to give his height or weight, is very common in ordinary language. MQI describes one of the basic properties associated with the quantitative aspect of a quantity. The main characteristic of MQI, as treated in this document, is that quantitative information is presented as measures expressed as a pair <n, u>, consisting of a numerically expressed magnitude n and a unit u, where u is a base unit or a derived unit, or else a standardized or conventionally used unit. Such information is even more abundant in scientific publications and technical reports, to the point that it constitutes an essential part of communicative language in general. Processing it is therefore necessary for successful language resource management.
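The <n, u> pair described above maps naturally onto a small record type. The following Python sketch is illustrative only: the class name `Measure` and its fields `n` and `u` are assumptions made here, not names defined by this document.

```python
from typing import NamedTuple


class Measure(NamedTuple):
    """A measure as the pair <n, u>: a numeric magnitude n and a unit u."""
    n: float  # numerically expressed magnitude
    u: str    # base, derived or conventional unit symbol, e.g. "kg"

    def __str__(self) -> str:
        # Render number and unit separated by a space, e.g. "165 cm"
        return f"{self.n:g} {self.u}"


# John's height and weight, as in the paragraph above
height = Measure(165, "cm")
weight = Measure(60, "kg")
print(height, weight)  # 165 cm 60 kg
```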
In the era of big data, demand from industry and academia for the accurate acquisition of measurable quantitative information has grown. For example, investment firms frequently need to aggregate various types of information about target companies from their annual reports, covering net sales, gross margin, operating expenses, operating income, interest expense, income before taxes, net income, etc. The fast-growing field of medical informatics likewise needs to process large volumes of medical text to analyse drug dosages, eligibility criteria for clinical trials, patient phenotypes, laboratory tests in clinical records, etc.[8]. All of these demands, whether from industry or from medical research, require an accurate and consistent representation of measurable quantitative information to enable automated processing, computation and exchange.
However, there is currently no standardized way of representing measurable quantitative information in IR and NLP. Each application system developed in industry has so far used its own format to annotate such information. A flexible, interoperable and standardized representation format for measurable quantitative information is needed so that IR and NLP tasks can work with many different application systems.
This document aims to formulate a general annotation scheme that follows the semantic annotation principles of ISO 24617-6 and the basic requirements of ISO 24611, facilitates the processing of MQI in scientific and technical language, and is interoperable with other semantic annotation schemes, in particular the other parts of ISO 24617. It also builds on various ISO standards for lexical resources and morpho-syntactic annotation frameworks, and aims to be compatible with other relevant existing standards.
NOTE ISO 24617-1 and ISO 24617-7, for example, propose ways of annotating measures of time (durations or amounts of time) and of space (distances), respectively. ISO 24612 provides a pivot format (a graph-based annotation framework) in which all annotations of temporal and spatial measures from these two annotation schemes can be represented.
QML is standardized at an abstract level that allows various serialization formats, such as an XML-based representation, for annotated measurable quantitative information. The standardization of QI (quantitative information) annotation is stated at the abstract level of annotation, and the standoff annotation format is adopted at the concrete level of serialization.
Focusing on measures in scientific and technological language, this document is intended to contribute to information retrieval (IR)[9], question answering (QA), text summarization (TS) and other natural language processing (NLP) applications[10].
FINAL DRAFT INTERNATIONAL STANDARD ISO/FDIS 24617-11:2021(F)
Language resource management – Semantic annotation framework (SemAF) –
Part 11:
Measurable quantitative information (MQI)
1 Scope
This document covers the measurable or magnitudinal aspect of quantity so that it can focus on the technical or practical use of measurements in IR (information retrieval), QA (question answering), TS (text summarization) and other NLP (natural language processing) applications. It applies to technological domains in which practical applications matter more than some of the theoretical issues found in the ordinary use of language.
NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative information.
This document also treats temporal durations, which are discussed in ISO 24617-1, and spatial measures such as distances, which are treated in ISO 24617-7, while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts that are introduced in ISO 24617-6:2016, 8.3.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management – Linguistic annotation framework (LAF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
– ISO Online browsing platform: available at https://www.iso.org/obp
– IEC Electropedia: available at https://www.electropedia.org/
3.1
quantity
property of a measurable object referring to its magnitude or multiplicity
[SOURCE: ISO/IEC Guide 99:2007, 1.1, modified – The definition has been substantially revised and the notes have been deleted.]
3.2
base quantity
quantity (3.1) in a conventionally chosen subset of a given system of quantities, where no quantity in the subset can be expressed in terms of the other quantities of that subset
Note 1 to entry: The International System of Quantities (ISQ) defines seven base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.4, modified – "the others" has been replaced by "the other quantities of that subset", and the notes and the example have been deleted.]
3.3
derived quantity
quantity (3.1), in a system of quantities, defined in terms of the base quantities (3.2) of that system
EXAMPLE Velocity is a derived quantity defined as length (distance) per time ($LT^{-1}$), where length (L) and time (T) are base quantities.
[SOURCE: ISO/IEC Guide 99:2007, 1.5, modified – The example has been replaced.]
3.4
quantitative information
QI
measure associated with the quantity (3.1) of a measurable object
3.5
measurable quantitative information
MQI
quantitative information (3.4) that can be expressed in unitized numerical terms, i.e. as a number together with a unit
3.6
measurable quantitative information markup language
quantitative markup language
QML
specification language for annotating measurable quantitative information (3.5) extractable from texts or other types of language media
3.7
measurement unit
unit of measurement
unit
scalar basis, defined and adopted by convention, for measuring objects, by which their quantitative values, expressed as real numbers, are multiplied
Note 1 to entry: Expressions used in measurement, such as "metre", "litre" and "µmol/kg", are units according to the definition above. Expressions of multiplicity such as "bottles", "boxes" or "two", as in "two bottles of milk", "a box of apples" and "two coffees", are sometimes not regarded as units, but they can be if they are accepted as units by convention or agreement in certain communities. ISO 24617-12 (SemAF, Part 12: Quantification) treats such expressions of multiplicity as genuine units.
[SOURCE: ISO/IEC Guide 99:2007, 1.9, modified – The definition has been substantially revised, the original notes have been deleted and a new Note 1 to entry has been added.]
3.8
base unit
measurement unit (3.7) adopted by convention for a base quantity (3.2)
Note 1 to entry: The International System of Units (SI) defines seven base units, associated with the seven ISQ base quantities, for measuring quantities, as shown in Table 1.
Table 1 – Base units
SI base unit (unit symbol) | Associated ISQ base quantity (quantity symbol)
metre (m) | length (L)
kilogram (kg) | mass (M)
second (s) | time (T)
ampere (A) | electric current (I)
kelvin (K) | thermodynamic temperature (Θ)
mole (mol) | amount of substance (N)
candela (cd) | luminous intensity (J)
[SOURCE: ISO/IEC Guide 99:2007, 1.10, modified – The notes and examples have been deleted, and a new Note 1 to entry and Table 1 have been added.]
3.9
derived unit
measurement unit (3.7) of a derived quantity (3.3)
EXAMPLE The unit "newton" (N) is a derived unit for the derived quantity "force" (F), which is defined as "mass times acceleration" ($MLT^{-2}$), where the quantity "acceleration" is a derived quantity defined as "velocity divided by time" ($VT^{-1}$) and "velocity" is defined as "length (distance) divided by time" ($LT^{-1}$).
Note 1 to entry: Table 2 illustrates some derived units.
[SOURCE: ISO/IEC Guide 99:2007, 1.11, modified – The examples have been deleted, and a new example and Note 1 to entry have been added.]
Table 2 – Derived units
Derived unit (unit symbol) | Associated derived quantity
kilometre per minute (km/min) | velocity = length (L) / time (T)
gram per cubic metre (g/m³) | density = mass (M) / volume ($L^{3}$)
kilogram metre per second squared (kg·m/s²) | force = mass (M) × length (L) / time ($T^{2}$)
lumen per square metre (lm/m²) | illuminance = luminous intensity (J) / area ($L^{2}$)
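The dimensional formulas in 3.3, 3.9 and Table 2 can be checked mechanically by treating a dimension as a mapping from ISQ base-quantity symbols (Table 1) to integer exponents, so that multiplying quantities adds exponents and dividing subtracts them. A minimal Python sketch, assuming this exponent-map representation; the helper names `mul`, `div` and `power` are hypothetical:

```python
# A dimension maps ISQ base-quantity symbols (Table 1) to integer exponents.
Dim = dict

L: Dim = {"L": 1}  # length
M: Dim = {"M": 1}  # mass
T: Dim = {"T": 1}  # time
J: Dim = {"J": 1}  # luminous intensity


def mul(a: Dim, b: Dim) -> Dim:
    """Product of two dimensions: exponents add; zero exponents are dropped."""
    out = dict(a)
    for sym, exp in b.items():
        out[sym] = out.get(sym, 0) + exp
    return {s: e for s, e in out.items() if e != 0}


def div(a: Dim, b: Dim) -> Dim:
    """Quotient of two dimensions: the divisor's exponents are subtracted."""
    return mul(a, {s: -e for s, e in b.items()})


def power(a: Dim, n: int) -> Dim:
    """Raise a dimension to an integer power."""
    return {s: e * n for s, e in a.items() if e * n != 0}


velocity = div(L, T)                 # L T^-1
acceleration = div(velocity, T)      # L T^-2
force = mul(M, acceleration)         # M L T^-2, the dimension of the newton
density = div(M, power(L, 3))        # M L^-3, as for g/m³
illuminance = div(J, power(L, 2))    # J L^-2, as for lm/m² (area is L²)

print(force)  # {'M': 1, 'L': 1, 'T': -2}
```

Because area is L², the check for lm/m² comes out as J L⁻², which is the relationship the last row of Table 2 expresses.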
4 Abstract specification of QML
4.1 Overview
The quantitative markup language (QML) (3.6) is specified at two levels, abstract and concrete. Some features of QML are listed in 4.2. The overall structure of QML is represented by a metamodel, presented in 4.3. The abstract syntax of QML, QML_as, shall be a set-theoretic specification of QML in conceptual terms that are independent of the ways of representing the annotation (content) of measurable quantitative information. The concrete syntax of QML, QML_cs, shall be a specification of a set of representation formats, based on QML_as, for annotating measurable quantitative information in a computationally tractable way. QML_as is presented in 4.4 and QML_cs in 4.5. Equivalent concrete syntaxes, namely an XML-based concrete syntax QML_csx and a TEI-based concrete syntax QML_cst, are described in Clause 5 and Clause 6, respectively.
NOTE Many equivalent concrete syntaxes can be defined on a single abstract syntax.
4.2 Features of QML
QML shall have the following features:
a) QML shall focus on the annotation of measurable attributes of entities, for example "BMI between 10-20 kg/m²";
b) QML shall allow relations between measures to be annotated, for example "age 40 or more" and "fpg >= 100 mg/dl or a1c no less than 5.8 %";
c) QML shall cover complex uses of unitized numerical quantities, for example "14.0 × 10⁹" and "glycated haemoglobin (hba1c) < 1.15 times the upper limit of normal";
d) QML shall facilitate the identification of normalized numerical units as a measurable attribute of an associated entity.
NOTE QML does not specify how to annotate the normalization of units (for example, "millimoles per litre" is normalized as "mmol/l") or their full specification (for example, "kg/m" is written "kg/m²" for BMI); these issues are to be addressed in another part of ISO 24617 dealing with the automated implementation of MQI.
4.3 Metamodel
The overall structure of measurable quantitative information is represented by the metamodel in Figure 1.
Figure 1 – Metamodel of measurable quantitative information
This metamodel shall consist of seven class components, represented by rectangular boxes in Figure 1:
a) source data as input for MQI annotation;
b) markables extracted from the source data;
c) three basic element types: entity, measure and relator;
d) two link types: measure link and comparison link.
The "entity" element shall be any object that has a measurable quantity, represented by "@grandeur" (quantity), as one of its properties. The term "entity", as used in this document, is very general: it refers not only to individual entities but also to their properties, such as the "height" of a building or the "speed" of a car, and to all kinds of eventualities, such as states, processes or transitions.
EXAMPLE 1 We drove at more than 200 kilometres per hour on a German autobahn.
The speed denoted by "more than 200 kilometres per hour" is a quantitative property of a motion: here, the measure applies to the driving event mentioned in the example.
The "measure" element represents a measurable quantity of an entity in terms of three attributes: magnitude (numerical value), unit and type.
EXAMPLE 2 The height of Mount Hall is 1 950 metres.
The measure consists of a magnitude denoted by the numerical expression "1 950" and the unit "metre". It applies to the quantity "height" of the geographical object named "Mount Hall".
The "relator" element, which is associated with markables such as "equal to", "greater than", "<=", "between" or "at least", serves only the function of relating two or more measures.
EXAMPLE 3 One pound is equal to 16 ounces.
This is an identity relator between two measures, "one pound" and "16 ounces".
EXAMPLE 4 1 ft is less than 1 metre, because it is exactly equal to 30.48 cm.
This example illustrates two kinds of comparison between measures: the relation of being "less than" and the relation of identity.
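The reasoning in EXAMPLE 4 amounts to converting both measures to a common unit and comparing the resulting magnitudes. A minimal Python sketch, assuming a hypothetical conversion table (`TO_METRE`) limited to a few length units; `Fraction` keeps the exact value 1 ft = 0.3048 m free of floating-point rounding:

```python
from fractions import Fraction

# Hypothetical conversion table: exact factors from a few length units to the metre.
TO_METRE = {
    "m": Fraction(1),
    "cm": Fraction(1, 100),
    "ft": Fraction(3048, 10000),  # international foot: exactly 0.3048 m
}


def to_metre(n, unit: str) -> Fraction:
    """Convert a length measure <n, unit> to an exact number of metres.

    n may be an int or a decimal string such as "30.48" (kept exact by Fraction).
    """
    return Fraction(n) * TO_METRE[unit]


def relate(a: tuple, b: tuple) -> str:
    """Name the relator that holds between two length measures a and b."""
    x, y = to_metre(*a), to_metre(*b)
    if x < y:
        return "less than"
    if x > y:
        return "greater than"
    return "equal to"


# EXAMPLE 4: 1 ft is less than 1 metre, and identical to 30.48 cm.
print(relate((1, "ft"), (1, "m")))         # less than
print(relate((1, "ft"), ("30.48", "cm")))  # equal to
```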
A link of type "measure" shall relate a measure to the quantitative property of an entity. Such a link is triggered by a measure element.
A link of type "comparison" shall relate a measure to one or more other measures. Such a link is often triggered by a relator element.
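The basic element types and link types described above can be modelled as plain record types. The following Python sketch is one possible in-memory rendering of the metamodel, not a normative data model; all class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    id: str
    type: str      # e.g. "geographic", "person"
    quantity: str  # measurable property, e.g. "height"


@dataclass
class Measure:
    id: str
    num: float     # numerical magnitude
    unit: str      # e.g. "metre"
    type: str      # quantity kind, e.g. "length"


@dataclass
class Relator:
    id: str
    type: str      # e.g. "identity", "less than"


@dataclass
class MeasureLink:
    """Relates a measure to the quantitative property of an entity."""
    measure: Measure
    entity: Entity


@dataclass
class ComparisonLink:
    """Relates a measure to one or more other measures through a relator."""
    relator: Relator
    source: Measure
    targets: List[Measure]

    def __post_init__(self) -> None:
        # "one or more other measures": an empty target list is rejected
        if not self.targets:
            raise ValueError("a comparison link needs at least one other measure")


# EXAMPLE 2: "The height of Mount Hall is 1 950 metres."
mount_hall = Entity("x1", "geographic", "height")
height = Measure("me1", 1950, "metre", "length")
height_link = MeasureLink(height, mount_hall)

# EXAMPLE 3: "One pound is equal to 16 ounces."
pound = Measure("me2", 1, "pound", "mass")
ounces = Measure("me3", 16, "ounce", "mass")
identity = ComparisonLink(Relator("r1", "identity"), pound, [ounces])
```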
4.4 Abstract syntax of QML (QML_as)
QML shall be a specification language for the annotation of MQI. The abstract syntax of QML shall specify an annotation scheme in set-theoretic terms, based on a conceptual understanding of MQI. The abstract syntax QML_as is considered to be a triple <B, R, @> such that:
a) B is a set of three basic element types: entity, measure and relator;
b) R is a set of two link types: measure and comparison;
c) @ is a set of assignments that specify the list of attributes and their associated value types for each basic element type in B and each link type in R.
Each element of B shall have at least one attribute, @type, and so shall each link. The values of @type are CDATA associated with each element. For example, the entity "mountain" is of type "geographic" and the named entity "John" is of type "person".
The values of @grandeur for an entity are CDATA, such as height, width or weight.
The assignment for measure shall specify three attributes: @numérique (numerical value), @unité (unit) and @type. A possible value of @numérique is a real number. A possible value of @unité is one of the units of a conventionally accepted system, such as an SI base unit or a derived unit. A possible value of @type is one of the quantities listed as ISQ base quantities or as derived quantities, such as length, mass or voltage.
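Because "measure" names both a basic element type in B and a link type in R, an implementation of the assignment @ has to key attributes on both the set and the type name. The following Python sketch assumes that representation and lists only the attributes named in this subclause (the names grandeur, numérique and unité are copied from the text); `ASSIGNMENT` and `is_valid` are hypothetical names, and this is not a complete schema.

```python
# B and R as in 4.4 a) and b).
B = {"entity", "measure", "relator"}
R = {"measure", "comparison"}

# @: required attributes per (set, type name). Keys include the set because
# "measure" is both a basic element type and a link type.
ASSIGNMENT = {
    ("B", "entity"): ("type", "grandeur"),
    ("B", "measure"): ("numérique", "unité", "type"),
    ("B", "relator"): ("type",),
    ("R", "measure"): ("type",),
    ("R", "comparison"): ("type",),
}


def is_valid(kind: str, name: str, annotation: dict) -> bool:
    """Check an annotation (attribute -> CDATA string) against the assignment @."""
    if name not in (B if kind == "B" else R):
        return False
    required = ASSIGNMENT[(kind, name)]
    if any(attr not in annotation for attr in required):
        return False
    if (kind, name) == ("B", "measure"):
        try:
            float(annotation["numérique"])  # @numérique must be a real number
        except ValueError:
            return False
    return True
```

For instance, `is_valid("B", "measure", {"numérique": "1950", "unité": "metre", "type": "length"})` is true, while an entity annotated with only @type fails because @grandeur is missing.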
4.5 Concrete syntaxes of QML (QML_cs) and its subsets
An abstract syntax shall allow several semantically equivalent concrete syntaxes. QML_as thus allows a set of equivalent concrete syntaxes of QML (QML_cs). This document presents two concrete syntaxes, QML_csx and QML_cst, in Clause 5 and Clause 6, respectively.
Both concrete syntaxes, QML_csx and QML_cst, are based on the abstract syntax QML_as and adopt XML as their representation language. They shall conform to the standoff annotation requirement of ISO 24612.
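In a standoff representation the annotation does not wrap the source text but points into it, for example by character offsets. The actual QML_csx tag names, ID prefixes and attributes are specified in Clause 5, which is not reproduced here, so every tag and attribute name in the following Python sketch (`qml`, `entity`, `measure`, `mLink`, `target`, and so on) is a hypothetical placeholder. The sketch only illustrates the standoff principle, using the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Primary data is left untouched; annotations point into it by offsets.
text = "The height of Mount Hall is 1 950 metres."


def offsets(markable: str) -> str:
    """Start,end character offsets of the first occurrence of a markable."""
    start = text.index(markable)
    return f"{start},{start + len(markable)}"


# Hypothetical tag and attribute names; Clause 5 defines the real QML_csx ones.
root = ET.Element("qml")
ET.SubElement(root, "entity", id="x1", target=offsets("Mount Hall"),
              type="geographic", quantity="height")
ET.SubElement(root, "measure", id="me1", target=offsets("1 950 metres"),
              num="1950", unit="metre", type="length")
ET.SubElement(root, "mLink", id="mL1", measure="#me1", entity="#x1")

print(ET.tostring(root, encoding="unicode"))
```

Keeping the offsets outside the text means several annotation layers can refer to the same primary data without interfering with one another.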
These two concrete syntaxes nevertheless differ from each other in
...