SIST ISO 24623-2:2022
Language resource management -- Corpus Query Lingua Franca (CQLF) - Part 2: Ontology
This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements.
This document does not define the entire contents of the ontology (see Clause 4).
Gestion des ressources linguistiques -- Corpus Query Lingua Franca (CQLF) - Partie 2: Ontologie
Upravljanje jezikovnih virov - Lingua franca za korpusne poizvedbe (CQLF) - 2. del: Ontologija
Standards Content (Sample)
SLOVENIAN STANDARD
1 February 2022
This Slovenian standard is identical to: ISO 24623-2:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24623-2
First edition
2021-12
Language resource management —
Corpus query lingua franca (CQLF) —
Part 2:
Ontology
Gestion des ressources linguistiques — Corpus query lingua franca
(CQLF) —
Partie 2: Ontologie
Reference number
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF metamodel
5.4 Functionalities
5.5 Frames
5.6 Use cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
Annex A (informative) Illustrative example of a CQLF ontology
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO’s adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24623 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Several families of International Standards codify various aspects of the representation of language
data. These standards describe general corpus-oriented data models in the linguistic annotation
framework (LAF) (see ISO 24612), various aspects of the semantic representation in the semantic
annotation framework (SemAF) (see ISO 24617-1 and others), the representation of lexical data in the
lexical markup framework (LMF) (see ISO 24613-1 and others), as well as the representation of metadata
in the component metadata infrastructure (CMDI) (see ISO 24622-1 and others). Complementary to
the standards concerning the representation of language data, the ISO 24623 series focuses on the
exploitation of language data and on ways to satisfy various kinds of information needs targeting these
data.
The corpus query lingua franca (CQLF) metamodel, described in ISO 24623-1, is a maximally permissive
construct that establishes means of describing the scope of corpus query languages (CQLs) at a general
level and with a focus on various kinds of data models assumed by query systems, with conformance
conditions meant to be satisfied by a wide range of CQLs. The metamodel provides a “skeleton” for a
CQL taxonomy by setting up basic categories of corpus queries (encoded as levels and modules) as well
as the dependencies among them.
Consequently, the task of a more concrete characterization of CQLs is meant to be covered in other
parts of the ISO 24623 series. This document establishes a framework for an ontology which focuses on
the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer
taxonomy against which individual CQLs can make positive and negative conformance statements.
Such an ontology allows, on the one hand, a fine-grained comparison of the expressive power of CQLs,
and, on the other hand, it serves a practical purpose, i.e. as a foundation for platforms where developers
can enter conformance statements, and where end users can see which CQL to turn to in order to ensure
that their search needs get satisfied. An example of such a platform is given by Reference [13].
INTERNATIONAL STANDARD ISO 24623-2:2021(E)
Language resource management — Corpus query lingua
franca (CQLF) —
Part 2:
Ontology
1 Scope
This document specifies the structure of an ontology for a fine-grained description of the expressive
power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three
interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the
expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a
taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them,
in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in
the form of positive and negative conformance statements.
This document does not define the entire contents of the ontology (see Clause 4).
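The structure listed in items a) to d) can be pictured with a small data model. The following Python sketch is illustrative only: the standard formalizes the ontology in OWL DL, not in Python, and the class name `Ontology`, its methods, and all concept names (`ExampleUseCase`, `ExampleFrame`, `ExampleFunctionality`, `ExampleCQL`, `OtherCQL`) are hypothetical. The direction of subsumption shown here (use case under frame under functionality) is an assumption drawn from the definitions in Clause 3, and no inference rules for conformance statements are implied.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Subsumption assertions: child concept -> set of direct parent concepts.
    parents: dict[str, set[str]] = field(default_factory=dict)
    # Conformance statements: (CQL, concept) -> True (positive) or False (negative).
    statements: dict[tuple[str, str], bool] = field(default_factory=dict)

    def assert_subsumed(self, child: str, parent: str) -> None:
        """Record that `child` is subsumed by `parent`."""
        self.parents.setdefault(child, set()).add(parent)

    def ancestors(self, concept: str) -> set[str]:
        """Return every concept that transitively subsumes `concept`."""
        seen: set[str] = set()
        stack = [concept]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def state_conformance(self, cql: str, concept: str, positive: bool) -> None:
        """Record a positive or negative conformance statement for a CQL."""
        self.statements[(cql, concept)] = positive

    def conformance(self, cql: str, concept: str):
        """Return True, False, or None when no statement has been made."""
        return self.statements.get((cql, concept))


# Hypothetical concepts, one per layer of the expressive power taxonomy.
onto = Ontology()
onto.assert_subsumed("ExampleUseCase", "ExampleFrame")
onto.assert_subsumed("ExampleFrame", "ExampleFunctionality")
onto.state_conformance("ExampleCQL", "ExampleUseCase", positive=True)
onto.state_conformance("OtherCQL", "ExampleUseCase", positive=False)
```

Keeping "no statement made" (`None`) distinct from a negative statement (`False`) reflects the fact that the document treats negative conformance statements as explicit assertions rather than as the mere absence of a positive one.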
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel
ISO/IEC 10646, Information technology — Universal coded character set (UCS)
W3C-OWL 2-SPEC. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax
(Second Edition). Motik B., Patel-Schneider, P.F., and Parsia, B. eds. W3C Recommendation, 11
December 2012. Available from: http://www.w3.org/TR/owl2-syntax/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the
following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
CQLF module
subcomponent of the CQLF metamodel, defined with reference to a specified data-model characteristic
Note 1 to entry: The CQLF metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-
text, segmentation and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical,
dependency and containment).
Note 2 to entry: In 5.3, the containment module is formalized by the concept SpanContainment in order to avoid
terminological ambiguity.
[SOURCE: ISO 24623-1:2018, 3.8, modified — “the CQLF metamodel” has replaced “a CQLF level” in
order to improve clarity outside the context of ISO 24623-1; Note 2 to entry has been added.]
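The levels and modules named in Note 1 to entry can be written down as a simple enumeration. This is a minimal Python sketch, not the OWL formalization given in 5.3; the identifiers `Level`, `Module` and `modules_of` are hypothetical, and only the module and level names themselves come from the notes above.

```python
from enum import Enum


class Level(Enum):
    LINEAR = 1   # CQLF Level 1
    COMPLEX = 2  # CQLF Level 2


class Module(Enum):
    # Each member carries a human-readable label and the level it belongs to.
    PLAIN_TEXT = ("plain-text", Level.LINEAR)
    SEGMENTATION = ("segmentation", Level.LINEAR)
    SIMPLE_ANNOTATION = ("simple annotation", Level.LINEAR)
    HIERARCHICAL = ("hierarchical", Level.COMPLEX)
    DEPENDENCY = ("dependency", Level.COMPLEX)
    # Named after the concept SpanContainment mentioned in Note 2 to entry.
    SPAN_CONTAINMENT = ("containment", Level.COMPLEX)

    def __init__(self, label: str, level: Level) -> None:
        self.label = label
        self.level = level


def modules_of(level: Level) -> list[Module]:
    """Return the modules of the given level, in declaration order."""
    return [m for m in Module if m.level is level]
```

For instance, `modules_of(Level.LINEAR)` yields the three Level 1 modules and `modules_of(Level.COMPLEX)` the three Level 2 modules.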
3.2
functionality
label for a concept in a CQLF ontology (3.15) that represents a family of CQL capabilities (3.12)
contributing to the expressive power of a CQL (3.5), formulated at a general level and linked to one or
more CQLF modules (3.1)
3.3
frame
label for a concept in a CQLF ontology (3.15) that represents a typical search need (3.6) of end users (3.7),
understood as one facet of the expressive power of CQLs (3.5)
Note 1 to entry: Most frames arise from the specialization of a functionality (3.2) and/or the combination of
multiple functionalities.
3.4
use case
label for a concept in a CQLF ontology (3.15) that represents a concrete instantiation of a frame (3.3), for
which it can be determined unambiguously whether a given query expression (3.8) satisfies the search
need (3.6) or not
Note 1 to entry: Use cases are often parameterized, i.e. they contain variable elements. Parameterized use cases
are satisfied by parameterized query expressions.
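The pairing of a parameterized use case with a parameterized query expression can be illustrated with ordinary string templates. The sketch below is an assumption-laden example, not material from the standard: the wording of the use case, the parameter name `form`, the helper `instantiate`, and the CQP-style token query are all chosen here purely for illustration.

```python
from string import Template

# Hypothetical parameterized use case: the variable element is the word form.
use_case = Template("Find all occurrences of the word form '$form'")

# Hypothetical parameterized query expression in a CQP-style notation.
query_expression = Template('[word="$form"]')


def instantiate(form: str) -> tuple[str, str]:
    """Fill the same parameter value into the use case and the query."""
    return use_case.substitute(form=form), query_expression.substitute(form=form)
```

Calling `instantiate("house")` fills the same value into both templates, so the resulting concrete query can be checked against the resulting concrete search need.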
3.5
CQL
corpus query language
formal language designed to retrieve specific information from (large) language data collections, and
thereby incorporate certain abstractions over commonly shared data models that make it possible for
the end user (3.7) (or user agents) to address parts of those data models
Note 1 to entry: A CQL defines a syntactic notation for query expressions (3.8) and the corresponding search
semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are
implicitly defined by a particular implementation.
[SOURCE: ISO 24623-1:2018, 3.4, modified — “CQL” has been added as preferred term, “end user” has
replaced “user” in the definition and Note 1 to entry has been added.]
3.6
search need
information pattern that an end user (3.7) wants to locate in a corpus, based on the primary data stream
and/or simple or complex annotation
3.7
end user
agent who uses a CQL (3.5) to satisfy his or her se
...