Language resource management — Corpus query lingua franca (CQLF) — Part 2: Ontology

This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs. This document specifies: a) the taxonomy of the CQLF metamodel; b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”); c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions; d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements. This document does not define the entire contents of the ontology (see Clause 4).

General Information
Status: Published
Publication Date: 30-Nov-2021
Current Stage: 6060 - International Standard published
Start Date: 01-Dec-2021
Due Date: 15-Jul-2022
Completion Date: 01-Dec-2021

Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24623-2:2022
01-February-2022
Language resource management - Corpus Query Lingua Franca (CQLF) - Part 2: Ontology
This Slovenian standard is identical to: ISO 24623-2:2021
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24623-2:2022 en
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24623-2
First edition
2021-12
Language resource management —
Corpus query lingua franca (CQLF) —
Part 2:
Ontology
Reference number
ISO 24623-2:2021(E)
© ISO 2021

COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
  © ISO 2021 – All rights reserved

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF metamodel
5.4 Functionalities
5.5 Frames
5.6 Use cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
Annex A (informative) Illustrative example of a CQLF ontology
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO’s adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24623 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Several families of International Standards codify various aspects of the representation of language
data. These standards describe general corpus-oriented data models in the linguistic annotation
framework (LAF) (see ISO 24612), various aspects of the semantic representation in the semantic
annotation framework (SemAF) (see ISO 24617-1 and others), the representation of lexical data in the
lexical markup framework (LMF) (see ISO 24613-1 and others), as well as the representation of metadata
in the component metadata infrastructure (CMDI) (see ISO 24622-1 and others). Complementary to
the standards concerning the representation of language data, the ISO 24623 series focuses on the
exploitation of language data and on ways to satisfy various kinds of information needs targeting these
data.
The corpus query lingua franca (CQLF) metamodel, described in ISO 24623-1, is a maximally permissive
construct that establishes means of describing the scope of corpus query languages (CQLs) at a general
level and with a focus on various kinds of data models assumed by query systems, with conformance
conditions meant to be satisfied by a wide range of CQLs. The metamodel provides a “skeleton” for a
CQL taxonomy by setting up basic categories of corpus queries (encoded as levels and modules) as well
as the dependencies among them.
Consequently, the task of a more concrete characterization of CQLs is meant to be covered in other
parts of the ISO 24623 series. This document establishes a framework for an ontology which focuses on
the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer
taxonomy against which individual CQLs can make positive and negative conformance statements.
On the one hand, such an ontology allows a fine-grained comparison of the expressive power of CQLs; on
the other hand, it serves a practical purpose as a foundation for platforms where developers can enter
conformance statements and where end users can see which CQL to turn to in order to ensure that their
search needs are satisfied. An example of such a platform is given by Reference [13].
INTERNATIONAL STANDARD ISO 24623-2:2021(E)
Language resource management — Corpus query lingua
franca (CQLF) —
Part 2:
Ontology
1 Scope
This document specifies the structure of an ontology for a fine-grained description of the expressive
power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three
interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the
expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a
taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them,
in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in
the form of positive and negative conformance statements.
This document does not define the entire contents of the ontology (see Clause 4).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel
ISO/IEC 10646, Information technology — Universal coded character set (UCS)
W3C-OWL 2-SPEC. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax
(Second Edition). Motik, B., Patel-Schneider, P.F., and Parsia, B., eds. W3C Recommendation,
11 December 2012. Available from: http://www.w3.org/TR/owl2-syntax/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the
following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
CQLF module
subcomponent of the CQLF metamodel, defined with reference to a specified data-model characteristic
Note 1 to entry: The CQLF metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-
text, segmentation and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical,
dependency and containment).
Note 2 to entry: In 5.3, the containment module is formalized by the concept SpanContainment in order to avoid
terminological ambiguity.
[SOURCE: ISO 24623-1:2018, 3.8, modified — “the CQLF metamodel” has replaced “a CQLF level” in
order to improve clarity outside the context of ISO 24623-1; Note 2 to entry has been added.]
3.2
functionality
label for a concept in a CQLF ontology (3.15) that represents a family of CQL capabilities (3.12)
contributing to the expressive power of a CQL (3.5), formulated at a general level and linked to one or
more CQLF modules (3.1)
3.3
frame
label for a concept in a CQLF ontology (3.15) that represents a typical search need (3.6) of end users (3.7),
understood as one facet of the expressive power of CQLs (3.5)
Note 1 to entry: Most frames arise from the specialization of a functionality (3.2) and/or the combination of
multiple functionalities.
3.4
use case
label for a concept in a CQLF ontology (3.15) that represents a concrete instantiation of a frame (3.3), for
which it can be determined unambiguously whether a given query expression (3.8) satisfies the search
need (3.6) or not
Note 1 to entry: Use cases are often parameterized, i.e. they contain variable elements. Parameterized use cases
are satisfied by parameterized query expressions.
3.5
CQL
corpus query language
formal language designed to retrieve specific information from (large) language data collections, and
thereby incorporate certain abstractions over commonly shared data models that make it possible for
the end user (3.7) (or user agents) to address parts of those data models
Note 1 to entry: A CQL defines a syntactic notation for query expressions (3.8) and the corresponding search
semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are
implicitly defined by a particular implementation.
[SOURCE: ISO 24623-1:2018, 3.4, modified — “CQL” has been added as preferred term, “end user” has
replaced “user” in the definition and Note 1 to entry has been added.]
3.6
search need
information pattern that an end user (3.7) wants to locate in a corpus, based on the primary data stream
and/or simple or complex annotation
3.7
end user
agent who uses a CQL (3.5) to satisfy his or her search needs (3.6)
Note 1 to entry: This can be done via an interactive graphical user interface (GUI), a command-line tool,
programmatically via some application programming interface (API) or by a software program developed by the
end user.
3.8
query expression
string that is syntactically valid in a given CQL (3.5) and can be executed to return a result set
Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification
of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the
ontology are required to include informal descriptions of the range of admissible values and any transformations
required.
3.9
parameter
variable element in a query expression (3.8) or in the description of a search need (3.6)
3.10
positive conformance statement
assertion that a given CQL (3.5) supports a given use case (3.4) by means of a query expression (3.8)
3.11
negative conformance statement
assertion that a given CQL (3.5) cannot support a given use case (3.4), frame (3.3) or functionality (3.2)
Note 1 to entry: Negative conformance is due to technical unavailability of specific capabilities in the respective
CQL or limitations on the complexity of query expressions (3.8).
3.12
CQL capability
capability
corpus query language capability
facility provided by CQLs (3.5) to meet a specific aspect of search needs (3.6)
3.13
layer
totality of concepts at the same level of abstraction in a CQLF ontology (3.15)
EXAMPLE Functionalities (3.2), frames (3.3), use cases (3.4).
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — Note 1 to entry has been deleted.]
3.15
CQLF ontology
ontology for a fine-grained description of the expressive power of CQLs (3.5) in terms of search needs
(3.6), which adheres to the structure specified in this document
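Definitions 3.8 to 3.10 imply that a positive conformance statement pairs a CQL, a use case and a query expression, and that any parameters in the query expression need informal descriptions of their admissible values. The following minimal Python sketch checks that requirement. The `$NAME` placeholder convention, the class name `PositiveConformanceStatement` and the sample query string are hypothetical; this document deliberately leaves the parameter substitution procedure unspecified.

```python
"""Sketch: a positive conformance statement with a parameterized query expression."""
import re
from dataclasses import dataclass, field

# Assumed (hypothetical) convention: parameters are written as $NAME.
PLACEHOLDER = re.compile(r"\$([A-Za-z_]\w*)")


@dataclass
class PositiveConformanceStatement:
    cql: str               # the CQL the query expression is formulated in
    use_case: str          # the use case the query expression satisfies
    query_expression: str  # possibly parameterized query string
    # Informal description of the admissible values for each parameter.
    parameters: dict = field(default_factory=dict)

    def undocumented_parameters(self):
        """Placeholders in the query expression that lack a description."""
        used = set(PLACEHOLDER.findall(self.query_expression))
        return sorted(used - set(self.parameters))


stmt = PositiveConformanceStatement(
    cql="ExampleCQL",
    use_case="ExampleUseCase",
    query_expression='[word="$WORD"] [pos="$TAG"]',
    parameters={"WORD": "any single word form"},
)
print(stmt.undocumented_parameters())  # ['TAG']
```

An ontology editor could refuse to store such an entry until every parameter has a description, without having to formalize how values are substituted.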
4 Motivation and aims
CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific
application scenarios, others are able to cover a wider variety of applications and search needs. It is
therefore both the quality and the quantity of CQL capabilities – as well as the degree to which they
can be combined freely – that determine the expressive power of a CQL. A CQLF ontology as specified
in this document is not intended to articulate all the possible combinations of capabilities unless these
are justified by genuine usage. Its aim is to provide representative categories for typical search needs
within a taxonomy of CQL capabilities. These typical search needs evolve with general progress in
the fields of corpus linguistics and digital humanities, and with the discovery of new challenges, new
methods and new research questions. In order to accommodate the dynamic nature of the evolving
search needs, most of the content of such an ontology is outside the scope of standardization. This
document provides a structural framework for this dynamic information (by specifying the three-layer
structure of the expressive power taxonomy, the content of the topmost layer of functionalities, and the
relationships between different layers and taxonomies), ensuring that the ontology can adapt to new
search needs that emerge as the relevant disciplines evolve.
In order to provide a normative skeleton for the ontology while at the same time making provisions for
keeping its main content (search needs and corresponding query expressions) dynamic, this document
does not comprise a normative listing of the middle and bottom layers of the expressive power taxonomy
(i.e. frames and use cases). An exhaustive inventory of concepts at these two layers is not possible
due to the fact that existing CQLs differ widely in the complexity of the supported combinations
of functionalities, that new CQLs can be created offering additional combinations or subtypes of
functionalities, and that new search needs emerge from progress in the relevant research fields. The
frames and use cases of a CQLF ontology are expected to be supplied by a moderated community
process, driven by CQL developers as well as end users (see Reference [13]). For illustration, a sample of
frames and use cases, together with conformance statements linking them with the CQP[6] and ANNIS[8]
query languages, is provided in Annex A.
The permissive architecture and terminology defined by this document enable research groups to
extend the relevant parts of the ontology with further CQL capabilities and search needs arising in
the future.
The following application scenarios are thus made possible:
— describing the scope and capabilities of a given CQL, in terms of conformance statements against a
CQLF ontology (typically carried out by the CQL developers);
— comparing different CQLs with respect to their ability to meet typical search needs;
— identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required
by an end user, together with examples of the respective query syntax;
— guiding the development of new CQLs and query tools by building an inventory of complex search
needs that are important for the community (typically carried out by end users).
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
The taxonomic framework for a CQLF ontology is modelled using OWL 2 DL[7] as its formal framework.
OWL 2 DL is a dialect of the Web Ontology Language (OWL) based on the family of description logics (DL)
(see Reference [9]). All definitions and requirements of the W3C OWL 2 specification shall be followed.
The normative representation and exchange format for a CQLF ontology is RDF/XML[10][11]. All
labels and annotations shall be represented as sequences of Unicode code points, in accordance with
ISO/IEC 10646.
W3C OWL 2 DL furnishes developers with a set of tools for:
a) stating concept hierarchies and membership of individuals,
b) defining highly expressive property restrictions.
In particular, this document makes use of the AnnotationProperty construct of OWL DL in order to
associate additional information with concepts and individuals.
For better readability, CQLF ontology axioms are provided in DL notation in Clauses 5 and 6 rather than
in the RDF/XML exchange format.
Relevant DL notions[9]:
— concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept
expressions.
EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B. A is
also said to be subsumed by B.
NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature
structures (see Reference [12], p. 496).
— concept equivalence ≡: This operator asserts an equivalence between two concept expressions.
EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.
— intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions,
i.e. the individuals contained in both concept expressions.
NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C. It is equivalent to the assertions A ⊑ B and
A ⊑ C.
— union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the
individuals contained in either or both of the concept expressions.
NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals
covered by A can be contained in B and others in C.
— top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred
to as “Thing” or “the root class”.
— bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as “Nothing”
or “the empty class”.
— concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as
“class assertion” because the concepts represent classes (see T-Box below).
EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.
— A-Box: The domain of interest is spanned by a universe of individuals, which serve as the fundamental
atoms of what is modelled in the ontology. They become members of concepts through
concept assertions (also referred to as “A-Box axioms”) and implicitly through the subsumption
relations expressed by concept inclusion assertions (in the T-Box).
— T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which
individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts
and a rule set of hierarchical relations between them (“is-a” relations expressed by concept
inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/
or individuals.
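The set-theoretic reading of these operators can be checked mechanically on a small finite interpretation. The following Python sketch is illustrative only: the concept names A, B, C and the individuals x1 to x4 are hypothetical, and a single finite interpretation can illustrate, but not prove, general DL entailments.

```python
"""Finite set-theoretic reading of the DL operators listed above (illustrative)."""

# Hypothetical universe of individuals: the top concept ⊤.
TOP = frozenset({"x1", "x2", "x3", "x4"})
BOTTOM = frozenset()  # the bottom concept ⊥: the empty set

# Hypothetical interpretation: each concept name denotes a set of individuals.
A = frozenset({"x1", "x2"})
B = frozenset({"x1", "x3"})
C = frozenset({"x2", "x4"})


def subsumed(x, y):
    """Concept inclusion x ⊑ y: every individual in x is also in y."""
    return x <= y


def equivalent(x, y):
    """Concept equivalence x ≡ y: exactly the same individuals."""
    return x == y


def member(individual, concept):
    """Concept assertion individual ∈ concept."""
    return individual in concept


# Intersection (⊓) and union (⊔) are plain set operations.
B_and_C = B & C
B_or_C = B | C

# NOTE 3: A ⊑ B ⊔ C holds although A is subsumed by neither B nor C alone.
assert subsumed(A, B_or_C)
assert not subsumed(A, B) and not subsumed(A, C)

# NOTE 2: X ⊑ B ⊓ C holds exactly when both X ⊑ B and X ⊑ C hold.
for X in (A, B, C, B_and_C, BOTTOM, TOP):
    assert subsumed(X, B_and_C) == (subsumed(X, B) and subsumed(X, C))

# Every concept lies between the bottom and the top concept.
for X in (A, B, C):
    assert subsumed(BOTTOM, X) and subsumed(X, TOP)

print(member("x1", A), equivalent(A, B))  # True False
```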
5.2 Structure of the ontology
The T-Box of a CQLF ontology consists of three separate taxonomies of concepts. The main taxonomy
describes different facets of the expressive power of CQLs. It is called “expressive power taxonomy” and
is divided into three layers.
Concepts in the top layer are called “functionalities”. They represent (families of) individual search
capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for
navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are
defined in 5.4.
Concepts in the middle layer are called “frames”. They represent typical search needs of end users,
which often involve combinations of multiple functionalities, at a relatively abstract level. For every
frame, subsumption assertions shall indicate which functionalities are required for the search need. A
frame A can also be subsumed by another frame A′ if A extends the search need represented by A′. The
normative part of the ontology does not include any instances of frames; the structure of the frame
layer is defined in 5.5.
Concepts in the bottom layer are called “use cases”. They represent parameterized instantiations of
frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether
a given CQL can satisfy a given use case. For every use case, a subsumption assertion shall indicate
which frame is instantiated by the use case. There can also be subsumption assertions to multiple
frames as well as to other use cases. The normative part of the ontology does not include any instances
of use cases; the structure of the use case layer is defined in 5.6.
The second taxonomy of concepts formalizes the CQLF metamodel defined in ISO 24623-1. Subsumption
assertions link all functionalities to the CQLF metamodel. Both the CQLF metamodel taxonomy (defined
in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part of the ontology.
The third taxonomy of concepts represents individual CQLs whose expressive power is described with
respect to the ontology. It shall have a flat structure without subsumption assertions between different
CQLs. The normative part of the ontology does not include any instances of CQLs; the structure of the
taxonomy is defined in 5.7.
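To make the layering concrete, the sketch below stores subsumption assertions as a directed graph and computes, by transitive closure, which functionalities and CQLF modules a use case ultimately depends on. Apart from the abstract roots Metamodel and Module, all concept names are hypothetical placeholders; the functionalities actually defined in 5.4 are not reproduced here.

```python
"""Sketch: subsumption assertions across the layers of a CQLF ontology."""
from collections import deque

# Told subsumption assertions, read as "child ⊑ parent". Apart from the
# abstract roots Metamodel and Module, every name is a hypothetical placeholder.
SUBSUMPTIONS = {
    "Module": {"Metamodel"},                      # CQLF metamodel taxonomy (5.3)
    "ExampleModule": {"Module"},
    "ExampleFunctionality": {"ExampleModule"},    # functionality -> module
    "OtherFunctionality": {"ExampleModule"},
    "BaseFrame": {"ExampleFunctionality"},        # frame -> required functionality
    "ExtendedFrame": {"BaseFrame", "OtherFunctionality"},  # frame extending a frame
    "ExampleUseCase": {"ExtendedFrame"},          # use case -> instantiated frame
}
FUNCTIONALITIES = {"ExampleFunctionality", "OtherFunctionality"}


def ancestors(concept):
    """All concepts that `concept` is transitively subsumed by (breadth-first)."""
    seen = set()
    queue = deque(SUBSUMPTIONS.get(concept, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(SUBSUMPTIONS.get(parent, ()))
    return seen


def is_subsumed(child, parent):
    """Entailed concept inclusion child ⊑ parent (every concept subsumes itself)."""
    return child == parent or parent in ancestors(child)


# Functionalities a use case depends on, directly or through its frames.
needed = sorted(ancestors("ExampleUseCase") & FUNCTIONALITIES)
print(needed)  # ['ExampleFunctionality', 'OtherFunctionality']
```

Because subsumption is transitive, a use case inherits the functionality requirements of every frame it instantiates, including frames that its frame extends. This is what allows functionalities to serve as entry points for navigating the taxonomy.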
Individuals in the A-Box are positive conformance statements in the form of parameterized query
expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL
it is formulated in) and to a use case concept (representing the search need that the query expression
satisfies). The normative part of the ontology does not include any individuals, i.e. its A-Box is empty.
A CQL can also make negative conformance statements to declare that it cannot satisfy specific use
cases, frames or functionalities because of its design limitations. As general disjunction assertions for
concepts, negative conformance statements are part of the T-Box. If neither a positive nor a negative
conformance statement exists between a CQL and a given use case, it shall be considered undetermined
whether or not the CQL can satisfy the corresponding search need. Positive and negative conformance
statements are further defined in Clause 6.
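The three possible outcomes for a pair of CQL and use case (supported, not supported, undetermined) can be sketched as follows. All CQL and concept names are hypothetical. A positive statement is modelled as an individual (a query expression) asserted to belong to both a CQL concept and a use case concept. A negative statement is modelled as an axiom of the form CQL ⊓ X ⊑ ⊥; since use cases are subsumed by their frames and functionalities, such an axiom also excludes every use case below X. Reading negative statements this way is an assumption of the sketch; their exact form is specified in Clause 6.

```python
"""Sketch: determining conformance between a CQL and a use case."""

# Hypothetical told subsumptions (child ⊑ parent) between use cases, frames
# and functionalities; links to the CQLF metamodel are omitted for brevity.
SUBSUMPTIONS = {
    "UseCaseA": {"FrameX"},
    "UseCaseB": {"FrameY"},
    "UseCaseC": {"FrameX"},
    "FrameX": {"FunctionalityP"},
    "FrameY": {"FunctionalityQ"},
}

# A-Box: each individual is a query expression asserted to be a member of one
# CQL concept and one use case concept (positive conformance statements).
POSITIVE = [
    {"query": "<query expression 1>", "cql": "CQL1", "use_case": "UseCaseA"},
]

# Negative conformance statements, read here (assumption) as CQL ⊓ concept ⊑ ⊥.
NEGATIVE = {("CQL1", "FunctionalityQ")}


def ancestors_or_self(concept):
    """The concept itself plus everything it is transitively subsumed by."""
    result, stack = {concept}, [concept]
    while stack:
        for parent in SUBSUMPTIONS.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def conformance(cql, use_case):
    """Return 'supported', 'not supported' or 'undetermined'."""
    positive = any(s["cql"] == cql and s["use_case"] == use_case for s in POSITIVE)
    negative = any((cql, c) in NEGATIVE for c in ancestors_or_self(use_case))
    if positive and negative:
        # An individual would belong to an unsatisfiable concept.
        raise ValueError(f"inconsistent statements for {cql} / {use_case}")
    if positive:
        return "supported"
    if negative:
        return "not supported"
    return "undetermined"


for uc in ("UseCaseA", "UseCaseB", "UseCaseC"):
    print(uc, conformance("CQL1", uc))
```

Raising an error when both kinds of statement apply mirrors the requirement that no assertion may lead to logical inconsistencies in the ontology.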
No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the
ontology.
The overall structure of a CQLF ontology is illustrated in Figure 1.
Figure 1 — General structure of a CQLF ontology
5.3 CQLF metamodel
[5]
The theoretical concept of modules as standardized in the context of ISO 24623-1 is formalized by the
CQLF metamodel taxonomy. It consists of the concepts and subsumption assertions defined below. Each
concept is identified by its label (as rdfs:label annotation), followed by all its subsumption assertions
within the taxonomy.
Abstract root concepts:
— Metamodel: This is the abstract root concept of the CQLF metamodel taxonomy.
— Level ⊑ Metamodel: This is the abstract root concept of all CQLF levels.
— Module ⊑ Metamodel: This is the abstract root concept of all CQLF modules.
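Since RDF/XML is the normative exchange format (5.1), the three abstract root concepts above might be serialized as OWL classes carrying an rdfs:label annotation and rdfs:subClassOf axioms. The sketch below uses only Python's standard xml.etree.ElementTree. The base IRI http://example.org/cqlf# and the derivation of class IRIs from labels are hypothetical choices, not identifiers defined by this document.

```python
"""Sketch: serializing CQLF metamodel root concepts as OWL classes in RDF/XML."""
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/cqlf#"  # hypothetical namespace for illustration

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)

# Concept label -> labels of the concepts it is directly subsumed by.
ROOT_CONCEPTS = {
    "Metamodel": [],
    "Level": ["Metamodel"],
    "Module": ["Metamodel"],
}


def to_rdf_xml(concepts):
    """Build an rdf:RDF document with one owl:Class per concept."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for label, parents in concepts.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + label})
        # rdfs:label is an annotation property identifying the concept.
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
        for parent in parents:
            # rdfs:subClassOf expresses the concept inclusion label ⊑ parent.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


print(to_rdf_xml(ROOT_CONCEPTS))
```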
CQLF levels:
— Linear ⊑ Level: Plain-text search as well as search in segmented data.
— Complex ⊑ Level: Search in data annotated with hierarchical structures and/or
...

INTERNATIONAL ISO
STANDARD 24623-2
First edition
2021-12
Language resource management —
Corpus query lingua franca (CQLF) —
Part 2:
Ontology
Gestion des ressources linguistiques — Corpus query lingua franca
(CQLF) —
Partie 2: Ontologie
Reference number
ISO 24623-2:2021(E)
© ISO 2021

---------------------- Page: 1 ----------------------
ISO 24623-2:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii
  © ISO 2021 – All rights reserved

---------------------- Page: 2 ----------------------
ISO 24623-2:2021(E)
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Motivation and aims . 3
5 Structure and content of a CQLF ontology . 4
5.1 OWL DL formalism . 4
5.2 Structure of the ontology . 5
5.3 CQLF metamodel . 7
5.4 Functionalities . 8
5.5 Frames . 11
5.6 Use cases . 11
5.7 CQLs .12
6 Conformance statements .12
6.1 Positive conformance statements .12
6.2 Negative conformance statements . 13
Annex A (informative) Illustrative example of a CQLF ontology.15
Bibliography .18
iii
© ISO 2021 – All rights reserved

---------------------- Page: 3 ----------------------
ISO 24623-2:2021(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO’s adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24623 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Several families of International Standards codify various aspects of the representation of language
data. These standards describe general corpus-oriented data models in the linguistic annotation
framework (LAF) (see ISO 24612), various aspects of the semantic representation in the semantic
annotation framework (SemAF) (see ISO 24617-1 and others), the representation of lexical data in the
lexical markup framework (LMF) (see ISO 24613-1 and others), as well as the representation of metadata
in the component metadata infrastructure (CMDI) (see ISO 24622-1 and others). Complementary to
the standards concerning the representation of language data, the ISO 24623 series focuses on the
exploitation of language data and on ways to satisfy various kinds of information needs targeting these
data.
The corpus query lingua franca (CQLF) metamodel, described in ISO 24623-1, is a maximally permissive
construct that establishes means of describing the scope of corpus query languages (CQLs) at a general
level and with a focus on various kinds of data models assumed by query systems, with conformance
conditions meant to be satisfied by a wide range of CQLs. The metamodel provides a “skeleton” for a
CQL taxonomy by setting up basic categories of corpus queries (encoded as levels and modules) as well
as the dependencies among them.
Consequently, the task of a more concrete characterization of CQLs is meant to be covered in other
parts of the ISO 24623 series. This document establishes a framework for an ontology which focuses on
the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer
taxonomy against which individual CQLs can make positive and negative conformance statements.
Such an ontology allows, on the one hand, a fine-grained comparison of the expressive power of CQLs,
and, on the other hand, it serves a practical purpose, i.e. as a foundation for platforms where developers
can enter conformance statements, and where end users can see which CQL to turn to in order to ensure
that their search needs get satisfied. An example of such a platform is given by Reference [13].
INTERNATIONAL STANDARD ISO 24623-2:2021(E)
Language resource management — Corpus query lingua franca (CQLF) — Part 2: Ontology
1 Scope
This document specifies the structure of an ontology for a fine-grained description of the expressive
power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three
interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the
expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a
taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them,
in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in
the form of positive and negative conformance statements.
This document does not define the entire contents of the ontology (see Clause 4).
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel
ISO/IEC 10646, Information technology — Universal coded character set (UCS)
W3C-OWL 2-SPEC. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax
(Second Edition). Motik, B., Patel-Schneider, P.F., and Parsia, B., eds. W3C Recommendation, 11
December 2012. Available from: http://www.w3.org/TR/owl2-syntax/
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the
following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
CQLF module
subcomponent of the CQLF metamodel, defined with reference to a specified data-model characteristic
Note 1 to entry: The CQLF metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-
text, segmentation and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical,
dependency and containment).
Note 2 to entry: In 5.3, the containment module is formalized by the concept SpanContainment in order to avoid
terminological ambiguity.
[SOURCE: ISO 24623-1:2018, 3.8, modified — “the CQLF metamodel” has replaced “a CQLF level” in
order to improve clarity outside the context of ISO 24623-1; Note 2 to entry has been added.]
3.2
functionality
label for a concept in a CQLF ontology (3.15) that represents a family of CQL capabilities (3.12)
contributing to the expressive power of a CQL (3.5), formulated at a general level and linked to one or
more CQLF modules (3.1)
3.3
frame
label for a concept in a CQLF ontology (3.15) that represents a typical search need (3.6) of end users (3.7),
understood as one facet of the expressive power of CQLs (3.5)
Note 1 to entry: Most frames arise from the specialization of a functionality (3.2) and/or the combination of
multiple functionalities.
3.4
use case
label for a concept in a CQLF ontology (3.15) that represents a concrete instantiation of a frame (3.3), for
which it can be determined unambiguously whether a given query expression (3.8) satisfies the search
need (3.6) or not
Note 1 to entry: Use cases are often parameterized, i.e. they contain variable elements. Parameterized use cases
are satisfied by parameterized query expressions.
3.5
CQL
corpus query language
formal language designed to retrieve specific information from (large) language data collections, and
thereby incorporate certain abstractions over commonly shared data models that make it possible for
the end user (3.7) (or user agents) to address parts of those data models
Note 1 to entry: A CQL defines a syntactic notation for query expressions (3.8) and the corresponding search
semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are
implicitly defined by a particular implementation.
[SOURCE: ISO 24623-1:2018, 3.4, modified — “CQL” has been added as preferred term, “end user” has
replaced “user” in the definition and Note 1 to entry has been added.]
3.6
search need
information pattern that an end user (3.7) wants to locate in a corpus, based on the primary data stream
and/or simple or complex annotation
3.7
end user
agent who uses a CQL (3.5) to satisfy his or her search needs (3.6)
Note 1 to entry: This can be done via an interactive graphical user interface (GUI), a command-line tool,
programmatically via some application programming interface (API) or by a software program developed by the
end user.
3.8
query expression
string that is syntactically valid in a given CQL (3.5) and can be executed to return a result set
Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification
of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the
ontology are required to include informal descriptions of the range of admissible values and any transformations
required.
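Because Note 1 to entry leaves parameter substitution informal, the following Python sketch only illustrates what an entry's "range of admissible values" and "transformations required" might look like once made checkable. The CQP-like query notation, the admissible part-of-speech set and the `instantiate` helper are all hypothetical, and escaping with `re.escape` assumes that the target CQL uses a regular-expression syntax compatible with Python's.

```python
import re
from string import Template

# Hypothetical parameterized query expression in a CQP-like notation.
QUERY = Template('[lemma="$lemma"] [pos="$pos"]')

# Hypothetical informal "range of admissible values" for the pos parameter.
ADMISSIBLE_POS = {"NN", "VB", "JJ"}


def instantiate(lemma, pos):
    """Substitute both parameters; regex metacharacters in the lemma are escaped."""
    if pos not in ADMISSIBLE_POS:
        raise ValueError(f"inadmissible value for pos: {pos!r}")
    return QUERY.substitute(lemma=re.escape(lemma), pos=pos)


print(instantiate("e.g.", "NN"))  # → [lemma="e\.g\."] [pos="NN"]
```

The escaping step stands in for a "transformation required": without it, the dots in "e.g." would match any character in a regex-based CQL.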
3.9
parameter
variable element in a query expression (3.8) or in the description of a search need (3.6)
3.10
positive conformance statement
assertion that a given CQL (3.5) supports a given use case (3.4) by means of a query expression (3.8)
3.11
negative conformance statement
assertion that a given CQL (3.5) cannot support a given use case (3.4), frame (3.3) or functionality (3.2)
Note 1 to entry: Negative conformance is due to technical unavailability of specific capabilities in the respective
CQL or limitations on the complexity of query expressions (3.8).
3.12
CQL capability
capability
corpus query language capability
facility provided by CQLs (3.5) to meet a specific aspect of search needs (3.6)
3.13
layer
totality of concepts at the same level of abstraction in a CQLF ontology (3.15)
EXAMPLE Functionalities (3.2), frames (3.3), use cases (3.4).
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — Note 1 to entry has been deleted.]
3.15
CQLF ontology
ontology for a fine-grained description of the expressive power of CQLs (3.5) in terms of search needs
(3.6), which adheres to the structure specified in this document
4 Motivation and aims
CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific
application scenarios, others are able to cover a wider variety of applications and search needs. It is
therefore both the quality and the quantity of CQL capabilities – as well as the degree to which they
can be combined freely – that determine the expressive power of a CQL. A CQLF ontology as specified
in this document is not intended to articulate all the possible combinations of capabilities unless these
are justified by genuine usage. Its aim is to provide representative categories for typical search needs
within a taxonomy of CQL capabilities. These typical search needs evolve with general progress in
the fields of corpus linguistics and digital humanities, and with the discovery of new challenges, new
methods and new research questions. In order to accommodate the dynamic nature of the evolving
search needs, most of the content of such an ontology is outside the scope of standardization. This
document provides a structural framework for this dynamic information (by specifying the three-layer
structure of the expressive power taxonomy, the content of the topmost layer of functionalities, and the
relationships between different layers and taxonomies), ensuring that the ontology can adapt to new
search needs that emerge as the relevant disciplines evolve.
In order to provide a normative skeleton for the ontology while at the same time making provisions for
keeping its main content (search needs and corresponding query expressions) dynamic, this document
does not comprise a normative listing of the middle and bottom layers of the expressive power taxonomy
(i.e. frames and use cases). An exhaustive inventory of concepts at these two layers is not possible
due to the fact that existing CQLs differ widely in the complexity of the supported combinations
of functionalities, that new CQLs can be created offering additional combinations or subtypes of
functionalities, and that new search needs emerge from progress in the relevant research fields. The
frames and use cases of a CQLF ontology are expected to be supplied by a moderated community
process, driven by CQL developers as well as end users (see Reference [13]). For illustration, a sample of
frames and use cases together with conformance statements linking them with the CQP [6] and ANNIS [8]
query languages is provided in Annex A.
The permissive architecture and terminology defined by this document enable research groups to
extend the relevant parts of the ontology with further CQL capabilities and search needs arising in
the future.
The following application scenarios are thus made possible:
— describing the scope and capabilities of a given CQL, in terms of conformance statements against a
CQLF ontology (typically carried out by the CQL developers);
— comparing different CQLs with respect to their ability to meet typical search needs;
— identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required
by an end user, together with examples of the respective query syntax;
— guiding the development of new CQLs and query tools by building an inventory of complex search
needs that are important for the community (typically carried out by end users).
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
The taxonomic framework for a CQLF ontology is modelled in OWL 2 DL [7], a dialect of the Web
Ontology Language (OWL) that uses the family of description logics (DL) (see Reference [9]) as its
formal framework. All definitions and requirements of the W3C OWL 2 specification shall be followed.
The normative representation and exchange format for a CQLF ontology is RDF/XML [10][11]. All
labels and annotations shall be represented as sequences of Unicode code points, in accordance with
ISO/IEC 10646.
W3C OWL 2 DL furnishes developers with a set of tools for:
a) stating concept hierarchies and membership of individuals,
b) defining highly expressive property restrictions.
In particular, this document makes use of the AnnotationProperty construct of OWL DL in order to
associate additional information with concepts and individuals.
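As a rough illustration of the RDF/XML exchange format, the Python sketch below uses only the standard library (`xml.etree.ElementTree`) to serialize two concepts from 5.3 as `owl:Class` elements, each carrying an `rdfs:label` annotation and, where applicable, an `rdfs:subClassOf` link. The base IRI `http://example.org/cqlf#` and the helper `to_rdf_xml` are hypothetical placeholders; the output is a fragment for illustration, not the normative serialization of a CQLF ontology.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs for RDF, RDFS and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical base IRI, used for illustration only.
BASE = "http://example.org/cqlf#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)


def to_rdf_xml(concepts):
    """Serialize (name, label, parents) triples as owl:Class elements."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, label, parents in concepts:
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        lab = ET.SubElement(cls, f"{{{RDFS}}}label")
        lab.text = label  # any Unicode string; ElementTree escapes markup characters
        for parent in parents:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdf_xml([
    ("Metamodel", "Metamodel", []),
    ("Module", "Module", ["Metamodel"]),
])
print(xml_text)
```

A dedicated RDF library would normally be used in practice; the standard library is chosen here only to keep the sketch dependency-free.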
For better readability, CQLF ontology axioms are provided in DL notation in Clauses 5 and 6 rather than
in the RDF/XML exchange format.
Relevant DL notions [9]:
— concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept
expressions.
EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B. A is
also said to be subsumed by B.
NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature
structures (see Reference [12], p. 496).
— concept equivalence ≡: This operator asserts an equivalence between two concept expressions.
EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.
— intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions,
i.e. the individuals contained in both concept expressions.
NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C. It is equivalent to the assertions A ⊑ B and
A ⊑ C.
— union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the
individuals contained in either or both of the concept expressions.
NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals
covered by A can be contained in B and others in C.
— top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred
to as “Thing” or “the root class”.
— bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as “Nothing”
or “the empty class”.
— concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as
“class assertion” because the concepts represent classes (see T-Box below).
EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.
— A-Box: The domain of interest is spanned by a universe of individuals which serve as the fundamental
atoms of the domain being modelled. They become members of concepts through
concept assertions (also referred to as “A-Box axioms”) and implicitly through the subsumption
relations expressed by concept inclusion assertions (in the T-Box).
— T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which
individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts
and a rule set of hierarchical relations between them (“is-a” relations expressed by concept
inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/or
individuals.
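The set-theoretic reading of these operators can be made concrete with a small finite interpretation. The Python sketch below is illustrative only: the domain and the extensions of `A`, `B` and `C` are invented, and an OWL 2 DL reasoner works symbolically rather than by enumerating individuals. It checks NOTE 2 and NOTE 3 on one example interpretation.

```python
# A toy finite interpretation: each concept is the set of individuals it covers.
TOP = frozenset({"x1", "x2", "x3", "x4"})  # ⊤: all individuals in the domain
BOTTOM = frozenset()                       # ⊥: the empty class

A = frozenset({"x1", "x2"})
B = frozenset({"x1", "x3"})
C = frozenset({"x2", "x4"})


def subsumed(a, b):
    """A ⊑ B: every individual in A is also in B."""
    return a <= b


def equivalent(a, b):
    """A ≡ B: A and B cover exactly the same individuals."""
    return a == b


def meet(a, b):
    """A ⊓ B: individuals contained in both concepts."""
    return a & b


def join(a, b):
    """A ⊔ B: individuals contained in either or both concepts."""
    return a | b


# NOTE 3: A ⊑ B ⊔ C holds, although A is subsumed by neither B nor C alone.
print(subsumed(A, join(B, C)), subsumed(A, B), subsumed(A, C))  # → True False False

# NOTE 2: X ⊑ B ⊓ C holds exactly when X ⊑ B and X ⊑ C both hold.
for x in (A, B, C, BOTTOM, TOP):
    assert subsumed(x, meet(B, C)) == (subsumed(x, B) and subsumed(x, C))

# Every concept lies between ⊥ and ⊤; here B ⊓ C is empty, i.e. B and C are disjoint.
assert all(subsumed(BOTTOM, x) and subsumed(x, TOP) for x in (A, B, C))
print(equivalent(meet(B, C), BOTTOM))  # → True

# The concept assertion x1 ∈ A is plain set membership.
print("x1" in A)  # → True
```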
5.2 Structure of the ontology
The T-Box of a CQLF ontology consists of three separate taxonomies of concepts. The main taxonomy
describes different facets of the expressive power of CQLs. It is called “expressive power taxonomy” and
is divided into three layers.
Concepts in the top layer are called “functionalities”. They represent (families of) individual search
capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for
navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are
defined in 5.4.
Concepts in the middle layer are called “frames”. They represent typical search needs of end users,
which often involve combinations of multiple functionalities, at a relatively abstract level. For every
frame, subsumption assertions shall indicate which functionalities are required for the search need. A
frame A can also be subsumed by another frame A′ if A extends the search need represented by A′. The
normative part of the ontology does not include any instances of frames; the structure of the frame
layer is defined in 5.5.
Concepts in the bottom layer are called “use cases”. They represent parameterized instantiations of
frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether
a given CQL can satisfy a given use case. For every use case, a subsumption assertion shall indicate
which frame is instantiated by the use case. There can also be subsumption assertions to multiple
frames as well as to other use cases. The normative part of the ontology does not include any instances
of use cases; the structure of the use case layer is defined in 5.6.
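Because every use case is subsumed by at least one frame, every frame by the functionalities it requires, and every functionality is linked to the CQLF metamodel, the functionalities and modules on which a use case depends follow from the transitive closure of ⊑. The Python sketch below computes that closure by breadth-first search over told subsumptions only. All concept names other than those of 5.3 are hypothetical, and a full OWL reasoner would additionally derive entailed subsumptions that are not stated explicitly.

```python
from collections import deque

# Hypothetical told subsumptions (child ⊑ parents). The use case, frame and
# functionality names are invented; they are not the normative concepts of 5.4.
SUBSUMPTIONS = {
    "UC_AdjectiveNounPair": ["Frame_AnnotatedTokenSequence"],
    "Frame_AnnotatedTokenSequence": ["Func_TokenAnnotation", "Func_TokenSequence"],
    "Func_TokenAnnotation": ["SimpleAnnotation"],
    "Func_TokenSequence": ["Segmentation"],
    "SimpleAnnotation": ["Module", "Linear"],
    "Segmentation": ["Module", "Linear"],
    "Module": ["Metamodel"],
    "Linear": ["Level"],
    "Level": ["Metamodel"],
}
FUNCTIONALITIES = {"Func_TokenAnnotation", "Func_TokenSequence"}
MODULES = {"SimpleAnnotation", "Segmentation"}


def ancestors(concept):
    """All concepts that subsume `concept`, via breadth-first search over ⊑."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in SUBSUMPTIONS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


up = ancestors("UC_AdjectiveNounPair")
print(sorted(up & FUNCTIONALITIES))  # → ['Func_TokenAnnotation', 'Func_TokenSequence']
print(sorted(up & MODULES))          # → ['Segmentation', 'SimpleAnnotation']
```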
The second taxonomy of concepts formalizes the CQLF metamodel defined in ISO 24623-1. Subsumption
assertions link all functionalities to the CQLF metamodel. Both the CQLF metamodel taxonomy (defined
in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part of the ontology.
The third taxonomy of concepts represents individual CQLs whose expressive power is described with
respect to the ontology. It shall have a flat structure without subsumption assertions between different
CQLs. The normative part of the ontology does not include any instances of CQLs; the structure of the
taxonomy is defined in 5.7.
Individuals in the A-Box are positive conformance statements in the form of parameterized query
expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL
it is formulated in) and to a use case concept (representing the search need that the query expression
satisfies). The normative part of the ontology does not include any individuals, i.e. its A-Box is empty.
A CQL can also make negative conformance statements to declare that it cannot satisfy specific use
cases, frames or functionalities because of its design limitations. As general disjointness assertions between
concepts, negative conformance statements are part of the T-Box. If neither a positive nor a negative
conformance statement exists between a CQL and a given use case, it shall be considered undetermined
whether or not the CQL can satisfy the corresponding search need. Positive and negative conformance
statements are further defined in Clause 6.
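The three possible outcomes for a CQL and a use case (supported, not supported, undetermined) can be sketched in Python as below. This is an illustration under simplifying assumptions, not a normative algorithm: all names are hypothetical, only told subsumptions are followed, and a negative statement is modelled as the disjointness axiom CQL ⊓ C ⊑ ⊥, so that it also excludes every use case subsumed by C. A clash between a positive and a negative statement is reported as an inconsistency, which a conforming ontology must not contain.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    SUPPORTED = "supported"        # entailed by a positive statement
    UNSUPPORTED = "unsupported"    # entailed by a negative statement
    UNDETERMINED = "undetermined"  # no statement applies


# Hypothetical told subsumptions (child ⊑ parents): use cases ⊑ frames ⊑ functionalities.
SUBSUMPTIONS = {
    "UC_One": ["Frame_F"],
    "UC_Two": ["Frame_F"],
    "UC_Three": ["Frame_G"],
    "Frame_F": ["Func_P"],
    "Frame_G": ["Func_Q"],
}

# Positive statements: a query expression individual q with q ∈ CQL and q ∈ use case.
POSITIVE = [("CQL_X", "UC_One", "<parameterized query expression>")]

# Negative statements: CQL ⊓ C ⊑ ⊥ for a use case, frame or functionality C.
NEGATIVE = {("CQL_Y", "Func_P")}


def ancestors_or_self(concept):
    """The concept itself plus every concept that subsumes it via told ⊑ assertions."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for parent in SUBSUMPTIONS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def status(cql, use_case):
    # A negative statement on any subsumer of the use case also excludes the use case.
    negative = any((cql, c) in NEGATIVE for c in ancestors_or_self(use_case))
    # q ∈ U and U ⊑ use_case entail q ∈ use_case, so positive statements propagate upwards.
    positive = any(c == cql and use_case in ancestors_or_self(u) for c, u, _ in POSITIVE)
    if positive and negative:
        raise ValueError(f"inconsistent statements for {cql} and {use_case}")
    if positive:
        return Status.SUPPORTED
    if negative:
        return Status.UNSUPPORTED
    return Status.UNDETERMINED


for cql in ("CQL_X", "CQL_Y"):
    for uc in ("UC_One", "UC_Two", "UC_Three"):
        print(cql, uc, status(cql, uc).value)
```

For instance, CQL_Y is reported as unable to satisfy UC_Two even though no statement names UC_Two directly, because UC_Two is subsumed by Func_P.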
No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the
ontology.
The overall structure of a CQLF ontology is illustrated in Figure 1.
Figure 1 — General structure of a CQLF ontology
5.3 CQLF metamodel
The theoretical concept of modules as standardized in the context of ISO 24623-1 [5] is formalized by the
CQLF metamodel taxonomy. It consists of the concepts and subsumption assertions defined below. Each
concept is identified by its label (as an rdfs:label annotation), followed by all its subsumption assertions
within the taxonomy.
Abstract root concepts:
— Metamodel: This is the abstract root concept of the CQLF metamodel taxonomy.
— Level ⊑ Metamodel: This is the abstract root concept of all CQLF levels.
— Module ⊑ Metamodel: This is the abstract root concept of all CQLF modules.
CQLF levels:
— Linear ⊑ Level: Plain-text search as well as search in segmented data.
— Complex ⊑ Level: Search in data annotated with hierarchical structures and/or dependency
information, or querying simple annotations by means of containment-based queries.
— Concurrent ⊑ Level: Search in multiple concurrent (overlapping, intersecting and often conflicting)
annotations built upon a single data stream.
CQLF modules:
— PlainText ⊑ Module ⊓ Linear: Segmentation-independent string search.
— SimpleAnnotation ⊑ Module ⊓ Linear: Segmentation-based search for annotations describing the
primary data stream; understood more generally as search for annotations of individual objects in
the context of this document.
— Segmentation ⊑ Module ⊓ Linear: Search for segmental annotation, in particular tokens and token
sequences.
— Hierarchical ⊑ Module ⊓ Complex: Tree-based representations, e.g. for phrase-structure
description.
— Dependency ⊑ Module ⊓ Complex: Identification of relationships in which objects function as
nodes linked by directed arcs.
— SpanContainment ⊑ Module ⊓ Complex: Non-recursive simplified hierarchical relationships
encoded as character span containment.
— Paradigmatic ⊑ Module ⊓ Concurrent: Different annotation layers providing data packages
describing the same location.
— Overlapping ⊑ Module ⊓ Concurrent: Concurrent annotations built upon character spans which
overlap in their start and/or end offsets.
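One way to sanity-check the metamodel taxonomy is to encode it as data. The Python sketch below transcribes the concepts and subsumption assertions listed above into a dictionary (an illustrative encoding, not an exchange format) and groups every module under the single level that subsumes it, raising an error if a module is attached to no level or to more than one.

```python
# The CQLF metamodel taxonomy listed above, as concept -> told subsumers.
METAMODEL = {
    "Metamodel": [],
    "Level": ["Metamodel"],
    "Module": ["Metamodel"],
    "Linear": ["Level"],
    "Complex": ["Level"],
    "Concurrent": ["Level"],
    "PlainText": ["Module", "Linear"],
    "SimpleAnnotation": ["Module", "Linear"],
    "Segmentation": ["Module", "Linear"],
    "Hierarchical": ["Module", "Complex"],
    "Dependency": ["Module", "Complex"],
    "SpanContainment": ["Module", "Complex"],
    "Paradigmatic": ["Module", "Concurrent"],
    "Overlapping": ["Module", "Concurrent"],
}

LEVELS = [c for c, parents in METAMODEL.items() if parents == ["Level"]]


def modules_by_level():
    """Group every module under the one level that subsumes it."""
    grouped = {level: [] for level in LEVELS}
    for concept, parents in METAMODEL.items():
        if "Module" in parents:
            levels = [p for p in parents if p in grouped]
            if len(levels) != 1:
                raise ValueError(f"{concept} must belong to exactly one level")
            grouped[levels[0]].append(concept)
    return grouped


for level, modules in modules_by_level().items():
    print(level, ":", ", ".join(modules))
```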
As the coarsest categories of corpus
...

SLOVENIAN STANDARD
oSIST ISO/DIS 24623-2:2021
01 March 2021
Language resource management - Corpus Query Lingua Franca (CQLF) - Part 2: Ontology
This Slovenian standard is identical to: ISO/DIS 24623-2
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24623-2:2021 en
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

DRAFT INTERNATIONAL STANDARD
ISO/DIS 24623-2
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: 2021-01-07
Voting terminates on: 2021-04-01
Language resource management — Corpus Query Lingua Franca (CQLF) — Part 2: Ontology
ICS: 01.020
This document is circulated as received from the committee secretariat.
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Reference number
ISO/DIS 24623-2:2021(E)
© ISO 2021


COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 CQLF Ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF Metamodel
5.4 Functionalities
5.5 Frames
5.6 Use Cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
7 RDF/XML serialization
Annex A (informative) Illustrative examples of non-normative elements in the CQLF Ontology
Annex B (informative) CQLF Ontology: Moderated community process
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24623 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource
management, has developed several families of standards codifying various aspects of the representation
of language data. These standards describe general corpus-oriented data models in the Linguistic
Annotation Framework (LAF, ISO 24612) family, various aspects of the semantic representation in the
family of the Semantic Annotation Framework (SemAF, ISO 24617-1 and others), the representation
of lexical data in the Lexical Markup Framework family (LMF, ISO 24613-1 and others), as well as the
representation of metadata in the Component Metadata Infrastructure (CMDI, ISO 24622-1 and others).
Complementary to the standards concerning the representation of language data, the Corpus Query
Lingua Franca (henceforth CQLF) family of standards (ISO 24623) focuses on the exploitation of
language data and ways to satisfy various kinds of information needs targeting these data.
The CQLF Metamodel, described by ISO 24623-1 (CQLF-1), is a maximally permissive construct that
establishes means of describing the scope of corpus query languages (CQLs) at a general level and
with a focus on various kinds of data models assumed by query systems, with conformance conditions
meant to be satisfied by a wide range of CQLs. The Metamodel provides a “skeleton” for a CQL taxonomy
by setting up basic categories of corpus queries (encoded as CQLF-1 levels and modules) as well as the
dependencies among them.
Consequently, the task of a more concrete characterization of CQLs falls to other members of the CQLF
standard family. This document (ISO 24623-2, “CQLF-2” for short) establishes an ontology which
focuses on the generalized information needs satisfied by corpus queries, and which is structured as
a multi-layer taxonomy against which individual CQLs can make positive and negative conformance
statements.
Establishing this ontology allows, on the one hand, a fine-grained comparison of the expressive power
of CQLs and, on the other hand, is intended to serve a practical purpose: as a foundation for a platform
where developers can enter conformance statements, and where end users can see which CQL to turn
to in order to ensure that their search needs get satisfied.
DRAFT INTERNATIONAL STANDARD ISO/DIS 24623-2:2021(E)
Language resource management — Corpus Query Lingua Franca (CQLF) — Part 2: Ontology
1 Scope
This document defines an ontology for a fine-grained description of the expressive power of CQLs in
terms of search needs. The ontology consists of three interrelated taxonomies of concepts: a) the CQLF
Metamodel (a formalization of CQLF-1), b) the Expressive Power taxonomy, which describes different
facets of the expressive power of CQLs, and c) a taxonomy of CQLs.
The normative parts of this document comprise a) the taxonomy of the CQLF Metamodel, b) the
Functionality layer of the Expressive Power taxonomy, c) the structure of the layers of the Expressive
Power taxonomy and the relationships between them, in the form of subsumption assertions, as well as
d) the formalization of the linkage between the CQL taxonomy and the Expressive Power taxonomy, in
the form of positive and negative conformance statements.
This document does not provide a normative listing of the middle and bottom layers of the Expressive
Power taxonomy (called Frames and Use Cases, respectively). An exhaustive inventory of the concepts
at these two layers is not possible, because existing CQLs differ widely in the complexity of the
supported combinations of Functionalities and new CQLs can be created that offer additional
combinations or subtypes of Functionalities. Frames and Use Cases are expected to be filled in through
a moderated community process, driven by CQL developers as well as end users. An informative annex
to this document contains a sample of Frames and Use Cases together with conformance statements
linking them with the CQP [5] and ANNIS [7] query languages.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
ISO 24612, Language resource management — Linguistic annotation framework (LAF)
ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel
Motik, B., Patel-Schneider, P. F., and Parsia, B. (2012). OWL 2 Web Ontology Language: Structural
Specification and Functional-Style Syntax (Second Edition). W3C Recommendation, 11 December 2012.
(Latest version available at http://www.w3.org/TR/owl2-syntax/.)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the
following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
CQLF module
subcomponent of the CQLF Metamodel, defined with reference to a specified data model characteristic
Note 1 to entry: The CQLF Metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-
text, segmentation, and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical,
dependency, and containment).
[SOURCE: ISO 24623-1:2018, 3.8, modified – “a CQLF level” was replaced with “the CQLF Metamodel” in
order to improve clarity outside the context of ISO 24623-1.]
3.2
Functionality
label for a concept in the CQLF Ontology that represents a family of capabilities contributing to the
expressive power of a CQL, formulated at a general level and linked to one or more CQLF modules
3.3
Frame
label for a concept in the CQLF Ontology that represents a typical search need of end users, understood
as one facet of the expressive power of CQLs
Note 1 to entry: Most Frames arise from the specialization of a Functionality and/or the combination of multiple
Functionalities.
3.4
Use Case
label for a concept in the CQLF Ontology that represents a concrete instantiation of a Frame, for which
it can be determined unambiguously whether a given query expression satisfies the search need or not
Note 1 to entry: Use Cases are often parameterized, i.e. they contain variable elements. Parameterized Use Cases
are satisfied by parameterized query expressions.
3.5
CQL
corpus query language
formal language designed to retrieve specific information from (large) language data collections, and
thereby incorporate certain abstractions over commonly shared data models that make it possible for
the end user (or user agents) to address parts of those data models
Note 1 to entry: A CQL defines a syntactic notation for query expressions and the corresponding search semantics,
i.e. an intensional specification of the intended result set. For most current CQLs, semantics are implicitly defined
by a particular implementation.
[SOURCE: ISO 24623-1:2018, 3.4, modified – “user” was replaced with “end user” in the definition, the
abbreviation CQL was added as preferred term, and Note 1 to entry was added.]
3.6
search need
information pattern that an end user wants to locate in a corpus, based on the primary data stream
and/or simple or complex annotation
3.7
end user
agent who uses a CQL to satisfy his or her search needs
Note 1 to entry: This can be done via an interactive GUI, a command-line tool, programmatically via some API, or
by a software program developed by the end user.
3.8
query expression
string that is syntactically valid in a given CQL and can be executed to return a result set
Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification
of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the
ontology are required to include informal descriptions of the range of admissible values and any transformations
required.
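The following sketch illustrates, under stated assumptions, how a parameterized query expression might be instantiated. It assumes CQP syntax for the query template and Python's string.Template for substitution. The parameter name word and the helper instantiate are hypothetical, and escaping regular-expression metacharacters stands in for the kind of transformation that an informal parameter description might require; this document itself does not prescribe any substitution procedure.

```python
import re
from string import Template

# Hypothetical parameterized query expression in CQP syntax: $word stands for
# a word form. CQP reads the quoted value as a regular expression, so a
# literal word form has to be escaped before it is substituted.
CQP_WORD_QUERY = Template('[word="$word"]')


def instantiate(template: Template, **params: str) -> str:
    """Substitute parameters after escaping regex metacharacters.

    The escaping step is an assumed example of a "transformation required"
    by an informal parameter description, not a procedure defined by CQLF-2.
    """
    escaped = {name: re.escape(value) for name, value in params.items()}
    return template.substitute(escaped)


print(instantiate(CQP_WORD_QUERY, word="U.S."))  # prints [word="U\.S\."]
```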
3.9
parameter
variable element in a query expression or in the description of a search need
3.10
positive conformance statement
assertion that a given CQL supports a given Use Case by means of a query expression
3.11
negative conformance statement
assertion that a given CQL cannot support a given Use Case, Frame or Functionality
Note 1 to entry: Negative conformance is due to the technical unavailability of specific capabilities in the
respective CQL or to limitations on the complexity of query expressions.
3.12
CQL capability
corpus query language capability
facility provided by CQLs to meet a specific aspect of search needs
3.13
layer
totality of concepts at the same level of abstraction in the CQLF Ontology
EXAMPLE Functionalities, Frames, Use Cases
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — The note was deleted.]
4 Motivation and aims
CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific
application scenarios, others cover a wider variety of applications and search needs. It is therefore
both the quality and the quantity of CQL capabilities, as well as the degree to which they can be
combined, that determine the expressive power of a CQL. The CQLF Ontology is not intended to
enumerate every possible combination of capabilities, only those justified by genuine usage. Its aim is
to provide representative categories for typical search needs within a taxonomy of CQL capabilities.
Another important aspect is the degree of explicitness that a CQL offers. Some CQLs can express a
particular search need in a condensed and highly specialized manner, while others rely on complex
combinations of a few elementary capabilities. The CQLF Ontology draws on CQLs with a more explicit
formalization of capabilities in order to create a systematic taxonomy of search needs, and thereby to
classify CQLs whose formalization is more implicit. The degree of explicitness of each CQL is visible in
the parameterized query expressions included in all positive conformance statements.
This document defines the structure of an ontology representing CQL capabilities and search needs in a
taxonomy consisting of three layers of varying degrees of abstraction. It also provides formal means for
stating the conformance of individual CQLs to this central taxonomy.
End users navigate the taxonomy starting from a compact layer of CQL capabilities (the Functionalities).
Selecting a subset of relevant capabilities allows them to locate relevant search needs in the middle
layer of the taxonomy efficiently and then, in the bottom layer, to choose the concrete instantiation of
the search need that is closest to their requirements. This instantiation links to all CQLs that satisfy
the selected search need and provides a parameterized query expression for each of them.
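A minimal sketch of this navigation path, assuming a toy in-memory representation, is given below. The Frame, Functionality and Use Case names, the CQL names CQL-A and CQL-B and the query strings are hypothetical placeholders, and the class Frame and the functions frames_for and queries_for are illustrative helpers rather than part of the CQLF Ontology.

```python
from dataclasses import dataclass, field

# Hypothetical miniature of the Expressive Power taxonomy; all names are placeholders.


@dataclass
class Frame:
    name: str
    # Functionalities that subsume this Frame (i.e. that the search need requires).
    functionalities: frozenset
    # Use Cases that instantiate this Frame.
    use_cases: list = field(default_factory=list)


FRAMES = [
    Frame("TokenSequence", frozenset({"Sequence", "TokenAnnotation"}), ["UC1"]),
    Frame("LabelledDominance", frozenset({"Hierarchy", "EdgeLabel"}), ["UC2"]),
]

# Positive conformance statements: (CQL, Use Case) -> parameterized query expression.
CONFORMANCE = {
    ("CQL-A", "UC1"): "<query expression for UC1 in CQL-A>",
    ("CQL-B", "UC1"): "<query expression for UC1 in CQL-B>",
    ("CQL-B", "UC2"): "<query expression for UC2 in CQL-B>",
}


def frames_for(selected: set) -> list:
    """Middle layer: Frames whose required Functionalities were all selected."""
    return [f for f in FRAMES if f.functionalities <= selected]


def queries_for(use_case: str) -> dict:
    """Bottom layer: every CQL with a positive conformance statement for the Use Case."""
    return {cql: query for (cql, uc), query in CONFORMANCE.items() if uc == use_case}


for frame in frames_for({"Sequence", "TokenAnnotation"}):
    for uc in frame.use_cases:
        print(frame.name, uc, queries_for(uc))
```

Here a Frame is offered when all the Functionalities it requires are among those selected by the end user; offering instead the Frames that require at least the selected Functionalities would be an equally defensible design choice for a navigation platform.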
The definition of the structure of the CQLF Ontology as described in this document is expected to be
instantiated and expanded in a dynamic community-based project (see Annex B). The permissive
architecture and terminology defined by CQLF-2 enable research groups to extend the relevant parts
of the ontology with further CQL capabilities and search needs.
CQLF-2 is primarily intended for the following application scenarios:
• describing the scope and capabilities of a given CQL, in terms of conformance statements against the
CQLF Ontology (typically carried out by the CQL developers);
• comparing different CQLs with respect to their ability to meet typical search needs;
• identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required
by an end user, together with examples of the respective query syntax; and
• guiding the development of new CQLs and query tools by building an inventory of complex search
needs that are important for the community (typically carried out by end users).
5 CQLF Ontology
5.1 OWL DL formalism
The taxonomic framework of the CQLF Ontology is modelled in OWL 2 DL [6], a dialect of the Web
Ontology Language (OWL) whose formal framework is based on the family of description logics (hence
DL, see [8]). All definitions and requirements of the OWL 2 Specification shall be followed. The
normative representation and exchange format for the CQLF Ontology is RDF/XML ([9], [10]). All labels
and annotations shall be represented as sequences of Unicode code points, following ISO/IEC 10646.
OWL 2 DL furnishes developers with a set of tools for a) stating concept hierarchies and membership
of individuals and b) defining highly expressive property restrictions. In particular, the CQLF Ontology
makes use of the AnnotationProperty construct of OWL DL in order to associate additional information
with concepts and individuals.
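As a rough illustration of the exchange format, the following sketch serializes two OWL classes, one subsumption axiom (rdfs:subClassOf) and one annotation per class (rdfs:label, a built-in OWL 2 annotation property) to RDF/XML using only the Python standard library. The namespace http://example.org/cqlf#, the class name Sequence and the helper owl_class are hypothetical and do not reflect the IRIs of the published ontology; a production tool would more likely rely on a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/cqlf#"  # hypothetical namespace, not the official CQLF IRI

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def owl_class(parent, name, superclasses=(), label=None):
    """Append an owl:Class with optional rdfs:subClassOf axioms and an rdfs:label."""
    cls = ET.SubElement(parent, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
    for sup in superclasses:
        # name ⊑ sup, serialized as rdfs:subClassOf
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + sup})
    if label is not None:
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    return cls


root = ET.Element(f"{{{RDF}}}RDF")
owl_class(root, "Functionality", label="Functionality")
owl_class(root, "Sequence", ["Functionality"], label="Sequence")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```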
For better readability, CQLF Ontology axioms are provided in DL notation in Clauses 5 and 6; a link to
the complete RDF/XML serialization of the normative part of the ontology can be found in Clause 7.
Before turning to the DL specification of the CQLF Ontology, a few relevant DL notions will be
introduced [8]:
• concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept
expressions.
EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B; A is
also said to be subsumed by B.
NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature
structures [11, p. 496].
• concept equivalence ≡: This operator asserts an equivalence between two concept expressions.
EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.
• intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions,
i.e. the individuals contained in both concept expressions.
NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C; it is equivalent to the assertions A ⊑ B
and A ⊑ C.
• union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the
individuals contained in either or both of the concept expressions.
NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals
covered by A might be contained in B and others in C.
• top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred
to as Thing or the root class.
• bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as Nothing
or the empty class.
• concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as
class assertion because concepts represent classes (see T-Box below).
EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.
• A-Box: The domain of interest is spanned by a universe of individuals, which serve as the fundamental
atoms of what is modelled in the ontology. They become members of concepts explicitly through
concept assertions (also referred to as A-Box axioms) and implicitly through the subsumption
relations expressed by concept inclusion assertions (in the T-Box).
• T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which
individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts
and a rule set of hierarchical relations between them (“is-a” relations expressed by concept
inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/
or individuals.
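The set-theoretic reading of these notions can be sketched over a toy domain of four individuals. The sketch assumes one fixed interpretation in which every concept is a concrete set of individuals; it does not perform DL reasoning, which considers all models of a T-Box and A-Box. All concept and individual names are hypothetical.

```python
# One fixed interpretation: each concept is the set of individuals it contains.
TOP = frozenset({"x1", "x2", "x3", "x4"})  # ⊤ (Thing): the entire domain
BOTTOM = frozenset()                       # ⊥ (Nothing): the empty class

A = frozenset({"x1", "x2"})
B = frozenset({"x1", "x3"})
C = frozenset({"x2", "x4"})


def subsumed(a, b):    # a ⊑ b: every individual of a is also in b
    return a <= b


def equivalent(a, b):  # a ≡ b: exactly the same individuals
    return a == b


def conj(a, b):        # a ⊓ b: individuals in both concepts
    return a & b


def disj(a, b):        # a ⊔ b: individuals in either or both concepts
    return a | b


def member(x, a):      # x ∈ a: concept (class) assertion
    return x in a


# NOTE 3: A ⊑ B ⊔ C holds, although neither A ⊑ B nor A ⊑ C holds.
print(subsumed(A, disj(B, C)), subsumed(A, B), subsumed(A, C))  # True False False
```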
5.2 Structure of the ontology
The T-Box of the CQLF Ontology consists of three separate taxonomies of concepts. The main taxonomy
describes different facets of the expressive power of CQLs. It is called the Expressive Power taxonomy
and is divided into three layers.
Concepts in the top layer are called Functionalities. They represent (families of) individual search
capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for
navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are
defined in 5.4.
Concepts in the middle layer are called Frames. They represent typical search needs of end users,
which often involve combinations of multiple Functionalities, at a relatively abstract level. For every
Frame, subsumption assertions shall indicate which Functionalities are required for the search need. A
Frame A can also be subsumed by another Frame A' if A extends the search need represented by A'. The
normative part of the ontology does not include any instances of Frames; the structure of the Frame
layer is defined in 5.5.
Concepts in the bottom layer are called Use Cases. They represent parameterized instantiations of
Frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether
a given CQL can satisfy a given Use Case. For every Use Case, a subsumption assertion shall indicate
which Frame is instantiated by the Use Case. There can also be subsumption assertions to multiple
Frames as well as to other Use Cases. The normative part of the ontology does not include any instances
of Use Cases; the structure of the Use Case layer is defined in 5.6.
The second taxonomy of concepts formalizes the CQLF Metamodel defined by CQLF-1 (ISO 24623-1).
Subsumption assertions link all Functionalities to the CQLF Metamodel. Both the CQLF Metamodel
taxonomy (defined in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part
of the ontology.
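The layered subsumption assertions described above can be traversed to find which Functionalities (and, through them, which CQLF modules) a Use Case ultimately depends on. The sketch below is a hypothetical illustration: the concept names, the prefixes UC_, Frame_, Func_ and Module_, and the links from Functionalities to the segmentation and simple annotation modules are invented for the example. It only follows told subsumptions and is no substitute for an OWL reasoner.

```python
from collections import deque

# Hypothetical told subsumption assertions (child -> parents):
# Use Case ⊑ Frame, Frame ⊑ Functionality, Functionality ⊑ CQLF module.
SUBSUMPTIONS = {
    "UC_AdjNounPair": {"Frame_TokenSequence"},
    "Frame_TokenSequence": {"Func_Sequence", "Func_TokenAnnotation"},
    "Func_Sequence": {"Module_Segmentation"},
    "Func_TokenAnnotation": {"Module_SimpleAnnotation"},
}


def ancestors(concept: str) -> set:
    """All concepts that subsume `concept`, found by breadth-first search."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in SUBSUMPTIONS.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


required = sorted(c for c in ancestors("UC_AdjNounPair") if c.startswith("Func_"))
print(required)  # ['Func_Sequence', 'Func_TokenAnnotation']
```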
The third taxonomy of concepts represents individual CQLs whose expressive power is described with
respect to the CQLF Ontology. It shall have a flat structure without subsumption assertions between
different CQLs. The normative part of the ontology does not include any instances of CQLs; the structure
of the taxonomy is defined in 5.7.
Individuals in the A-Box are positive conformance statements in the form of parameterized query
expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL it is
formulated in) and to a Use Case concept (representing the search need it satisfies). The normative part
of the ontology does not include any individuals, i.e. its A-Box is empty. A CQL can also make negative
conformance statements to declare that it cannot satisfy specific Use Cases, Frames or Functionalities
because of its design limitations. As disjointness assertions between concepts (of the form CQL ⊓ X ⊑ ⊥),
negative conformance statements are part of the T-Box. If neither a positive nor a negative conformance statement exists
between a CQL and a given Use Case, it shall be considered undetermined whether or not the CQL can
satisfy the corresponding search need. Positive and negative conformance statements are further
defined in Clause 6.
No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the
ontology.
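The interplay of positive, negative and undetermined conformance can be illustrated with a small sketch. It assumes that a negative conformance statement against a Use Case, Frame or Functionality X is read as the disjointness CQL ⊓ X ⊑ ⊥, so that it also excludes every concept subsumed by X, and that an individual asserted in a Use Case is, by subsumption, also a member of every concept above it. All names are hypothetical, only told subsumptions are followed, and the function status is an illustrative helper.

```python
# Hypothetical told subsumptions (child -> parents) across the layers of 5.2.
SUBSUMPTIONS = {
    "UC1": {"FrameA"},
    "UC2": {"FrameB"},
    "FrameA": {"FuncX"},
    "FrameB": {"FuncY"},
}
# A-Box: each query-expression individual is asserted in one CQL and one Use Case.
POSITIVE = {"q1": ("CQL-A", "UC1")}
# T-Box: negative conformance as disjointness, CQL ⊓ X ⊑ ⊥.
NEGATIVE = {("CQL-A", "FuncY")}


def ancestors(concept):
    """Concepts that subsume `concept`, following told subsumptions only."""
    result, stack = set(), [concept]
    while stack:
        for parent in SUBSUMPTIONS.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def status(cql, concept):
    """Return 'positive', 'negative' or 'undetermined' for a CQL and a concept."""
    # An individual of a Use Case is also an individual of every concept above it.
    positive = any(
        c == cql and (uc == concept or concept in ancestors(uc))
        for c, uc in POSITIVE.values()
    )
    # A disjointness axiom on X also covers every concept subsumed by X.
    negative = any((cql, x) in NEGATIVE for x in {concept} | ancestors(concept))
    if positive and negative:
        # Some individual would have to belong to ⊥: the ontology is inconsistent.
        raise ValueError(f"inconsistent conformance statements for {cql} and {concept}")
    if positive:
        return "positive"
    return "negative" if negative else "undetermined"


for uc in ("UC1", "UC2"):
    print(uc, status("CQL-A", uc), status("CQL-B", uc))
```

Adding, for instance, a negative statement for CQL-A against FuncX would make status("CQL-A", "UC1") raise, since q1 would then have to belong to both CQL-A ⊓ FuncX and ⊥; this corresponds to the logical inconsistency that the requirement above rules out.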
The overall structure of the CQLF Ontology is illust
...
