Language resource management -- Corpus Query Lingua Franca (CQLF) - Part 2: Ontology

This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements.
This document does not define the entire contents of the ontology (see Clause 4).

Gestion des ressources linguistiques -- Corpus Query Lingua Franca (CQLF) - Partie 2: Ontologie

Upravljanje jezikovnih virov - Lingua franca za korpusne poizvedbe (CQLF) - 2. del: Ontologija

General Information

Status: Published
Public Enquiry End Date: 14-Mar-2021
Publication Date: 27-Dec-2021
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 16-Dec-2021
Due Date: 20-Feb-2022
Completion Date: 28-Dec-2021

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24623-2:2022
01-February-2022

Language resource management -- Corpus Query Lingua Franca (CQLF) - Part 2: Ontology

This Slovenian standard is identical to: ISO 24623-2:2021

ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing

SIST ISO 24623-2:2022 en

2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24623-2
First edition 2021-12
Language resource management — Corpus query lingua franca (CQLF) — Part 2: Ontology
Gestion des ressources linguistiques — Corpus query lingua franca (CQLF) — Partie 2: Ontologie
Reference number: ISO 24623-2:2021(E)
© ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO 2021 – All rights reserved
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF metamodel
5.4 Functionalities
5.5 Frames
5.6 Use cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
Annex A (informative) Illustrative example of a CQLF ontology
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

A list of all parts in the ISO 24623 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Several families of International Standards codify various aspects of the representation of language data. These standards describe general corpus-oriented data models in the linguistic annotation framework (LAF) (see ISO 24612), various aspects of the semantic representation in the semantic annotation framework (SemAF) (see ISO 24617-1 and others), the representation of lexical data in the lexical markup framework (LMF) (see ISO 24613-1 and others), as well as the representation of metadata in the component metadata infrastructure (CMDI) (see ISO 24622-1 and others). Complementary to the standards concerning the representation of language data, the ISO 24623 series focuses on the exploitation of language data and on ways to satisfy various kinds of information needs targeting these data.

The corpus query lingua franca (CQLF) metamodel, described in ISO 24623-1, is a maximally permissive construct that establishes means of describing the scope of corpus query languages (CQLs) at a general level and with a focus on various kinds of data models assumed by query systems, with conformance conditions meant to be satisfied by a wide range of CQLs. The metamodel provides a “skeleton” for a CQL taxonomy by setting up basic categories of corpus queries (encoded as levels and modules) as well as the dependencies among them.

Consequently, the task of a more concrete characterization of CQLs is meant to be covered in other parts of the ISO 24623 series. This document establishes a framework for an ontology which focuses on the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer taxonomy against which individual CQLs can make positive and negative conformance statements. Such an ontology allows, on the one hand, a fine-grained comparison of the expressive power of CQLs, and, on the other hand, it serves a practical purpose, i.e. as a foundation for platforms where developers can enter conformance statements, and where end users can see which CQL to turn to in order to ensure that their search needs get satisfied. An example of such a platform is given by Reference [13].

1 Scope

This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs.

This document specifies:
a) the taxonomy of the CQLF metamodel;
b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);
c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions;
d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements.

This document does not define the entire contents of the ontology (see Clause 4).

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 24612, Language resource management — Linguistic annotation framework (LAF)

ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel

ISO/IEC 10646, Information technology — Universal coded character set (UCS)

W3C-OWL 2-SPEC. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax (Second Edition). Motik B., Patel-Schneider, P.F., and Parsia, B. eds. W3C Recommendation, 11 December 2012. Available from: http://www.w3.org/TR/owl2-syntax/
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
CQLF module
subcomponent of the CQLF metamodel, defined with reference to a specified data-model characteristic
Note 1 to entry: The CQLF metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-text, segmentation and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical, dependency and containment).
Note 2 to entry: In 5.3, the containment module is formalized by the concept SpanContainment in order to avoid terminological ambiguity.
[SOURCE: ISO 24623-1:2018, 3.8, modified — “the CQLF metamodel” has replaced “a CQLF level” in order to improve clarity outside the context of ISO 24623-1; Note 2 to entry has been added.]
3.2
functionality
label for a concept in a CQLF ontology (3.15) that represents a family of CQL capabilities (3.12) contributing to the expressive power of a CQL (3.5), formulated at a general level and linked to one or more CQLF modules (3.1)
3.3
frame
label for a concept in a CQLF ontology (3.15) that represents a typical search need (3.6) of end users (3.7), understood as one facet of the expressive power of CQLs (3.5)
Note 1 to entry: Most frames arise from the specialization of a functionality (3.2) and/or the combination of multiple functionalities.
3.4
use case
label for a concept in a CQLF ontology (3.15) that represents a concrete instantiation of a frame (3.3), for which it can be determined unambiguously whether a given query expression (3.8) satisfies the search need (3.6) or not
Note 1 to entry: Use cases are often parameterized, i.e. they contain variable elements. Parameterized use cases are satisfied by parameterized query expressions.
3.5
CQL
corpus query language
formal language designed to retrieve specific information from (large) language data collections, and thereby incorporate certain abstractions over commonly shared data models that make it possible for the end user (3.7) (or user agents) to address parts of those data models
Note 1 to entry: A CQL defines a syntactic notation for query expressions (3.8) and the corresponding search semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are implicitly defined by a particular implementation.
[SOURCE: ISO 24623-1:2018, 3.4, modified — “CQL” has been added as preferred term, “end user” has replaced “user” in the definition and Note 1 to entry has been added.]
3.6
search need
information pattern that an end user (3.7) wants to locate in a corpus, based on the primary data stream and/or simple or complex annotation
3.7
end user
agent who uses a CQL (3.5) to satisfy his or her search needs (3.6)
Note 1 to entry: This can be done via an interactive graphical user interface (GUI), a command-line tool, programmatically via some application programming interface (API) or by a software program developed by the end user.
3.8
query expression
string that is syntactically valid in a given CQL (3.5) and can be executed to return a result set
Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the ontology are required to include informal descriptions of the range of admissible values and any transformations required.
3.9
parameter
variable element in a query expression (3.8) or in the description of a search need (3.6)
3.10
positive conformance statement
assertion that a given CQL (3.5) supports a given use case (3.4) by means of a query expression (3.8)
3.11
negative conformance statement
assertion that a given CQL (3.5) cannot support a given use case (3.4), frame (3.3) or functionality (3.2)
Note 1 to entry: Negative conformance is due to technical unavailability of specific capabilities in the respective CQL or limitations on the complexity of query expressions (3.8).
3.12
CQL capability
capability
corpus query language capability
facility provided by CQLs (3.5) to meet a specific aspect of search needs (3.6)
3.13
layer
totality of concepts at the same level of abstraction in a CQLF ontology (3.15)
EXAMPLE Functionalities (3.2), frames (3.3), use cases (3.4).
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — Note 1 to entry has been deleted.]
3.15
CQLF ontology
ontology for a fine-grained description of the expressive power of CQLs (3.5) in terms of search needs (3.6), which adheres to the structure specified in this document
4 Motivation and aims

CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific application scenarios, others are able to cover a wider variety of applications and search needs. It is therefore both the quality and the quantity of CQL capabilities – as well as the degree to which they can be combined freely – that determine the expressive power of a CQL. A CQLF ontology as specified in this document is not intended to articulate all the possible combinations of capabilities unless these are justified by genuine usage. Its aim is to provide representative categories for typical search needs within a taxonomy of CQL capabilities. These typical search needs evolve with general progress in the fields of corpus linguistics and digital humanities, and with the discovery of new challenges, new methods and new research questions. In order to accommodate the dynamic nature of the evolving search needs, most of the content of such an ontology is outside the scope of standardization. This document provides a structural framework for this dynamic information (by specifying the three-layer structure of the expressive power taxonomy, the content of the topmost layer of functionalities, and the relationships between different layers and taxonomies), ensuring that the ontology can adapt to new search needs that emerge as the relevant disciplines evolve.

In order to provide a normative skeleton for the ontology while at the same time making provisions for keeping its main content (search needs and corresponding query expressions) dynamic, this document does not comprise a normative listing of the middle and bottom layer of the expressive power taxonomy (i.e. frames and use cases). An exhaustive inventory of concepts at these two layers is not possible due to the fact that existing CQLs differ widely in the complexity of the supported combinations of functionalities, that new CQLs can be created offering additional combinations or subtypes of functionalities, and that new search needs emerge from progress in the relevant research fields. The frames and use cases of a CQLF ontology are expected to be supplied by a moderated community process, driven by CQL developers as well as end users (see Reference [13]). For illustration, a sample of frames and use cases together with conformance statements linking them with the CQP [6] and ANNIS [8] query languages is provided in Annex A.

The permissive architecture and terminology defined by this document enable research groups to extend the relevant parts of the ontology with further CQL capabilities and search needs arising in future.
The following application scenarios are thus made possible:

— describing the scope and capabilities of a given CQL, in terms of conformance statements against a CQLF ontology (typically carried out by the CQL developers);
— comparing different CQLs with respect to their ability to meet typical search needs;
— identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required by an end user, together with examples of the respective query syntax;
— guiding the development of new CQLs and query tools by building an inventory of complex search needs that are important for the community (typically carried out by end users).
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
The taxonomic framework for a CQLF ontology is modelled in OWL 2 DL [7] – a dialect of the Web Ontology Language (OWL) based on the family of description logics (DL) (see Reference [9]) as a formal framework. All definitions and requirements of the W3C OWL 2 specification shall be followed. The normative representation and exchange format for a CQLF ontology is RDF/XML [10][11]. All labels and annotations shall be represented as sequences of Unicode code points, in accordance with ISO/IEC 10646.

W3C OWL 2 DL furnishes developers with a set of tools for:
a) stating concept hierarchies and membership of individuals,
b) defining highly expressive property restrictions.

In particular, this document makes use of the AnnotationProperty construct of OWL DL in order to associate additional information with concepts and individuals.

For better readability, CQLF ontology axioms are provided in DL notation in Clauses 5 and 6 rather than in the RDF/XML exchange format.

Relevant DL notions [9]:

— concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept expressions.
EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B. A is also said to be subsumed by B.
NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature structures (see Reference [12], p. 496).
— concept equivalence ≡: This operator asserts an equivalence between two concept expressions.
EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.
— intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions, i.e. the individuals contained in both concept expressions.
NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C. It is equivalent to the assertions A ⊑ B and A ⊑ C.
— union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the individuals contained in either or both of the concept expressions.
NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals covered by A can be contained in B and others in C.
— top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred to as “Thing” or “the root class”.
— bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as “Nothing” or “the empty class”.
— concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as “class assertion” because the concepts represent classes (see T-Box below).
EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.
— A-Box: The domain of interest is spanned by a universe of individuals which serve as the fundamental atoms for the ontology of what shall be modelled. They become members of concepts through concept assertions (also referred to as “A-Box axioms”) and implicitly through the subsumption relations expressed by concept inclusion assertions (in the T-Box).
— T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts and a rule set of hierarchical relations between them (“is-a” relations expressed by concept inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/or individuals.
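
As an informal illustration of the DL notions listed above, the following Python sketch reads concepts as plain sets of individuals in one fixed toy interpretation. This is a simplification: DL axioms are statements about all admissible interpretations, and the individual names (`x1` to `x5`) and the helper `subsumed_by` are assumptions made for this example only, not part of the standard.

```python
# Toy interpretation: concepts are finite sets of (hypothetical) individuals.
B = {"x1", "x2", "x3"}
C = {"x2", "x3", "x4"}
TOP = B | C | {"x5"}   # top concept: every individual in the toy domain
BOTTOM = set()         # bottom concept: the empty set


def subsumed_by(a, b):
    """A ⊑ B in this interpretation: every individual of A is also in B."""
    return a <= b


# NOTE 2: A ⊑ B ⊓ C holds exactly when both A ⊑ B and A ⊑ C hold.
A = {"x2", "x3"}
note2 = subsumed_by(A, B & C) == (subsumed_by(A, B) and subsumed_by(A, C))

# NOTE 3: A2 ⊑ B ⊔ C can hold although neither A2 ⊑ B nor A2 ⊑ C does.
A2 = {"x1", "x4"}
note3 = (
    subsumed_by(A2, B | C)
    and not subsumed_by(A2, B)
    and not subsumed_by(A2, C)
)

# Concept assertion x ∈ A corresponds to plain set membership here.
x_in_A = "x2" in A
```

In this reading, concept equivalence A ≡ B corresponds to set equality, and every concept lies between `BOTTOM` and `TOP` with respect to `subsumed_by`.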
5.2 Structure of the ontology

The T-Box of a CQLF ontology consists of three separate taxonomies of concepts. The main taxonomy describes different facets of the expressive power of CQLs. It is called “expressive power taxonomy” and is divided into three layers.

Concepts in the top layer are called “functionalities”. They represent (families of) individual search capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are defined in 5.4.

Concepts in the middle layer are called “frames”. They represent typical search needs of end users, which often involve combinations of multiple functionalities, at a relatively abstract level. For every frame, subsumption assertions shall indicate which functionalities are required for the search need. A frame A can also be subsumed by another frame A′ if A extends the search need represented by A′. The normative part of the ontology does not include any instances of frames; the structure of the frame layer is defined in 5.5.

Concepts in the bottom layer are called “use cases”. They represent parameterized instantiations of frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether a given CQL can satisfy a given use case. For every use case, a subsumption assertion shall indicate which frame is instantiated by the use case. There can also be subsumption assertions to multiple frames as well as to other use cases. The normative part of the ontology does not include any instances of use cases; the structure of the use case layer is defined in 5.6.
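
The layering rules described above can be sketched as a small consistency check. The following Python example is only an illustration under stated assumptions: the `Concept` class, the layer names used as strings, and all concept labels are hypothetical, and the check looks at directly asserted parents only (it does not follow chains of subsumption).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    layer: str  # "functionality", "frame" or "use_case"
    parents: list = field(default_factory=list)  # labels of subsuming concepts


def layer_violations(concepts):
    """Return messages for concepts whose direct parents break the layer rules."""
    by_label = {c.label: c for c in concepts}
    problems = []
    for c in concepts:
        parent_layers = {by_label[p].layer for p in c.parents}
        # Every frame needs at least one functionality among its parents.
        if c.layer == "frame" and "functionality" not in parent_layers:
            problems.append(f"frame {c.label!r} names no required functionality")
        # Every use case needs at least one frame among its parents.
        if c.layer == "use_case" and "frame" not in parent_layers:
            problems.append(f"use case {c.label!r} instantiates no frame")
    return problems


# Hypothetical labels, chosen for illustration only.
concepts = [
    Concept("HypotheticalFunctionality", "functionality"),
    Concept("HypotheticalFrame", "frame", ["HypotheticalFunctionality"]),
    Concept("HypotheticalUseCase", "use_case", ["HypotheticalFrame"]),
    Concept("OrphanUseCase", "use_case"),
]
violations = layer_violations(concepts)
```

Here `violations` contains a single message about `OrphanUseCase`, since it is the only concept without the required parent in the layer above.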

The second taxonomy of concepts formalizes the CQLF metamodel defined in ISO 24623-1. Subsumption assertions link all functionalities to the CQLF metamodel. Both the CQLF metamodel taxonomy (defined in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part of the ontology.

The third taxonomy of concepts represents individual CQLs whose expressive power is described with respect to the ontology. It shall have a flat structure without subsumption assertions between different CQLs. The normative part of the ontology does not include any instances of CQLs; the structure of the taxonomy is defined in 5.7.

Individuals in the A-Box are positive conformance statements in the form of parameterized query expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL it is formulated in) and to a use case concept (representing the search need that the query expression satisfies). The normative part of the ontology does not include any individuals, i.e. its A-Box is empty.

A CQL can also make negative conformance statements to declare that it cannot satisfy specific use cases, frames or functionalities because of its design limitations. As general disjunction assertions for concepts, negative conformance statements are part of the T-Box. If neither a positive nor a negative conformance statement exists between a CQL and a given use case, it shall be considered undetermined whether or not the CQL can satisfy the corresponding search need. Positive and negative conformance statements are further defined in Clause 6.
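
The three possible outcomes for a pair of CQL and use case (positive, negative, undetermined) can be sketched as a simple lookup. The Python example below is a minimal illustration, not the formalization given in Clause 6: the CQL names, the use case label and the query string are placeholders, and only directly stated conformance statements are consulted (negative statements made against a frame or functionality are not propagated to use cases).

```python
from enum import Enum


class Status(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNDETERMINED = "undetermined"


# Hypothetical statements; all names and the query string are placeholders.
# Positive: (CQL, use case) -> parameterized query expression.
positive = {("CQL-A", "UseCase1"): "<parameterized query expression>"}
# Negative: set of (CQL, use case) pairs.
negative = {("CQL-B", "UseCase1")}


def conformance_status(cql, use_case):
    """Look up the directly stated status of one CQL for one use case."""
    is_positive = (cql, use_case) in positive
    is_negative = (cql, use_case) in negative
    if is_positive and is_negative:
        # Contradictory statements would make the ontology inconsistent.
        raise ValueError(f"contradictory statements for {cql} / {use_case}")
    if is_positive:
        return Status.POSITIVE
    if is_negative:
        return Status.NEGATIVE
    # No statement either way: the question stays open.
    return Status.UNDETERMINED
```

The point of the third value is that absence of a statement is not read as a negative answer; a CQL without any entry for a use case is simply not yet described.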

No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the ontology.
The overall structure of a CQLF ontology is illustrated in Figure 1.
Figure 1 — General structure of a CQLF ontology
5.3 CQLF metamodel
[5]

The theoretical concept of modules as standardized in the context of ISO 24623-1 is formalized by the

CQLF metamodel taxonomy. It consists of the concepts and subsumption assertions defined below. Each

concept is identified by its label (as rdfs:label annotation), followed by all its subsumption assertions

within the taxonomy.
Abstract root concepts:
— Metamodel: This is the abstract root concept of the CQLF metamodel taxonomy.
— Level ⊑ Metamodel: This is the abstract root concept of all CQLF levels.
— Module ⊑ Metamodel: This is the abstract root concept of all CQLF modules.
CQLF levels:
— Linear ⊑ Level: Plain-text search as well as search in segmented data.
— Complex ⊑ Level: Search in data annotated with hierarchical structures and/or
...

INTERNATIONAL ISO
STANDARD 24623-2
First edition
2021-12
Language resource management —
Corpus query lingua franca (CQLF) —
Part 2:
Ontology
Gestion des ressources linguistiques — Corpus query lingua franca
(CQLF) —
Partie 2: Ontologie
Reference number
ISO 24623-2:2021(E)
© ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on

the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below

or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF metamodel
5.4 Functionalities
5.5 Frames
5.6 Use cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
Annex A (informative) Illustrative example of a CQLF ontology
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO’s adherence to

the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see

www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology,

Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24623 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Several families of International Standards codify various aspects of the representation of language

data. These standards describe general corpus-oriented data models in the linguistic annotation

framework (LAF) (see ISO 24612), various aspects of the semantic representation in the semantic

annotation framework (SemAF) (see ISO 24617-1 and others), the representation of lexical data in the

lexical markup framework (LMF) (see ISO 24613-1 and others), as well as the representation of metadata

in the component metadata infrastructure (CMDI) (see ISO 24622-1 and others). Complementary to

the standards concerning the representation of language data, the ISO 24623 series focuses on the

exploitation of language data and on ways to satisfy various kinds of information needs targeting these

data.

The corpus query lingua franca (CQLF) metamodel, described in ISO 24623-1, is a maximally permissive

construct that establishes means of describing the scope of corpus query languages (CQLs) at a general

level and with a focus on various kinds of data models assumed by query systems, with conformance

conditions meant to be satisfied by a wide range of CQLs. The metamodel provides a “skeleton” for a

CQL taxonomy by setting up basic categories of corpus queries (encoded as levels and modules) as well

as the dependencies among them.

Consequently, the task of a more concrete characterization of CQLs is meant to be covered in other

parts of the ISO 24623 series. This document establishes a framework for an ontology which focuses on

the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer

taxonomy against which individual CQLs can make positive and negative conformance statements.

Such an ontology allows, on the one hand, a fine-grained comparison of the expressive power of CQLs,

and, on the other hand, it serves a practical purpose, i.e. as a foundation for platforms where developers

can enter conformance statements, and where end users can see which CQL to turn to in order to ensure

that their search needs get satisfied. An example of such a platform is given by Reference [13].

INTERNATIONAL STANDARD ISO 24623-2:2021(E)
Language resource management — Corpus query lingua
franca (CQLF) —
Part 2:
Ontology
1 Scope

This document specifies the structure of an ontology for a fine-grained description of the expressive

power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three

interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the

expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a

taxonomy of CQLs.
This document specifies:
a) the taxonomy of the CQLF metamodel;

b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”);

c) the structure of the layers of the expressive power taxonomy and the relationships between them,

in the form of subsumption assertions;

d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in

the form of positive and negative conformance statements.

This document does not define the entire contents of the ontology (see Clause 4).

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 24612, Language resource management — Linguistic annotation framework (LAF)

ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel

ISO/IEC 10646, Information technology — Universal coded character set (UCS)

W3C-OWL 2-SPEC. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax

(Second Edition). Motik B., Patel-Schneider, P.F., and Parsia, B. eds. W3C Recommendation, 11

December 2012. Available from: http://www.w3.org/TR/owl2-syntax/
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the

following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
CQLF module

subcomponent of the CQLF metamodel, defined with reference to a specified data-model characteristic

Note 1 to entry: The CQLF metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-text,

segmentation and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical,

dependency and containment).

Note 2 to entry: In 5.3, the containment module is formalized by the concept SpanContainment in order to avoid

terminological ambiguity.

[SOURCE: ISO 24623-1:2018, 3.8, modified — “the CQLF metamodel” has replaced “a CQLF level” in

order to improve clarity outside the context of ISO 24623-1; Note 2 to entry has been added.]

3.2
functionality

label for a concept in a CQLF ontology (3.15) that represents a family of CQL capabilities (3.12)

contributing to the expressive power of a CQL (3.5), formulated at a general level and linked to one or

more CQLF modules (3.1)
3.3
frame

label for a concept in a CQLF ontology (3.15) that represents a typical search need (3.6) of end users (3.7),

understood as one facet of the expressive power of CQLs (3.5)

Note 1 to entry: Most frames arise from the specialization of a functionality (3.2) and/or the combination of

multiple functionalities.
3.4
use case

label for a concept in a CQLF ontology (3.15) that represents a concrete instantiation of a frame (3.3), for

which it can be determined unambiguously whether a given query expression (3.8) satisfies the search

need (3.6) or not

Note 1 to entry: Use cases are often parameterized, i.e. they contain variable elements. Parameterized use cases

are satisfied by parameterized query expressions.
3.5
CQL
corpus query language

formal language designed to retrieve specific information from (large) language data collections, and

thereby incorporate certain abstractions over commonly shared data models that make it possible for

the end user (3.7) (or user agents) to address parts of those data models

Note 1 to entry: A CQL defines a syntactic notation for query expressions (3.8) and the corresponding search

semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are

implicitly defined by a particular implementation.

[SOURCE: ISO 24623-1:2018, 3.4, modified — “CQL” has been added as preferred term, “end user” has

replaced “user” in the definition and Note 1 to entry has been added.]
3.6
search need

information pattern that an end user (3.7) wants to locate in a corpus, based on the primary data stream

and/or simple or complex annotation
3.7
end user
agent who uses a CQL (3.5) to satisfy his or her search needs (3.6)

Note 1 to entry: This can be done via an interactive graphical user interface (GUI), a command-line tool,

programmatically via some application programming interface (API) or by a software program developed by the

end user.
3.8
query expression

string that is syntactically valid in a given CQL (3.5) and can be executed to return a result set

Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification

of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the

ontology are required to include informal descriptions of the range of admissible values and any transformations

required.
3.9
parameter

variable element in a query expression (3.8) or in the description of a search need (3.6)

3.10
positive conformance statement

assertion that a given CQL (3.5) supports a given use case (3.4) by means of a query expression (3.8)

3.11
negative conformance statement

assertion that a given CQL (3.5) cannot support a given use case (3.4), frame (3.3) or functionality (3.2)

Note 1 to entry: Negative conformance is due to technical unavailability of specific capabilities in the respective

CQL or limitations on the complexity of query expressions (3.8).
3.12
CQL capability
capability
corpus query language capability
facility provided by CQLs (3.5) to meet a specific aspect of search needs (3.6)
3.13
layer
totality of concepts at the same level of abstraction in a CQLF ontology (3.15)
EXAMPLE Functionalities (3.2), frames (3.3), use cases (3.4).
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — Note 1 to entry has been deleted.]
3.15
CQLF ontology

ontology for a fine-grained description of the expressive power of CQLs (3.5) in terms of search needs

(3.6), which adheres to the structure specified in this document
4 Motivation and aims

CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific

application scenarios, others are able to cover a wider variety of applications and search needs. It is

therefore both the quality and the quantity of CQL capabilities – as well as the degree to which they

can be combined freely – that determine the expressive power of a CQL. A CQLF ontology as specified

in this document is not intended to articulate all the possible combinations of capabilities unless these

are justified by genuine usage. Its aim is to provide representative categories for typical search needs

within a taxonomy of CQL capabilities. These typical search needs evolve with general progress in

the fields of corpus linguistics and digital humanities, and with the discovery of new challenges, new

methods and new research questions. In order to accommodate the dynamic nature of the evolving

search needs, most of the content of such an ontology is outside the scope of standardization. This

document provides a structural framework for this dynamic information (by specifying the three-layer

structure of the expressive power taxonomy, the content of the topmost layer of functionalities, and the

relationships between different layers and taxonomies), ensuring that the ontology can adapt to new

search needs that emerge as the relevant disciplines evolve.

In order to provide a normative skeleton for the ontology while at the same time making provisions for

keeping its main content (search needs and corresponding query expressions) dynamic, this document

does not comprise a normative listing of the middle and bottom layer of the expressive power taxonomy

(i.e. frames and use cases). An exhaustive inventory of concepts at these two layers is not possible

due to the fact that existing CQLs differ widely in the complexity of the supported combinations

of functionalities, that new CQLs can be created offering additional combinations or subtypes of

functionalities, and that new search needs emerge from progress in the relevant research fields. The

frames and use cases of a CQLF ontology are expected to be supplied by a moderated community

process, driven by CQL developers as well as end users (see Reference [13]). For illustration, a sample of

frames and use cases together with conformance statements linking them with the CQP [6] and ANNIS [8]

query languages is provided in Annex A.

The permissive architecture and terminology defined by this document enable research groups to

extend the relevant parts of the ontology with further CQL capabilities and search needs arising in

future.
The following application scenarios are thus made possible:

— describing the scope and capabilities of a given CQL, in terms of conformance statements against a

CQLF ontology (typically carried out by the CQL developers);

— comparing different CQLs with respect to their ability to meet typical search needs;

— identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required

by an end user, together with examples of the respective query syntax;

— guiding the development of new CQLs and query tools by building an inventory of complex search

needs that are important for the community (typically carried out by end users).
5 Structure and content of a CQLF ontology
5.1 OWL DL formalism
The taxonomic framework for a CQLF ontology is modelled in OWL 2 DL [7] – a dialect of the Web

Ontology Language (OWL) based on the family of description logics (DL) (see Reference [9]) as a

formal framework. All definitions and requirements of the W3C OWL 2 specification shall be followed.

The normative representation and exchange format for a CQLF ontology is RDF/XML [10][11]. All

labels and annotations shall be represented as sequences of Unicode code points, in accordance with

ISO/IEC 10646.
W3C OWL 2 DL furnishes developers with a set of tools for:
a) stating concept hierarchies and membership of individuals,
b) defining highly expressive property restrictions.

In particular, this document makes use of the AnnotationProperty construct of OWL DL in order to

associate additional information with concepts and individuals.

For better readability, CQLF ontology axioms are provided in DL notation in Clauses 5 and 6 rather than

in the RDF/XML exchange format.
Relevant DL notions [9]:

— concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept

expressions.

EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B. A is

also said to be subsumed by B.

NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature

structures (see Reference [12], p. 496).

— concept equivalence ≡: This operator asserts an equivalence between two concept expressions.

EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.

— intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions,

i.e. the individuals contained in both concept expressions.

NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C. It is equivalent to the assertions A ⊑ B and

A ⊑ C.

— union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the

individuals contained in either or both of the concept expressions.

NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals

covered by A can be contained in B and others in C.

— top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred

to as “Thing” or “the root class”.

— bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as “Nothing”

or “the empty class”.

— concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as

“class assertion” because the concepts represent classes (see T-Box below).
EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.

— A-Box: The domain of interest is spanned by a universe of individuals which serve as the fundamental

atoms for the ontology of what shall be modelled. They become members of concepts through

concept assertions (also referred to as “A-Box axioms”) and implicitly through the subsumption

relations expressed by concept inclusion assertions (in the T-Box).

— T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which

individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts

and a rule set of hierarchical relations between them (“is-a” relations expressed by concept

inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/

or individuals.
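
The set-theoretic reading of the operators above can be illustrated with a short Python sketch. It is not part of the OWL 2 DL formalism: finite sets stand in for concept extensions under one fixed interpretation, and the concept names (A, A2, B, C) and individuals (x1 to x4) are hypothetical.

```python
# Illustrative only: concepts are modelled as finite sets of individuals
# under a single fixed interpretation. All names are hypothetical.
universe = frozenset({"x1", "x2", "x3", "x4"})  # top concept
nothing = frozenset()                            # bottom concept

B = frozenset({"x1", "x2"})
C = frozenset({"x2", "x3"})


def is_subsumed_by(a, b):
    """A ⊑ B holds here if every individual of A is also in B."""
    return a <= b


def is_equivalent(a, b):
    """A ≡ B holds here if both concepts cover the same individuals."""
    return a == b


# NOTE 2: A ⊑ B ⊓ C is equivalent to A ⊑ B together with A ⊑ C.
A = frozenset({"x2"})
note2_left = is_subsumed_by(A, B & C)
note2_right = is_subsumed_by(A, B) and is_subsumed_by(A, C)

# NOTE 3: A2 ⊑ B ⊔ C does not imply A2 ⊑ B or A2 ⊑ C.
A2 = frozenset({"x1", "x3"})
note3_union = is_subsumed_by(A2, B | C)
note3_either = is_subsumed_by(A2, B) or is_subsumed_by(A2, C)

# Concept assertion x ∈ A.
x_in_A = "x2" in A
```

In this interpretation A2 is covered by the union of B and C although neither B nor C subsumes it on its own, which is the situation described in NOTE 3.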
5.2 Structure of the ontology

The T-Box of a CQLF ontology consists of three separate taxonomies of concepts. The main taxonomy

describes different facets of the expressive power of CQLs. It is called “expressive power taxonomy” and

is divided into three layers.

Concepts in the top layer are called “functionalities”. They represent (families of) individual search

capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for

navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are

defined in 5.4.

Concepts in the middle layer are called “frames”. They represent typical search needs of end users,

which often involve combinations of multiple functionalities, at a relatively abstract level. For every

frame, subsumption assertions shall indicate which functionalities are required for the search need. A

frame A can also be subsumed by another frame A′ if A extends the search need represented by A′. The

normative part of the ontology does not include any instances of frames; the structure of the frame

layer is defined in 5.5.

Concepts in the bottom layer are called “use cases”. They represent parameterized instantiations of

frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether

a given CQL can satisfy a given use case. For every use case, a subsumption assertion shall indicate

which frame is instantiated by the use case. There can also be subsumption assertions to multiple

frames as well as to other use cases. The normative part of the ontology does not include any instances

of use cases; the structure of the use case layer is defined in 5.6.
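
The layering requirements described above lend themselves to a mechanical check. The following Python sketch assumes a simple in-memory representation in which each concept maps to the set of its directly asserted parents. The concept names are hypothetical, and the sketch takes the strict reading that the required functionality or frame parent is asserted directly rather than inherited.

```python
# Hypothetical concept names; each mapping records asserted subsumptions
# (child -> set of direct parents) within the expressive power taxonomy.
functionalities = {"FunctionalityA", "FunctionalityB"}
frames = {
    "FrameX": {"FunctionalityA", "FunctionalityB"},
    "FrameY": {"FrameX", "FunctionalityA"},
}
use_cases = {
    "UseCase1": {"FrameX"},
    "UseCase2": {"FrameY", "UseCase1"},
}


def check_layers(functionalities, frames, use_cases):
    """Return a list of violations of the layering rules sketched here."""
    problems = []
    frame_names = set(frames)
    use_case_names = set(use_cases)
    for frame, parents in frames.items():
        # Assumption: each frame names at least one functionality directly.
        if not parents & functionalities:
            problems.append(f"{frame}: no functionality asserted")
        # Frames may otherwise only be subsumed by other frames.
        if parents - functionalities - frame_names:
            problems.append(f"{frame}: parent outside functionality/frame layers")
    for use_case, parents in use_cases.items():
        # Assumption: each use case names at least one frame directly.
        if not parents & frame_names:
            problems.append(f"{use_case}: no frame asserted")
        # Use cases may otherwise only be subsumed by other use cases.
        if parents - frame_names - use_case_names:
            problems.append(f"{use_case}: parent outside frame/use case layers")
    return problems
```

An empty result means that every frame is linked to a functionality and every use case to a frame under the assumptions stated above.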

The second taxonomy of concepts formalizes the CQLF metamodel defined in ISO 24623-1. Subsumption

assertions link all functionalities to the CQLF metamodel. Both the CQLF metamodel taxonomy (defined

in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part of the ontology.

The third taxonomy of concepts represents individual CQLs whose expressive power is described with

respect to the ontology. It shall have a flat structure without subsumption assertions between different

CQLs. The normative part of the ontology does not include any instances of CQLs; the structure of the

taxonomy is defined in 5.7.

Individuals in the A-Box are positive conformance statements in the form of parameterized query

expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL

it is formulated in) and to a use case concept (representing the search need that the query expression

satisfies). The normative part of the ontology does not include any individuals, i.e. its A-Box is empty.

A CQL can also make negative conformance statements to declare that it cannot satisfy specific use

cases, frames or functionalities because of its design limitations. As general disjunction assertions for

concepts, negative conformance statements are part of the T-Box. If neither a positive nor a negative

conformance statement exists between a CQL and a given use case, it shall be considered undetermined

whether or not the CQL can satisfy the corresponding search need. Positive and negative conformance

statements are further defined in Clause 6.
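
The three possible outcomes for a pair of CQL and use case (positive, negative, undetermined) can be sketched as a simple lookup. The Python fragment below is an informal illustration, not the OWL encoding defined in Clause 6: the class name, the CQL and use case names and the query string are hypothetical, and only statements made directly against a use case are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositiveStatement:
    """An A-Box individual: a parameterized query expression."""
    query_expression: str
    cql: str       # CQL concept the expression is formulated in
    use_case: str  # use case concept whose search need it satisfies


# Hypothetical data, for illustration only.
positive = [PositiveStatement("<query with parameter $1>", "CqlA", "UseCase1")]
# T-Box side: CqlB declares that it cannot satisfy UseCase1.
negative = {("CqlB", "UseCase1")}


def conformance(cql, use_case):
    """Return 'positive', 'negative' or 'undetermined' for the given pair."""
    if any(s.cql == cql and s.use_case == use_case for s in positive):
        return "positive"
    if (cql, use_case) in negative:
        return "negative"
    # Neither kind of statement exists: nothing can be concluded.
    return "undetermined"
```

The fall-through case mirrors the requirement that the absence of any statement is treated as undetermined rather than as a failure to conform.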

No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the

ontology.
The overall structure of a CQLF ontology is illustrated in Figure 1.
Figure 1 — General structure of a CQLF ontology
5.3 CQLF metamodel
The theoretical concept of modules [5] as standardized in the context of ISO 24623-1 is formalized by the

CQLF metamodel taxonomy. It consists of the concepts and subsumption assertions defined below. Each

concept is identified by its label (as rdfs:label annotation), followed by all its subsumption assertions

within the taxonomy.
Abstract root concepts:
— Metamodel: This is the abstract root concept of the CQLF metamodel taxonomy.
— Level ⊑ Metamodel: This is the abstract root concept of all CQLF levels.
— Module ⊑ Metamodel: This is the abstract root concept of all CQLF modules.
CQLF levels:
— Linear ⊑ Level: Plain-text search as well as search in segmented data.

— Complex ⊑ Level: Search in data annotated with hierarchical structures and/or dependency

information, or querying simple annotations by means of containment-based queries.

— Concurrent ⊑ Level: Search in multiple concurrent (overlapping, intersecting and often conflicting)

annotations built upon a single data stream.
CQLF modules:
— PlainText ⊑ Module ⊓ Linear: Segmentation-independent string search.

— SimpleAnnotation ⊑ Module ⊓ Linear: Segmentation-based search for annotations describing the

primary data stream; understood more generally as search for annotations of individual objects in

the context of this document.

— Segmentation ⊑ Module ⊓ Linear: Search for segmental annotation, in particular tokens and token

sequences.

— Hierarchical ⊑ Module ⊓ Complex: Tree-based representations, e.g. for phrase-structure

description.

— Dependency ⊑ Module ⊓ Complex: Identification of relationships in which objects function as

nodes linked by directed arcs.

— SpanContainment ⊑ Module ⊓ Complex: Non-recursive simplified hierarchical relationships

encoded as character span containment.

— Paradigmatic ⊑ Module ⊓ Concurrent: Different annotation layers providing data packages

describing the same location.

— Overlapping ⊑ Module ⊓ Concurrent: Concurrent annotations built upon character spans which

overlap in their start and/or end offsets.
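
The subsumption assertions listed above can be explored programmatically. The Python sketch below records them as a mapping from each concept to its direct parents and derives all subsuming concepts by transitive closure. The dictionary encoding and the helper names (`PARENTS`, `ancestors`, `modules_of`) are assumptions made for illustration; the normative exchange format remains RDF/XML.

```python
# Subsumption assertions of the CQLF metamodel taxonomy as listed in 5.3
# (child -> direct parents). The encoding itself is an assumption.
PARENTS = {
    "Metamodel": set(),
    "Level": {"Metamodel"},
    "Module": {"Metamodel"},
    "Linear": {"Level"},
    "Complex": {"Level"},
    "Concurrent": {"Level"},
    "PlainText": {"Module", "Linear"},
    "SimpleAnnotation": {"Module", "Linear"},
    "Segmentation": {"Module", "Linear"},
    "Hierarchical": {"Module", "Complex"},
    "Dependency": {"Module", "Complex"},
    "SpanContainment": {"Module", "Complex"},
    "Paradigmatic": {"Module", "Concurrent"},
    "Overlapping": {"Module", "Concurrent"},
}


def ancestors(concept):
    """Return every concept that subsumes `concept` (transitive closure)."""
    seen = set()
    stack = list(PARENTS[concept])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def modules_of(level):
    """Return the concepts subsumed by both Module and the given level."""
    return {c for c in PARENTS if {"Module", level} <= ancestors(c)}
```

For instance, `modules_of("Linear")` collects the three modules asserted under Module ⊓ Linear, and `ancestors("SpanContainment")` includes Level and Metamodel through the chain Complex ⊑ Level ⊑ Metamodel.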
As the coarsest categories of corpus
...

SLOVENIAN STANDARD
oSIST ISO/DIS 24623-2:2021
2021-03-01

Language resource management -- Corpus Query Lingua Franca (CQLF) - Part 2:
Ontology
This Slovenian standard is identical to: ISO/DIS 24623-2
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
oSIST ISO/DIS 24623-2:2021 en

2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

DRAFT INTERNATIONAL STANDARD
ISO/DIS 24623-2
ISO/TC 37/SC 4 Secretariat: KATS
Voting begins on: 2021-01-07
Voting terminates on: 2021-04-01
Language resource management — Corpus Query Lingua
Franca (CQLF) —
Part 2:
Ontology
ICS: 01.020
This document is circulated as received from the committee secretariat.
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Reference number
ISO/DIS 24623-2:2021(E)
© ISO 2021
---------------------- Page: 3 ----------------------
oSIST ISO/DIS 24623-2:2021
ISO/DIS 24623-2:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Motivation and aims
5 CQLF Ontology
5.1 OWL DL formalism
5.2 Structure of the ontology
5.3 CQLF Metamodel
5.4 Functionalities
5.5 Frames
5.6 Use Cases
5.7 CQLs
6 Conformance statements
6.1 Positive conformance statements
6.2 Negative conformance statements
7 RDF/XML serialization
Annex A (informative) Illustrative examples of non-normative elements in the CQLF Ontology
Annex B (informative) CQLF Ontology: Moderated community process
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

A list of all parts in the ISO 24623 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management, has developed several families of standards codifying various aspects of the representation of language data. These standards describe general corpus-oriented data models in the Linguistic Annotation Framework (LAF, ISO 24612) family, various aspects of semantic representation in the family of the Semantic Annotation Framework (SemAF, ISO 24617-1 and others), the representation of lexical data in the Lexical Markup Framework family (LMF, ISO 24613-1 and others), as well as the representation of metadata in the Component Metadata Infrastructure (CMDI, ISO 24622-1 and others).

Complementary to the standards concerning the representation of language data, the Corpus Query Lingua Franca (henceforth CQLF) family of standards (ISO 24623) focuses on the exploitation of language data and ways to satisfy various kinds of information needs targeting these data.

The CQLF Metamodel, described by ISO 24623-1 (CQLF-1), is a maximally permissive construct that establishes means of describing the scope of corpus query languages (CQLs) at a general level and with a focus on various kinds of data models assumed by query systems, with conformance conditions meant to be satisfied by a wide range of CQLs. The Metamodel provides a “skeleton” for a CQL taxonomy by setting up basic categories of corpus queries (encoded as CQLF-1 levels and modules) as well as the dependencies among them.

Consequently, the task of a more concrete characterization of CQLs falls to other members of the CQLF standard family. This document (ISO 24623-2, “CQLF-2” for short) establishes an ontology which focuses on the generalized information needs satisfied by corpus queries, and which is structured as a multi-layer taxonomy against which individual CQLs can make positive and negative conformance statements.

Establishing this ontology allows, on the one hand, a fine-grained comparison of the expressive power of CQLs and, on the other hand, serves a practical purpose as the foundation for a platform where developers can enter conformance statements and where end users can see which CQL to turn to in order to satisfy their search needs.
1 Scope

This document defines an ontology for a fine-grained description of the expressive power of CQLs in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: a) the CQLF Metamodel (a formalization of CQLF-1), b) the Expressive Power taxonomy, which describes different facets of the expressive power of CQLs, and c) a taxonomy of CQLs.

The normative parts of this document comprise a) the taxonomy of the CQLF Metamodel, b) the Functionality layer of the Expressive Power taxonomy, c) the structure of the layers of the Expressive Power taxonomy and the relationships between them, in the form of subsumption assertions, as well as d) the formalization of the linkage between the CQL taxonomy and the Expressive Power taxonomy, in the form of positive and negative conformance statements.

This document does not provide a normative listing of the middle and bottom layers of the Expressive Power taxonomy (called Frames and Use Cases, respectively). An exhaustive inventory of the concepts at these two layers is not possible because existing CQLs differ widely in the complexity of the supported combinations of Functionalities and because new CQLs can be created offering additional combinations or subtypes of Functionalities. Frames and Use Cases are expected to be filled in through a moderated community process, driven by CQL developers as well as end users. An informative annex to this document contains a sample of Frames and Use Cases together with conformance statements linking them with the CQP [5] and ANNIS [7] query languages.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)

ISO 24612, Language resource management — Linguistic annotation framework (LAF)

ISO 24623-1, Language resource management — Corpus query lingua franca (CQLF) — Part 1: Metamodel

Motik, B., Patel-Schneider, P. F., and Parsia, B. (2012). OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax (Second Edition). W3C Recommendation, 11 December 2012. (Latest version available at http://www.w3.org/TR/owl2-syntax/.)
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 24612, ISO 24623-1 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp

— IEC Electropedia: available at http://www.electropedia.org/
3.1
CQLF module

subcomponent of the CQLF Metamodel, defined with reference to a specified data model characteristic

Note 1 to entry: The CQLF Metamodel currently distinguishes three modules within CQLF Level 1, Linear (plain-text, segmentation, and simple annotation), and three modules within CQLF Level 2, Complex (hierarchical, dependency, and containment).

[SOURCE: ISO 24623-1:2018, 3.8, modified – “a CQLF level” was replaced with “the CQLF Metamodel” in order to improve clarity outside the context of ISO 24623-1.]
3.2
Functionality

label for a concept in the CQLF Ontology that represents a family of capabilities contributing to the expressive power of a CQL, formulated at a general level and linked to one or more CQLF modules

3.3
Frame

label for a concept in the CQLF Ontology that represents a typical search need of end users, understood as one facet of the expressive power of CQLs

Note 1 to entry: Most Frames arise from the specialization of a Functionality and/or the combination of multiple Functionalities.
3.4
Use Case

label for a concept in the CQLF Ontology that represents a concrete instantiation of a Frame, for which it can be determined unambiguously whether a given query expression satisfies the search need or not

Note 1 to entry: Use Cases are often parameterized, i.e. they contain variable elements. Parameterized Use Cases are satisfied by parameterized query expressions.
3.5
CQL
corpus query language

formal language designed to retrieve specific information from (large) language data collections, and thereby incorporate certain abstractions over commonly shared data models that make it possible for the end user (or user agents) to address parts of those data models

Note 1 to entry: A CQL defines a syntactic notation for query expressions and the corresponding search semantics, i.e. an intensional specification of the intended result set. For most current CQLs, semantics are implicitly defined by a particular implementation.

[SOURCE: ISO 24623-1:2018, 3.4, modified – “user” was replaced with “end user” in the definition, the abbreviation CQL was added as preferred term, and Note 1 to entry was added.]
3.6
search need

information pattern that an end user wants to locate in a corpus, based on the primary data stream and/or simple or complex annotation
3.7
end user
agent who uses a CQL to satisfy his or her search needs

Note 1 to entry: This can be done via an interactive GUI, a command-line tool, programmatically via some API, or by a software program developed by the end user.
3.8
query expression

string that is syntactically valid in a given CQL and can be executed to return a result set

Note 1 to entry: Query expressions are often parameterized with variable elements. No formal specification of the parameter substitution procedure is attempted, but entries for parameterized query expressions in the ontology are required to include informal descriptions of the range of admissible values and any transformations required.
3.9
parameter
variable element in a query expression or in the description of a search need
3.10
positive conformance statement

assertion that a given CQL supports a given Use Case by means of a query expression

3.11
negative conformance statement

assertion that a given CQL cannot support a given Use Case, Frame or Functionality

Note 1 to entry: Negative conformance is due to technical unavailability of specific capabilities in the respective CQL or limitations on the complexity of query expressions.
3.12
CQL capability
corpus query language capability
facility provided by CQLs to meet a specific aspect of search needs
3.13
layer
totality of concepts at the same level of abstraction in the CQLF Ontology
EXAMPLE Functionalities, Frames, Use Cases
3.14
token
non-empty contiguous sequence of graphemes or phonemes in a document
[SOURCE: ISO 24611:2012, 3.21, modified — The note was deleted.]
4 Motivation and aims

CQLs differ widely in their basic sets of capabilities. Whereas some are restricted to rather specific application scenarios, others are able to cover a wider variety of applications and search needs. It is therefore both the quality and the quantity of CQL capabilities – as well as the degree of their combination – that determine the expressive power of a CQL. The CQLF Ontology is not intended to articulate all the possible combinations of capabilities unless these are justified by genuine usage. Its aim is to provide representative categories for typical search needs within a taxonomy of CQL capabilities.

Yet another important aspect is the degree of explication that a CQL delivers. Some CQLs are able to express a particular search need in a condensed and highly specialized manner, while others rely on complex combinations of a few elementary capabilities. The CQLF Ontology leverages information from CQLs with a more explicit formalization of capabilities in order to create a systematic taxonomy of search needs and thus be able to classify CQLs of rather implicit formalization. Their respective degree of explication is visible in the parameterized query expressions included in all positive conformance statements.

This document defines the structure of an ontology representing CQL capabilities and search needs in a taxonomy consisting of three layers of varying degrees of abstraction. It also provides formal means for stating the conformance of individual CQLs to this central taxonomy.

End users navigate the taxonomy starting from a compact layer of CQLF capabilities. Selecting a subset of relevant capabilities allows them to locate relevant search needs in the middle layer of the taxonomy efficiently, and then, in the bottom layer, choose a concrete instantiation of the search need that is closest to their requirements. This instantiation links to all CQLs that satisfy the selected search need and provides a parameterized query expression for each CQL.

The definition of the structure of the CQLF Ontology as described in this document is expected to be instantiated and expanded in a dynamic community-based project (see Annex B). The permissive architecture and terminology defined by CQLF-2 enable research groups to extend the relevant parts of the ontology with further CQL capabilities and search needs.

CQLF-2 is primarily intended for the following application scenarios:

• describing the scope and capabilities of a given CQL, in terms of conformance statements against the CQLF Ontology (typically carried out by the CQL developers);

• comparing different CQLs with respect to their ability to meet typical search needs;

• identifying suitable CQLs and query tools that support (combinations of) CQL capabilities required by an end user, together with examples of the respective query syntax; and

• guiding the development of new CQLs and query tools by building an inventory of complex search needs that are important for the community (typically carried out by end users).
5 CQLF Ontology
5.1 OWL DL formalism

The taxonomic framework of the CQLF Ontology is modelled in OWL 2 DL [6], a dialect of the Web Ontology Language (OWL) based on the family of description logics (hence DL, see [8]). All definitions and requirements of the OWL 2 Specification shall be followed. The normative representation and exchange format for the CQLF Ontology is RDF/XML ([9], [10]). All labels and annotations shall be represented as sequences of Unicode code points, following ISO/IEC 10646.

OWL 2 DL furnishes developers with a set of tools for a) stating concept hierarchies and membership of individuals and b) defining highly expressive property restrictions. In particular, the CQLF Ontology makes use of the AnnotationProperty construct of OWL DL in order to associate additional information with concepts and individuals.

For better readability, CQLF Ontology axioms are provided in DL notation in Clauses 5 and 6; a link to the complete RDF/XML serialization of the normative part of the ontology can be found in Clause 7.

Before turning to the DL specification of the CQLF Ontology, a few relevant DL notions will be introduced [8]:

• concept inclusion ⊑: This operator asserts a logical subsumption relationship between two concept expressions.

EXAMPLE 1 A ⊑ B asserts that A covers either a subset or the entire set of individuals contained in B; A is also said to be subsumed by B.

NOTE 1 The same notation is sometimes used to express the opposite relation (A subsumes B) for feature structures [11, p. 496].

• concept equivalence ≡: This operator asserts an equivalence between two concept expressions.

EXAMPLE 2 A ≡ B asserts that A covers exactly the same set of individuals as B.

• intersection/conjunction ⊓: This operator denotes the intersection of two concept expressions, i.e. the individuals contained in both concept expressions.

NOTE 2 A ⊑ B ⊓ C asserts that A is subsumed by B as well as C; it is equivalent to the assertions A ⊑ B and A ⊑ C.

• union/disjunction ⊔: This operator denotes the union of two concept expressions, i.e. the individuals contained in either or both of the concept expressions.

NOTE 3 A ⊑ B ⊔ C does not imply that A is subsumed by either B or C on its own. Some of the individuals covered by A might be contained in B and others in C.

• top concept ⊤: denotes the set of all individuals in the domain, i.e. the entire universe. Also referred to as Thing or the root class.

• bottom concept ⊥: denotes the empty set of individuals in the domain. Also referred to as Nothing or the empty class.

• concept assertion ∈: This operator asserts that an individual belongs to a concept. Also known as class assertion because concepts represent classes (see T-Box below).

EXAMPLE 3 x ∈ A asserts that the individual x is a member of the concept A.

• A-Box: The domain of interest is spanned by a universe of individuals which serve as the fundamental atoms for the ontology of what shall be modelled. They become members of concepts through concept assertions (also referred to as A-Box axioms) and implicitly through the subsumption relations expressed by concept inclusion assertions (in the T-Box).

• T-Box: Concepts are represented within the terminological box (T-Box). They are classes into which individuals are organized by the A-Box axioms. The T-Box thus provides a vocabulary of concepts and a rule set of hierarchical relations between them (“is-a” relations expressed by concept inclusion axioms). Ideally, sibling categories cover a mutually exclusive space of sub-categories and/or individuals.
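
As an informal illustration of the notions above, concepts can be read extensionally as finite sets of individuals. The Python sketch below assumes this simplified set reading (it is not a DL reasoner) and uses made-up individuals x1 to x4 to check the statements of NOTE 2, NOTE 3 and EXAMPLE 3.

```python
# Concepts are modelled extensionally as finite sets of individuals.
# This only illustrates the set semantics of the operators; it is not a
# description logic reasoner. All names below are invented for the example.

def subsumed_by(a: frozenset, b: frozenset) -> bool:
    """A ⊑ B: every individual of A is also an individual of B."""
    return a <= b


B = frozenset({"x1", "x2", "x3"})
C = frozenset({"x2", "x3", "x4"})
A = frozenset({"x2", "x3"})
D = frozenset({"x1", "x4"})

# NOTE 2: A ⊑ B ⊓ C holds exactly when A ⊑ B and A ⊑ C both hold.
note2_left = subsumed_by(A, B & C)
note2_right = subsumed_by(A, B) and subsumed_by(A, C)

# NOTE 3: D ⊑ B ⊔ C holds although D is subsumed by neither B nor C alone.
note3_union = subsumed_by(D, B | C)
note3_single = subsumed_by(D, B) or subsumed_by(D, C)

# EXAMPLE 3: the concept assertion x ∈ A corresponds to set membership.
is_member = "x2" in A
```

In an actual OWL 2 DL ontology these relations are stated as axioms and checked by a reasoner under the open-world assumption, so the closed finite sets used here are only a reading aid.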
5.2 Structure of the ontology

The T-Box of the CQLF Ontology consists of three separate taxonomies of concepts. The main taxonomy describes different facets of the expressive power of CQLs. It is called the Expressive Power taxonomy and is divided into three layers.

Concepts in the top layer are called Functionalities. They represent (families of) individual search capabilities that can be provided by CQLs at a general level. Functionalities serve as entry points for navigating the main taxonomy. Functionalities belong to the normative part of the ontology and are defined in 5.4.

Concepts in the middle layer are called Frames. They represent typical search needs of end users, which often involve combinations of multiple Functionalities, at a relatively abstract level. For every Frame, subsumption assertions shall indicate which Functionalities are required for the search need. A Frame A can also be subsumed by another Frame A' if A extends the search need represented by A'. The normative part of the ontology does not include any instances of Frames; the structure of the Frame layer is defined in 5.5.

Concepts in the bottom layer are called Use Cases. They represent parameterized instantiations of Frames, which should be sufficiently concrete so that it is possible to determine unambiguously whether a given CQL can satisfy a given Use Case. For every Use Case, a subsumption assertion shall indicate which Frame is instantiated by the Use Case. There can also be subsumption assertions to multiple Frames as well as to other Use Cases. The normative part of the ontology does not include any instances of Use Cases; the structure of the Use Case layer is defined in 5.6.
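
The layering described above can be sketched as a table of direct subsumption assertions that is then traversed transitively. The Python fragment below is a minimal illustration under stated assumptions: all concept names (FunctionalityX, FrameA, UseCase1 and so on) are placeholders invented for this example, since the normative part of the ontology contains no Frames or Use Cases, and the helper names ancestors and use_cases_for are likewise hypothetical.

```python
from __future__ import annotations

# Direct subsumption assertions "child ⊑ parent" across the three layers.
# All concept names are placeholders invented for this sketch.
SUBSUMED_BY: dict[str, set[str]] = {
    "FrameA": {"FunctionalityX", "FunctionalityY"},  # Frame requiring two Functionalities
    "FrameB": {"FrameA", "FunctionalityZ"},          # Frame extending another Frame
    "UseCase1": {"FrameA"},                          # Use Case instantiating one Frame
    "UseCase2": {"FrameB", "UseCase1"},              # several parents are allowed
}


def ancestors(concept: str) -> set[str]:
    """Return every concept that subsumes `concept`, directly or transitively."""
    found: set[str] = set()
    stack = list(SUBSUMED_BY.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(SUBSUMED_BY.get(parent, ()))
    return found


def use_cases_for(functionalities: set[str]) -> list[str]:
    """Return the Use Cases whose ancestors include all selected Functionalities."""
    return sorted(
        c for c in SUBSUMED_BY
        if c.startswith("UseCase") and functionalities <= ancestors(c)
    )
```

Selecting Functionalities and then narrowing down to Use Cases in this way mirrors the top-down navigation of the Expressive Power taxonomy; a real implementation would more plausibly query the RDF/XML serialization with an OWL reasoner.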

The second taxonomy of concepts formalizes the CQLF Metamodel defined by CQLF-1 (ISO 24623-1). Subsumption assertions link all Functionalities to the CQLF Metamodel. Both the CQLF Metamodel taxonomy (defined in 5.3) and the subsumption assertions (defined in 5.4) belong to the normative part of the ontology.

The third taxonomy of concepts represents individual CQLs whose expressive power is described with respect to the CQLF Ontology. It shall have a flat structure without subsumption assertions between different CQLs. The normative part of the ontology does not include any instances of CQLs; the structure of the taxonomy is defined in 5.7.

Individuals in the A-Box are positive conformance statements in the form of parameterized query expressions. Concept assertions shall assign each individual to a CQL concept (representing the CQL it is formulated in) and to a Use Case concept (representing the search need it satisfies). The normative part of the ontology does not include any individuals, i.e. its A-Box is empty. A CQL can also make negative conformance statements to declare that it cannot satisfy specific Use Cases, Frames or Functionalities because of its design limitations. As general disjunction assertions for concepts, negative conformance statements are part of the T-Box. If neither a positive nor a negative conformance statement exists between a CQL and a given Use Case, it shall be considered undetermined whether or not the CQL can satisfy the corresponding search need. Positive and negative conformance statements are further defined in Clause 6.
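
The three possible outcomes for a pair of CQL and Use Case (positive, negative, undetermined) can be illustrated with a small lookup. The Python sketch below is an assumption-laden simplification: the CQL names, the Use Case name and the query string are placeholders, and it only looks up statements made directly against a Use Case, ignoring negative statements made at the Frame or Functionality level.

```python
from __future__ import annotations

from enum import Enum


class Conformance(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNDETERMINED = "undetermined"


# Positive statements: (CQL, Use Case) -> parameterized query expression.
# Negative statements: set of (CQL, Use Case) pairs.
# CQL names, Use Case names and the query string are placeholders.
POSITIVE: dict[tuple[str, str], str] = {
    ("CQL_A", "UseCase1"): "<query expression with parameter $1>",
}
NEGATIVE: set[tuple[str, str]] = {("CQL_B", "UseCase1")}


def conformance(cql: str, use_case: str) -> Conformance:
    """Classify a (CQL, Use Case) pair; absence of any statement is undetermined."""
    if (cql, use_case) in POSITIVE:
        return Conformance.POSITIVE
    if (cql, use_case) in NEGATIVE:
        return Conformance.NEGATIVE
    return Conformance.UNDETERMINED
```

The explicit UNDETERMINED value reflects the rule that a missing statement is not to be read as a negative one.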

No concept or subsumption assertion shall be made that would lead to logical inconsistencies in the ontology.
The overall structure of the CQLF Ontology is illust
...
