Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model

The scope of this part of ISO 24622 is to describe a model that enables the flexible construction of
interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this
model can be used to describe resources at different levels of granularity (e.g. descriptions both on the
collection level and on the level of individual resources).

Gestion des ressources langagières -- Composante infrastructure de métadonnées (CMDI) -- Partie 1: Composant modèle de métadonnées

Upravljanje z jezikovnimi viri - Infrastruktura komponentnih metapodatkov (CMDI) - 1. del: Model komponentnih metapodatkov

General Information

Status: Published
Publication Date: 23-Aug-2018
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 30-Jul-2018
Due Date: 04-Oct-2018
Completion Date: 24-Aug-2018

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24622-1:2018
01-Sep-2018

Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model

This Slovenian standard is identical to: ISO 24622-1:2015

ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing

SIST ISO 24622-1:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of all or part of this standard is not permitted.

INTERNATIONAL STANDARD ISO 24622-1
First edition, 2015-02-01

Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model
Gestion des ressources langagières -- Composante infrastructure de métadonnées (CMDI) -- Partie 1: Composant modèle de métadonnées

Reference number: ISO 24622-1:2015(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2015

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Terms and definitions
3 Metadata schema availability and reuse
  3.1 Overview
  3.2 Metadata components and elements
4 Semantics in the component metadata model
  4.1 Overview
  4.2 Concept registries
  4.3 Relation registries
5 Metadata component and profile - compatibility and versioning
6 Expressiveness of the component metadata model
Annex A (informative) Abbreviations
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers to Trade (TBT), see the following URL: Foreword - Supplementary information.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

ISO 24622 consists of the following part, under the general title Language resource management -- Component metadata infrastructure (CMDI):
- Part 1: The component metadata model
A future part will address the component metadata specific language.
Introduction

Component Metadata (CMD) is an approach to metadata modelling and metadata creation. It is being increasingly used these days to enable the metadata description of different types of Language Resources (LRs) with different metadata schemas, while still trying to maintain syntactic and semantic interoperability.

CMD is also the core of the Component Metadata Infrastructure (CMDI) [1]: this infrastructure contains not only the format specifications for this metadata modelling and creation approach, but also a set of registries and tools for metadata modelling and creation work.

The advantages of having such a unified approach to metadata descriptions for LRs, an approach that will be usable by many projects and initiatives, are obvious: firstly, there is a better chance of obtaining interoperability between metadata descriptions from different sources, and secondly, it will be possible to develop and share tools that work much more efficiently in this metadata framework.

The challenge of designing and organizing a comprehensive and unified approach to metadata description for the very varied set of LR types, and one that also can satisfy a sufficiently large section of the LR community, should not be underestimated. The landscape of metadata for LRs has been, and continues to be, fragmented. Until recently, it was the practice in creating the metadata descriptions for LRs to choose a specific metadata schema from a (small) existing set derived either from widespread traditions or from other disciplines; for example, OLAC [2] is an adapted version of DCMI [3], which in turn originates in the library world. Additionally, there are, for the purposes of LR metadata description, specifically developed metadata schemas that can be limited in application to specific types of LR (e.g. IMDI [4]), or they can be of a proprietary nature (cf. the catalogues of the LR agencies such as LDC (footnote 2) and ELRA (footnote 3)). The result is a domain of LR metadata that is far from interoperable. Although some progress has been made in developing dedicated bridges for “translating” metadata from one specific schema to another and in providing a consolidated catalogue, this practice does not scale well since it depends on specific translations for each pair of different metadata schemas.
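
As an informal illustration of why pairwise bridges scale poorly, the following Python sketch counts the directed translations needed between n schemas. The comparison with one mapping per schema onto a shared set of concepts is an assumption made for contrast; the function names are hypothetical and not part of this International Standard.

```python
def pairwise_bridges(n_schemas: int) -> int:
    """Directed converters needed if every schema is translated directly into every other one."""
    return n_schemas * (n_schemas - 1)


def shared_concept_mappings(n_schemas: int) -> int:
    """Mappings needed if each schema is linked once to a shared set of concepts (assumed alternative)."""
    return n_schemas


# Print the two counts side by side for a few schema-set sizes.
for n in (3, 10, 30):
    print(n, pairwise_bridges(n), shared_concept_mappings(n))
```

The pairwise count grows quadratically with the number of schemas, whereas the assumed alternative grows linearly.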

For some recent projects, founding principles have included the unification and consolidation of practices and the need to produce efficient and sufficiently specific metadata descriptions.

It follows that a number of international, European, and national projects and infrastructure initiatives such as CLARIN [5] and META-SHARE [6] now share the CMD approach to metadata for LRs. This International Standard will both standardize the fundamentals of this approach in order to achieve interoperability based on solid documentation, and foster cooperation between the various initiatives and projects that work on, and with, this International Standard.

The model description is the first part of an infrastructure that forms a complete package for the creation of metadata schemas. As stated in the Foreword, the complete infrastructure standard contains, in addition to this component metadata model specification (ISO 24622-1), one or more metadata component specification languages (planned), and a number of recommended metadata components and profiles (planned). Since this part of ISO 24622 specifies an abstract model, we will rely mainly on UML [7] to describe it.

Figure 1 — Describing resources with metadata
1) Abbreviations are explained in Annex A.
2) Linguistic Data Consortium, http://www.ldc.upenn.edu/
3) European Language Resources Association, http://www.elra.info/

This part of ISO 24622 addresses the basic need to provide a model that makes it easy for metadata modellers (e.g. researchers and resource description experts) to create new metadata schemas, which can in turn be used either to describe new types of resources or to enable a more appropriate description for resources in specific circumstances. The metadata schema is instantiated into metadata records [i.e. the metadata descriptions that describe the actual resource(s)] (see Figure 1).

The context of this desire for flexible metadata modelling is that for scientific work there are usually various requirements for the proper description of LRs, and these requirements can derive from the specific needs of a project or from the facility or repository that will be used to store the resource for future use. This variation requires a flexible framework that enables the easy creation of new metadata schemas for different purposes, but is also a framework (i) in which the instantiations have a strictly defined format so that at least syntactic correctness can be checked, and (ii) which provides explicit semantics for the metadata schema elements for interpretation of the metadata record content.
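
The two requirements above can be pictured with a minimal Python sketch: a schema declares its elements, a record is checked against that declaration, and each element carries a link to a concept definition. All class names, element names, and the example.org URIs are hypothetical placeholders, not structures defined by this part of ISO 24622.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSpec:
    """One metadata element declared by a schema (hypothetical structure)."""
    name: str
    concept_link: str  # placeholder reference to a concept definition


@dataclass(frozen=True)
class Schema:
    name: str
    elements: tuple[ElementSpec, ...]

    def syntax_errors(self, record: dict[str, object]) -> list[str]:
        """Requirement (i): report where a record departs from the declared format."""
        declared = {e.name for e in self.elements}
        errors = [f"undeclared element: {key}" for key in record if key not in declared]
        errors += [
            f"non-text value for: {key}"
            for key, value in record.items()
            if key in declared and not isinstance(value, str)
        ]
        return errors

    def semantics(self, record: dict[str, object]) -> dict[str, str]:
        """Requirement (ii): map each declared element used in the record to its concept link."""
        links = {e.name: e.concept_link for e in self.elements}
        return {key: links[key] for key in record if key in links}


corpus_schema = Schema(
    "ExampleCorpusProfile",
    (
        ElementSpec("title", "http://example.org/concepts/title"),
        ElementSpec("language", "http://example.org/concepts/language"),
    ),
)
record = {"title": "Spoken Dutch sample", "language": "nld", "colour": "red"}
print(corpus_schema.syntax_errors(record))
print(corpus_schema.semantics(record))
```

Here the record uses an element the schema never declared, so the syntactic check reports it, while the two declared elements can still be interpreted through their concept links.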

The metadata descriptions generated by schemas compliant with this model will also be compliant with other TC 37 International Standards, for example, those requiring that references to the described resources and resource parts use ISO 24619:2011 PISA-compatible persistent identifiers (PIDs) [9].

The definition of a resource in this context is very broad. This part of ISO 24622 takes a pragmatic view: for example, an image can be a resource in itself when it is associated with a PID and can be referenced as such, or it can be part of a document where it lacks an identity of its own. In addition, a reference can point to a part of this image. An individual resource can stand alone in one environment and be treated as part of a collection in another environment. Also, metadata descriptions describe resources, but they, too, are a resource in different contexts. This part of ISO 24622 needs to support all such cases, and the model needs to provide descriptions at all levels of granularity.

This part of ISO 24622 takes two types of collections into account:

a) A complex resource may have been created as a collection originally and, versioning aside, it will exist as such in a rather static published form. Its specification will be treated as an independent entity by the responsible archiving institution that also provides a PID for such a collection. In the context of this part of ISO 24622, the metadata for the collection is the collection specification. The archiving institution is responsible for maintaining the metadata representing the collection.

b) In contrast, a different type of collection is one that was not planned and designed as a collection by its creators or by the holding archive, but achieves its status as a federated resource based on research that needs to be verifiable. Such collections, although purposefully constructed by the researcher, may not have any significance outside the context of the research for which they were created. Referring from the research documents to the collection may also become tedious if the collection contains hundreds of individual resources. It follows that there is a need to capture these types of collection with a metadata record that is associated with all its constituent resources and appropriate metadata, but only as the incarnation of this collection. There is no natural responsible party to maintain this metadata record. It is unlikely that the researcher who created the “virtual” collection (VC) has any way of consistently maintaining and curating this metadata record in the long term. There may be special registries maintained by digital archives or publishers where researchers can register such virtual collections.

Both types of collection are identified with the PID that refers to the collection metadata.
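
As a rough sketch of the two collection types, the Python fragment below represents a collection by its metadata record: the record has its own PID and lists the PIDs of its constituent resources. The record structure, the "prefix/suffix" PID shape, and every identifier shown are invented for illustration and are not prescribed by this part of ISO 24622.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionRecord:
    """Metadata record standing for a collection (hypothetical structure)."""
    pid: str  # PID of the collection metadata record itself
    title: str
    member_pids: list[str] = field(default_factory=list)
    virtual: bool = False  # True for a researcher-assembled virtual collection

    def repositories(self) -> set[str]:
        """Prefixes of the member PIDs, assuming PIDs shaped like 'prefix/suffix'."""
        return {pid.split("/", 1)[0] for pid in self.member_pids}


# Type a): curated and maintained by one archiving institution.
published = CollectionRecord(
    pid="archive-a/coll-001",
    title="Example published corpus",
    member_pids=["archive-a/res-001", "archive-a/res-002"],
)

# Type b): a virtual collection whose members sit in different repositories.
study_set = CollectionRecord(
    pid="vc-registry/vc-042",
    title="Resources cited in an example study",
    member_pids=["archive-a/res-001", "archive-b/res-917"],
    virtual=True,
)

print(sorted(published.repositories()))
print(sorted(study_set.repositories()))
```

In both cases a single PID, that of the collection record, is enough to cite the whole set of resources from a research document.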

1 Scope

The scope of this part of ISO 24622 is to describe a model that enables the flexible construction of interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this model can be used to describe resources at different levels of granularity (e.g. descriptions both on the collection level and on the level of individual resources).

2 Terms and definitions
2.1
archive
digital archive
repository (2.26) dedicated to the long-term preservation of the associated data
Note 1 to entry: The data in digital archives are also often available on-line. This highlights the need for reliable PIDs (2.22).
2.2
cardinality
metadata component cardinality
metadata element cardinality
specification of the number of occurrences of a metadata component (2.14) or metadata element (2.12) in an instantiation
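
One possible reading of cardinality, sketched in Python: a minimum and a maximum number of occurrences, with None standing for "unbounded". The class and its defaults are assumptions for illustration; this part of ISO 24622 does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """Allowed number of occurrences of a component or element (hypothetical structure)."""
    minimum: int = 1
    maximum: Optional[int] = 1  # None stands for "no upper bound"

    def allows(self, occurrences: int) -> bool:
        """True if the observed number of occurrences lies within the bounds."""
        if occurrences < self.minimum:
            return False
        return self.maximum is None or occurrences <= self.maximum


exactly_one = Cardinality()
optional_repeatable = Cardinality(minimum=0, maximum=None)

print(exactly_one.allows(2))
print(optional_repeatable.allows(5))
```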
2.3
citation

object containing information that directs a textual resource reader’s or user’s attention from one

resource to another
2.4
closed vocabulary

limited set of items that forms the mandatory value domain of a metadata element (2.12)
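
A closed vocabulary can be pictured as a fixed set of permitted values attached to an element. In the Python sketch below, the element name "modality" and its three values are invented examples, not a vocabulary taken from this part of ISO 24622.

```python
# Hypothetical closed vocabularies, keyed by metadata element name.
CLOSED_VOCABULARIES = {
    "modality": frozenset({"spoken", "written", "signed"}),
}


def value_allowed(element: str, value: str, vocabularies=CLOSED_VOCABULARIES) -> bool:
    """True if the element has no closed vocabulary, or the value is one of its items."""
    allowed = vocabularies.get(element)
    return allowed is None or value in allowed


print(value_allowed("modality", "spoken"))
print(value_allowed("modality", "whistled"))
print(value_allowed("title", "any free text"))
```

An element without a closed vocabulary accepts any value in this sketch; only elements listed in the mapping are restricted.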

2.5
concept reference
concept link
reference to the definition of a concept in a concept registry (2.6)
2.6
concept registry
registry (2.25) for registering concepts enabling their identification with a unique identifier

2.7
collection
resource collection
grouping of multiple, different constituting elements, each of which is independent of the others and may be accessed individually
Note 1 to entry: A collection can be a virtual collection if its constituent elements come from other different (virtual) collections, and possibly if the elements are distributed over different repositories.

2.8
fragment identifier
identifier (2.9) used to reference a resource part (2.28) in a web context
[SOURCE: ISO 24619:2011]
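
For a reference expressed as a URL, the fragment identifier is the part after the "#" character. The short Python example below separates the two using the standard library function urllib.parse.urldefrag; the example.org URL is an invented placeholder.

```python
from urllib.parse import urldefrag

# Hypothetical reference to one part of a resource.
reference = "http://example.org/resources/interview-01.xml#utterance-12"

# urldefrag splits a URL into the part before '#' and the fragment after it.
resource_url, fragment = urldefrag(reference)

print(resource_url)
print(fragment)
```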
2.9
identifier
digital identifier

compact sequence of characters associated with digital, non-digital, or abstract entities

[SOURCE: Adapted from ISO 12619:2011]

Note 1 to entry: Identifiers can apply to entities such as books, images, reports, metadata records, and events.

2.10
metadata record
metadata description
metadata
record (2.23) containing a description of a resource (
...

INTERNATIONAL ISO
STANDARD 24622-1
First edition
2015-02-01
Language resource management —
Component Metadata Infrastructure
(CMDI) —
Part 1:
The Component Metadata Model
Gestion des ressources langagières — Composante infrastructure de
métadonnées (CMDI) —
Partie 1: Composant modèle de métadonnées
Reference number
ISO 24622-1:2015(E)
ISO 2015
---------------------- Page: 1 ----------------------
ISO 24622-1:2015(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2015

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form

or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior

written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of

the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2015 – All rights reserved
---------------------- Page: 2 ----------------------
ISO 24622-1:2015(E)
Contents Page

Foreword ........................................................................................................................................................................................................................................iv

Introduction ..................................................................................................................................................................................................................................v

1 Scope ................................................................................................................................................................................................................................. 1

2 Terms and definitions ..................................................................................................................................................................................... 1

3 Metadata schema availability and reuse ..................................................................................................................................... 5

3.1 Overview ...................................................................................................................................................................................................... 5

3.2 Metadata components and elements ................................................................................................................................... 5

4 Semantics in the component metadata model ...................................................................................................................... 7

4.1 Overview ...................................................................................................................................................................................................... 7

4.2 Concept registries ................................................................................................................................................................................. 8

4.3 Relation registries ................................................................................................................................................................................ 8

5 Metadata component and profile - compatibility and versioning ....................................................................9

6 Expressiveness of the component metadata model ......................................................................................................... 9

Annex A (informative) Abbreviations ...............................................................................................................................................................10

Bibliography .............................................................................................................................................................................................................................11

© ISO 2015 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO 24622-1:2015(E)
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any

patent rights identified during the development of the document will be in the Introduction and/or on

the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity

assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers

to Trade (TBT), see the following URL: Foreword — Supplementary information.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content

resources, Subcommittee SC 4, Language resource management.

ISO 24622 consists of the following part, under the general title Language resource management —

Component metadata infrastructure (CMDI):
— Part 1: The component metadata model
A future part will address the component metadata specific language.
iv © ISO 2015 – All rights reserved
---------------------- Page: 4 ----------------------
ISO 24622-1:2015(E)
Introduction

Component Metadata (CMD) is an approach to metadata modelling and metadata creation. It is being

increasingly used these days to enable the metadata description of different types of Language

Resources (LRs) with different metadata schemas, while still trying to maintain syntactic and semantic

interoperability.
[1]

CMD is also the core of the Component Metadata Infrastructure (CMDI) : this infrastructure contains

not only the format specifications for this metadata modelling and creation approach, but also a set of

registries and tools for metadata modelling and creation work.

The advantages of having such a unified approach to metadata descriptions for LRs, an approach that

will be usable by many projects and initiatives, are obvious: firstly, there is a better chance of obtaining

interoperability between metadata descriptions from different sources, and secondly, it will be possible

to develop and share tools that work much more efficiently in this metadata framework.

The challenge of designing and organizing a comprehensive and unified approach to metadata

description for the very varied set of LR types, and one that also can satisfy a sufficiently large section

of the LR community, should not be underestimated. The landscape of metadata for LRs has been, and

continues to be, fragmented. Until recently, it was the practice in creating the metadata descriptions for

LRs to choose a specific metadata schema from a (small) existing set derived either from widespread

[2] [3]

traditions or from other disciplines; for example, OLAC is an adapted version of DCMI, which in turn

originates in the library world. Additionally, there are, for the purposes of LR metadata description,

specifically developed metadata schemas that can be limited in application to specific types of LR (e.g.

[4]

IMDI ), or they can be of a proprietary nature (cf. the catalogues of the LR agencies such as LDC and

ELRA ). The result is a domain of LR metadata that is far from interoperable. Although some progress

has been made in developing dedicated bridges for “translating” metadata from one specific schema to

another and in providing a consolidated catalogue, this practice does not scale well since it depends on

specific translations for each pair of different metadata schemas.

For some recent projects, founding principles have included the unification and consolidation of practices

and the need to produce efficient and sufficiently specific metadata descriptions.

It follows that a number of international, European, and national projects and infrastructure initiatives

[5] [6]

such as CLARIN and META-SHARE now share the CMD approach to metadata for LRs. This

International Standard will both standardize the fundamentals of this approach in order to achieve

interoperability based on solid documentation, and foster cooperation between the various initiatives

and projects that work on, and with, this International Standard.

The model description is the first part of an infrastructure that forms a complete package for the

creation of metadata schemas. As stated in the Foreword, the complete infrastructure standard contains,

in addition to this component metadata model specification (ISO 24622-1), one or more metadata

component specification languages (planned), and a number of recommended metadata components

and profiles (planned). Since this part of ISO 24622 specifies an abstract model, we will rely mainly on

[7]
UML to describe it.
Figure 1 — Describing resources with metadata
1) Abbreviations are explained in Annex A.
2) Linguistic Data Consortium, http://www.ldc.upenn.edu/
3) European Language Resources Association, http://www.elra.info/
© ISO 2015 – All rights reserved v
---------------------- Page: 5 ----------------------
ISO 24622-1:2015(E)

This part of ISO 24622 addresses the basic need to provide a model that makes it easy for metadata

modellers (e.g. researchers and resource description experts) to create new metadata schemas, which

can in turn be used either to describe new types of resources or to enable a more appropriate description

for resources in specific circumstances. The metadata schema is instantiated into metadata records [i.e.

the metadata descriptions that describe the actual resource(s)] (see Figure 1).

The context of this desire for flexible metadata modelling is that for scientific work there are usually

various requirements for the proper description of LRs, and these requirements can derive from the

specific needs of a project or from the facility or repository that will be used to store the resource for

future use. This variation requires a flexible framework that enables the easy creation of new metadata

schemas for different purposes, but is also a framework (i) in which the instantiations have a strictly

defined format so that at least syntactic correctness can be checked, and (ii) which provides explicit

semantics for the metadata schema elements for interpretation of the metadata record content.

The metadata descriptions generated by schemas compliant with this model will also be compliant with other TC 37 International Standards, for example, those requiring that references to the described resources and resource parts use ISO 24619:2011 PISA-compatible persistent identifiers (PIDs) [9].

The definition of a resource in this context is very broad. This part of ISO 24622 takes a pragmatic view: for example, an image can be a resource in itself when it is associated with a PID and can be referenced as such, or it can be part of a document where it lacks an identity of its own. In addition, a reference can point to a part of this image. An individual resource can stand alone in one environment and be treated as part of a collection in another environment. Also, metadata descriptions describe resources, but they, too, are a resource in different contexts. This part of ISO 24622 needs to support all such cases, and the model needs to provide descriptions at all levels of granularity.

This part of ISO 24622 takes two types of collections into account:

a) A complex resource may have been created as a collection originally and, versioning aside, it will exist as such in a rather static published form. Its specification will be treated as an independent entity by the responsible archiving institution that also provides a PID for such a collection. In the context of this part of ISO 24622, the metadata for the collection is the collection specification. The archiving institution is responsible for maintaining the metadata representing the collection.

b) In contrast, a different type of collection is one that was not planned and designed as a collection by its creators or by the holding archive, but achieves its status as a federated resource based on research that needs to be verifiable. Such collections, although purposefully constructed by the researcher, may not have any significance outside the context of the research for which they were created. Referring from the research documents to the collection may also become tedious if the collection contains hundreds of individual resources. It follows that there is a need to capture these types of collection with a metadata record that is associated with all its constituent resources and appropriate metadata, but only as the incarnation of this collection. There is no natural responsible party to maintain this metadata record. It is unlikely that the researcher who created the “virtual” collection (VC) has any way of consistently maintaining and curating this metadata record in the long term. There may be special registries maintained by digital archives or publishers where researchers can register such virtual collections.

Both types of collection are identified with the PID that refers to the collection metadata.
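
As an informal illustration of the statement above, the following Python sketch models a collection as a metadata record that is identified by the PID of that record and that lists its constituent resources by their PIDs. The class CollectionRecord, the helper pid_prefixes and all PID values are hypothetical. The sketch also assumes handle-style PIDs whose prefix (the part before the slash) indicates the issuing repository; this assumption is not made by this part of ISO 24622.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionRecord:
    """Metadata record describing a collection (hypothetical structure)."""
    metadata_pid: str                  # PID of the collection metadata; it identifies the collection
    title: str
    virtual: bool                      # True for a virtual collection assembled by a researcher
    member_pids: list = field(default_factory=list)

    def add_member(self, pid: str) -> None:
        """Register a constituent resource once, by its PID."""
        if pid not in self.member_pids:
            self.member_pids.append(pid)


def pid_prefixes(record: CollectionRecord) -> list:
    """Sorted PID prefixes of the members (assumed to reflect the holding repositories)."""
    return sorted({pid.split("/")[0] for pid in record.member_pids})


# Made-up PIDs; the members come from two different (assumed) repositories.
vc = CollectionRecord("hdl:00000/vc-1", "Recordings cited in a study", virtual=True)
vc.add_member("hdl:11111/rec-17")
vc.add_member("hdl:22222/text-4")
vc.add_member("hdl:11111/rec-17")      # duplicate reference, ignored

print(vc.metadata_pid, len(vc.member_pids), pid_prefixes(vc))
```

A research document can then cite the single PID of the collection metadata instead of listing every constituent resource, which addresses the tedium described in item b).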

INTERNATIONAL STANDARD ISO 24622-1:2015(E)
Language resource management — Component Metadata
Infrastructure (CMDI) —
Part 1:
The Component Metadata Model
1 Scope

The scope of this part of ISO 24622 is to describe a model that enables the flexible construction of interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this model can be used to describe resources at different levels of granularity (e.g. descriptions both on the collection level and on the level of individual resources).
2 Terms and definitions
2.1
archive
digital archive
repository (2.26) dedicated to the long-term preservation of the associated data

Note 1 to entry: The data in digital archives are also often available on-line. This highlights the need for reliable PIDs (2.22).
2.2
cardinality
metadata component cardinality
metadata element cardinality

specification of the number of occurrences of a metadata component (2.14) or metadata element (2.12) in an instantiation
2.3
citation

object containing information that directs a textual resource reader’s or user’s attention from one resource to another
2.4
closed vocabulary

limited set of items that forms the mandatory value domain of a metadata element (2.12)

2.5
concept reference
concept link
reference to the definition of a concept in a concept registry (2.6)
2.6
concept registry

registry (2.25) for registering concepts enabling their identification with a unique identifier

2.7
collection
resource collection

grouping of multiple, different constituting elements, each of which is independent of the others and may be accessed individually

Note 1 to entry: A collection can be a virtual collection if its constituent elements come from other different (virtual) collections, and possibly if the elements are distributed over different repositories.

2.8
fragment identifier
identifier (2.9) used to reference a resource part (2.28) in a web context
[SOURCE: ISO 12619:2011]
2.9
identifier
digital identifier

compact sequence of characters associated with digital, non-digital, or abstract entities

[SOURCE: Adapted from ISO 12619:2011]

Note 1 to entry: Identifiers can apply to entities such as books, images, reports, metadata records, and events.

2.10
metadata record
metadata description
metadata
record (2.23) containing a description of a resource (2.27)
2.11
metadata schema
schema
specification of a format and structure for a metadata record (2.10)

Note 1 to entry: In the context of this part of ISO 24622, a machine-readable and verifiable format specification usually defined by an XML schema language.
2.12
metadata element

resource property name that can be used in metadata and that can be given a value

Note 1 to entry: A metadata element is referred to as metadata attribute in other communities.

EXAMPLE The DCMI [3] elements.
2.13
metadata set
metadata element set
collection of metadata elements (2.12) used within a part
...

SLOVENIAN STANDARD
SIST ISO 24622-1:2018
1 September 2018

Upravljanje z jezikovnimi viri - Infrastruktura komponentnih metapodatkov (CMDI) - 1. del: Model komponentnih metapodatkov

Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model

Gestion des ressources langagières -- Composante infrastructure de métadonnées (CMDI) -- Partie 1: Composant modèle de métadonnées

This Slovenian standard is identical to: ISO 24622-1:2015

ICS:
01.140.20 Information sciences
35.060 Languages used in information technology
35.240.30 IT applications in information, documentation and publishing

SIST ISO 24622-1:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of all or part of this standard is not permitted.

INTERNATIONAL STANDARD ISO 24622-1
First edition 2015-02-01
Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model
Gestion des ressources langagières -- Composante infrastructure de métadonnées (CMDI) -- Partie 1: Composant modèle de métadonnées
Reference number ISO 24622-1:2015(E)
ISO 2015
COPYRIGHT PROTECTED DOCUMENT
© ISO 2015

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Terms and definitions
3 Metadata schema availability and reuse
3.1 Overview
3.2 Metadata components and elements
4 Semantics in the component metadata model
4.1 Overview
4.2 Concept registries
4.3 Relation registries
5 Metadata component and profile - compatibility and versioning
6 Expressiveness of the component metadata model
Annex A (informative) Abbreviations
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers to Trade (TBT), see the following URL: Foreword -- Supplementary information.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

ISO 24622 consists of the following part, under the general title Language resource management -- Component metadata infrastructure (CMDI):
— Part 1: The component metadata model
A future part will address the component metadata specification language.
Introduction

Component Metadata (CMD) is an approach to metadata modelling and metadata creation. It is being increasingly used these days to enable the metadata description of different types of Language Resources (LRs) with different metadata schemas, while still trying to maintain syntactic and semantic interoperability.

CMD is also the core of the Component Metadata Infrastructure (CMDI) [1]: this infrastructure contains not only the format specifications for this metadata modelling and creation approach, but also a set of registries and tools for metadata modelling and creation work.

The advantages of having such a unified approach to metadata descriptions for LRs, an approach that will be usable by many projects and initiatives, are obvious: firstly, there is a better chance of obtaining interoperability between metadata descriptions from different sources, and secondly, it will be possible to develop and share tools that work much more efficiently in this metadata framework.

The challenge of designing and organizing a comprehensive and unified approach to metadata description for the very varied set of LR types, and one that also can satisfy a sufficiently large section of the LR community, should not be underestimated. The landscape of metadata for LRs has been, and continues to be, fragmented. Until recently, it was the practice in creating the metadata descriptions for LRs to choose a specific metadata schema from a (small) existing set derived either from widespread traditions or from other disciplines; for example, OLAC [2] is an adapted version of DCMI [3], which in turn originates in the library world. Additionally, there are, for the purposes of LR metadata description, specifically developed metadata schemas that can be limited in application to specific types of LR (e.g. IMDI [4]), or they can be of a proprietary nature (cf. the catalogues of the LR agencies such as LDC and ELRA). The result is a domain of LR metadata that is far from interoperable. Although some progress has been made in developing dedicated bridges for “translating” metadata from one specific schema to another and in providing a consolidated catalogue, this practice does not scale well since it depends on specific translations for each pair of different metadata schemas.

For some recent projects, founding principles have included the unification and consolidation of practices and the need to produce efficient and sufficiently specific metadata descriptions.

It follows that a number of international, European, and national projects and infrastructure initiatives such as CLARIN [5] and META-SHARE [6] now share the CMD approach to metadata for LRs. This International Standard will both standardize the fundamentals of this approach in order to achieve interoperability based on solid documentation, and foster cooperation between the various initiatives and projects that work on, and with, this International Standard.

The model description is the first part of an infrastructure that forms a complete package for the creation of metadata schemas. As stated in the Foreword, the complete infrastructure standard contains, in addition to this component metadata model specification (ISO 24622-1), one or more metadata component specification languages (planned), and a number of recommended metadata components and profiles (planned). Since this part of ISO 24622 specifies an abstract model, we will rely mainly on UML [7] to describe it.
Figure 1 — Describing resources with metadata
1) Abbreviations are explained in Annex A.
2) Linguistic Data Consortium, http://www.ldc.upenn.edu/
3) European Language Resources Association, http://www.elra.info/
