Language resources management -- Multilingual information framework

ISO 24616:2012 provides a generic platform for modeling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modeling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).

Gestion des ressources langagières -- Plateforme d'informations multilingues

Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije

This International Standard provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support and business modelling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, among others, XLIFF, TMX, smilText and ITS.

General Information

Status: Published
Publication Date: 03-Sep-2012
Current Stage: 6060 - International Standard published
Start Date: 28-Aug-2012
Completion Date: 04-Sep-2012

Editions

Standard
ISO 24616:2013 (colour on PDF pages 8 and 19)
English language
46 pages

Standard
ISO 24616:2012 - Language resources management -- Multilingual information framework
English language
42 pages

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24616:2013
1 July 2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24616:2013 en,fr,de

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
Reference number
ISO 24616:2012(E)
ISO 2012
COPYRIGHT PROTECTED DOCUMENT
© ISO 2012

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(E)
Language resources management -- Multilingual information framework
1 Scope

This International Standard provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modelling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF, TMX, smilText and ITS.
2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 12620:2009, Terminology and other language and content resources -- Specification of data categories and management of a Data Category Registry for language resources

ISO 8879, Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)

Extensible Markup Language. Fifth Edition, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document

Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).

3.3
subtitle
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)

The MLIF specification complies with the modelling principles of UML as defined by the Object Management Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.

4.2 Metamodel and adornment

In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that is adorned by data categories, as defined in ISO 12620.

4.3 XML serialization

Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML serialization”, in line with Extensible Markup Language (XML) as defined in ISO 8879.

5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel

The MLIF metamodel is defined by the following seven "core components", listed according to their XML serialization:

- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;
- <GI> (Global Information), which represents technical and administrative information applying to the entire multilingual data collection;
- <GroupC> (Grouping components), which represents a sub-collection of multilingual data that have a common origin or purpose within a given project;
- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;
- <MonoC> (Monolingual Component), which groups together information related to one language and is part of a multilingual component (MultiC);
- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e. versioning);
- <SegC> (Segmentation Component), which allows any level of segmentation for textual information, possibly in a recursive manner.
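
As an illustration of how the core components nest, the following Python sketch builds a small collection with the standard library. It is not taken from the standard: the use of the component names MLDC, GI, GroupC, MultiC, MonoC and SegC as XML element names, the exact nesting, and the helper name build_collection are assumptions made for this example, and the normative definitions remain those of Annex G.

```python
import xml.etree.ElementTree as ET

# Namespace that ElementTree uses for xml:lang and xml:id attributes.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_collection(units):
    """Build an MLDC tree from {unit_id: {language: text}}.

    Assumed nesting (hypothetical, for illustration only):
    MLDC > GI, and MLDC > GroupC > MultiC > MonoC > SegC.
    """
    mldc = ET.Element("MLDC")
    ET.SubElement(mldc, "GI")  # global information, left empty in this sketch
    group = ET.SubElement(mldc, "GroupC")
    for unit_id, variants in units.items():
        # One MultiC per textual content, grouping all language variants.
        multi = ET.SubElement(group, "MultiC", {f"{{{XML_NS}}}id": unit_id})
        for lang, text in variants.items():
            # One MonoC per language, carrying the working language.
            mono = ET.SubElement(multi, "MonoC", {f"{{{XML_NS}}}lang": lang})
            seg = ET.SubElement(mono, "SegC")
            seg.text = text
    return mldc


collection = build_collection(
    {"u1": {"en": "The meal is nice.", "fr": "Le repas est bon."}}
)
print(ET.tostring(collection, encoding="unicode"))
```

The sketch omits HistoC and nested SegC elements; both could be added in the same way where versioning or finer segmentation is needed.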
6 MLIF compliance

Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:

- by fully implementing the MLIF metamodel starting at the level of <MLDC>;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the lower level MLIF elements, namely [element names lost in extraction].
7 Metamodel adornment
7.1 Introduction

The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:

- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working language of any relevant element and, in particular, shall be used systematically for any implementation of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier to an element of the MLIF metamodel.
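
The two rules above lend themselves to a simple automated check. The following Python sketch, which is not part of the standard, reports MonoC elements without xml:lang and repeated xml:id values. The function name check_generic_attributes, the sample fragment and the use of MultiC, MonoC and SegC as element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang and xml:id under this namespace prefix.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def check_generic_attributes(root):
    """Return a list of problems found with xml:lang and xml:id usage."""
    problems = []
    seen_ids = set()
    for element in root.iter():
        # xml:lang is expected on every MonoC.
        if element.tag == "MonoC" and not element.get(XML_NS + "lang"):
            problems.append("MonoC without xml:lang")
        # xml:id values have to be unique within the document.
        identifier = element.get(XML_NS + "id")
        if identifier is not None:
            if identifier in seen_ids:
                problems.append(f"duplicate xml:id: {identifier}")
            seen_ids.add(identifier)
    return problems


# Hypothetical fragment with two deliberate errors in the second MonoC.
SAMPLE = """<MultiC xml:id="u1">
  <MonoC xml:lang="en" xml:id="m1"><SegC>The meal is nice.</SegC></MonoC>
  <MonoC xml:id="m1"><SegC>Le repas est bon.</SegC></MonoC>
</MultiC>"""

for problem in check_generic_attributes(ET.fromstring(SAMPLE)):
    print(problem)
```

A check of this kind only covers the presence and uniqueness of the attributes; whether a language tag is itself well formed would need a separate validation step.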
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC

The HistoC component is a generic component that traces modifications made on the component to which it is anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or enhancements to, the component to be recorded.
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment

Multilingual text documents are often only one stage in a complex workflow that involves external document sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the presentational features that have to be retained in a translated target document. To this end, MLIF-compliant applications should use the following elements, in relation to the element, that map onto similar subsets in TMX and XLIFF:





7.10 Recommended adornment for localization

All the following elements should be used to provide localization-related information:



7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization

The following elements should be used when textual content has to be conveyed (in written or spoken form) together with some constraints:



8 Relation with other standards

As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a metamodel that combines with selected data categories as a way of ensuring interoperability between several multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the translation relations between them. In each domain where MLIF is applicable, a specific granularity may be considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF [ISO 24615] and TMF for morphological description, syntactical annotation and terminological description respectively.

MLIF supports the construction and the interoperability of localization and translation memory resources, and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed list of description features. Rather, it provides a list of data categories that is much easier to update and extend. This list represents a point of reference for multilingual information in the context of various application scenarios.

However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and section). In addition, MLIF allows for external and internal links (annotations and references).

MLIF is designed to provide a common framework that facilitates interoperability with formats such as TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)

The main reason for recording lemma, part-of-speech and morphological features is to allow CAT tools based on translation memory to produce translations of new words and sentences that are not in the translation database.

For example, using a translation memory that contains the English sentence "The meal is nice." and its translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench are not able to provide the predicted translation for the sentence "The meals are nice." even though the word lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:

<header
  creationtool="TRADOS Translator's Workbench for Windows"
  creationtoolversion="Edition 8 Build 863"
  segtype="sentence"
  o-tmf="TW4Win 2.0 Format"
  adminlang="EN-US"
  srclang="EN-GB"
  datatype="rtf"
  creationdate="20100528T144322Z"
  creationid="USER"/>



The meal is nice.


Le repas est bon.




To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following procedure:

Step-1 Represent the translation memory in MLIF and add linguistic properties to all of its words.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French inflected form as follows:
"The meals are nice." => "Les repas sont bons."

Note: SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this product.

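The procedure for translating "The meals are nice." can be sketched in a few lines of Python. This is only a toy illustration: the three dictionaries are hypothetical stand-ins for a part-of-speech tagger, a bilingual lexicon and a French lexicon of inflected forms, and the simple number-agreement rule is an assumption added so that the example produces the plural forms.

```python
# Hypothetical toy resources; a real tool would use a tagger and full lexicons.
# Step 2 stand-in: surface form -> (lemma, part of speech, number).
ENGLISH_ANALYSES = {
    "the": ("the", "definiteArticle", None),
    "meals": ("meal", "commonNoun", "plural"),
    "are": ("be", "verb", "plural"),
    "nice": ("nice", "qualifierAdjective", None),
}
# Step 3: English lemma -> French lemma.
BILINGUAL_LEXICON = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}
# Step 4: (French lemma, number) -> inflected form (masculine forms only).
FRENCH_FORMS = {
    ("le", "plural"): "les",
    ("repas", "plural"): "repas",
    ("être", "plural"): "sont",
    ("bon", "plural"): "bons",
}


def translate(sentence):
    """Translate word by word through lemmas and morphological features."""
    words = sentence.rstrip(".").lower().split()
    analyses = [ENGLISH_ANALYSES[word] for word in words]
    # Assumed agreement rule: words without their own number take the noun's.
    noun_number = next(n for _, pos, n in analyses if pos == "commonNoun")
    output = []
    for lemma, _pos, own_number in analyses:
        french_lemma = BILINGUAL_LEXICON[lemma]
        output.append(FRENCH_FORMS[(french_lemma, own_number or noun_number)])
    # Step 5: substitute each word with its French inflected form.
    return " ".join(output).capitalize() + "."


print(translate("The meals are nice."))
```

The point of the sketch is that the lookup goes through lemmas rather than surface strings, which is what lets a memory containing only "The meal is nice." help with the plural sentence.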
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word segmentation and tagset defined in MAF:










SEMMAR
20090922T140653Z

The meal is nice.


Le repas est bon.




The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.


class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.




Annex B
(informative)
Example: representing TMX data
B.1 Introduction

TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process. TMX, which has been on the market since 1998, is a certifiable standard format. It was developed, and is maintained by, OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF

TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to MLIF as follows:
- maps onto the element;
- maps onto the element;
- is a container for the element and maps onto the element;
- maps onto the element;
- maps onto the element;
- maps onto the element;
- of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
- The "creationtool" attribute maps onto the element;
- The "creationdate" attribute maps onto the element;
- The "tuid" attribute maps onto the element within MultiC.
- The element does not map onto any specific element as it represents a generic placeholder for application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
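
Because the two structures are nearly isomorphic, a conversion can be sketched as a recursive renaming of elements. The Python example below is a hedged illustration, not the mapping defined by the standard: the element names in this copy of the mapping list were lost, so the correspondence in ELEMENT_MAP (tmx, header, body, tu, tuv and seg on the TMX side; MLDC, GI, GroupC, MultiC, MonoC and SegC on the MLIF side) is an assumption, as are the function name and the minimal sample.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed correspondence between TMX and MLIF element names (hypothetical).
ELEMENT_MAP = {
    "tmx": "MLDC",
    "header": "GI",
    "body": "GroupC",
    "tu": "MultiC",
    "tuv": "MonoC",
    "seg": "SegC",
}


def tmx_to_mlif(node):
    """Recursively copy a TMX tree, renaming mapped elements.

    Only xml:lang is carried over; all other attributes are dropped
    in this sketch.
    """
    target = ET.Element(ELEMENT_MAP.get(node.tag, node.tag))
    if XML_LANG in node.attrib:
        target.set(XML_LANG, node.attrib[XML_LANG])
    target.text = node.text
    target.tail = node.tail
    for child in node:
        target.append(tmx_to_mlif(child))
    return target


# Minimal sample; a real TMX header carries several required attributes.
TMX_SAMPLE = """<tmx version="1.4"><header/><body><tu>
<tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
<tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv>
</tu></body></tmx>"""

mlif = tmx_to_mlif(ET.fromstring(TMX_SAMPLE))
print(ET.tostring(mlif, encoding="unicode"))
```

Header attributes such as "creationtool" and "creationdate" would need their own handling, since they map onto elements rather than attributes.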

B.3 Example of data

The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and does not translate all the details of the header.

<header
  adminlang="en"
  creationdate="20040731T164933Z"
  creationtool="Heartsome TM Server"
  creationtoolversion="1.0.1"
  datatype="xml"
  o-tmf="unknown"
  segtype="block"
  srclang="*all*"/>



Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.


El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.


Il suo metodo di controllo di qualità in 10 fasi risale a più

di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.


그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.




The corresponding MLIF default representation is as follows:


TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1



1091303313515
20020930T004233Z

Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.




B.4 Example of TMX and MLIF interaction

Figure B.1 illustrates the interaction between TMX and MLIF. This process involves subsequent steps of extraction, translation and merging. The process begins with a TMX document containing linguistic content in English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic information is stored. As most translators (human beings or automatic software modules) work with TMX software-oriented tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX document. This file does not contain any formatting information. Once the translator has added the appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the “Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
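
The extraction and merging steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the placeholder syntax (%%0%%, %%1%%, ...), the function names extract and merge, and the regular-expression approach are all hypothetical, and the sketch only puts existing segments back into place rather than adding a new language variant as in the workflow described above.

```python
import re

# Matches the text content of every <seg> element (no nested <seg> assumed).
SEG_PATTERN = re.compile(r"<seg>(.*?)</seg>", re.DOTALL)


def extract(tmx_text):
    """Split a TMX string into a skeleton and the list of segment texts."""
    segments = []

    def stash(match):
        # Store the segment text and leave a numbered placeholder behind.
        segments.append(match.group(1))
        return f"<seg>%%{len(segments) - 1}%%</seg>"

    skeleton = SEG_PATTERN.sub(stash, tmx_text)
    return skeleton, segments


def merge(skeleton, segments):
    """Put (possibly edited) segment texts back into the skeleton."""
    return re.sub(r"%%(\d+)%%", lambda m: segments[int(m.group(1))], skeleton)


source = '<tu><tuv xml:lang="en"><seg>Hello</seg></tuv></tu>'
skeleton, texts = extract(source)
print(skeleton)
print(merge(skeleton, texts) == source)
```

Keeping the skeleton separate means the linguistic content can be processed by tools that know nothing about the original formatting, which is the idea behind the “Skeleton File”.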
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction

The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Ma
...

INTERNATIONAL ISO
STANDARD 24616
First edition
2012-09-01
Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations
multilingues
Reference number
ISO 24616:2012(E)
ISO 2012
---------------------- Page: 1 ----------------------
ISO 24616:2012(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2012

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,

electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or

ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2012 – All rights reserved
---------------------- Page: 2 ----------------------
ISO 24616:2012(E)
Contents Page

Foreword ............................................................................................................................................................ iv

1  Scope ...................................................................................................................................................... 1

2  Normative references ............................................................................................................................ 1

3  Terms and definitions ........................................................................................................................... 1

4  Specification principles ........................................................................................................................ 2

4.1  Key standard used in the specification: Unified Modeling Language (UML) .................................. 2

4.2  Metamodel and adornment ................................................................................................................... 2

4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(E)
Language resources management — Multilingual information framework
1 Scope

This International Standard provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modelling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF, TMX, smilText and ITS.
2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 12620:2009, Terminology and other language and content resources — Specification of data categories and management of a Data Category Registry for language resources

ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)

Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau (Editors), W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document

Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).

3.3
subtitle

textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)

The MLIF specification complies with the modelling principles of UML as defined by the Object Management Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.

4.2 Metamodel and adornment

In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization

Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML serialization”, in line with Extensible Markup Language (XML), the W3C Recommendation listed in Clause 2, which is itself a restricted form of SGML as defined in ISO 8879.

5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel

The MLIF metamodel is defined by the following seven "core components". These components are listed as follows, according to their XML serialization:

- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;
- <GI> (Global Information), which represents technical and administrative information applying to the entire multilingual data collection;
- <GroupC> (Grouping Component), which represents a sub-collection of multilingual data that have a common origin or purpose within a given project;
- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;
- <MonoC> (Monolingual Component), which groups together information related to one language and is part of a multilingual component (MultiC);
- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e. versioning);
- <SegC> (Segmentation Component), which allows any level of segmentation for textual information, possibly in a recursive manner.
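
The containment relations between the seven components can be sketched as plain data classes. This is a minimal illustration only: Figure 1 is not reproduced here, so the exact nesting (a collection holding groups, groups holding multilingual units), the field names and the `flatten` and `variant` helpers are assumptions made for the example, not part of this International Standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoC:
    """History Component: one recorded modification (hypothetical fields)."""
    action: str  # e.g. "creation", "modification", "validation"
    date: str


@dataclass
class SegC:
    """Segmentation Component: text that may be segmented recursively."""
    text: str = ""
    children: List["SegC"] = field(default_factory=list)

    def flatten(self) -> str:
        # Own text followed by the text of all nested segments, in order.
        return self.text + "".join(child.flatten() for child in self.children)


@dataclass
class MonoC:
    """Monolingual Component: everything related to one working language."""
    lang: str
    segments: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: all variants of a given textual content."""
    variants: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

    def variant(self, lang: str) -> Optional[MonoC]:
        # Return the variant for one language, or None if it is absent.
        return next((m for m in self.variants if m.lang == lang), None)


@dataclass
class GroupC:
    """Grouping Component: units sharing a common origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global Information: technical and administrative information."""
    properties: dict = field(default_factory=dict)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MLDC:
    """Multilingual Data Collection: global information plus groups."""
    global_info: GI = field(default_factory=GI)
    groups: List[GroupC] = field(default_factory=list)


unit = MultiC(variants=[
    MonoC("en", [SegC(children=[SegC("The meal "), SegC("is nice.")])]),
    MonoC("fr", [SegC("Le repas est bon.")]),
])
collection = MLDC(GI({"source": "example"}), [GroupC([unit])])
```

The recursive `SegC` class mirrors the statement that segmentation may be applied at any level; `HistoC` lists are attached only to the components named as possible anchors in 7.8.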
6 MLIF compliance

Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:

- by fully implementing the MLIF metamodel starting at the level of <MLDC>;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the lower level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction

The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:

- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working language of any relevant element and, in particular, shall be used systematically for any implementation of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier to an element of the MLIF metamodel.
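
The two rules above lend themselves to a mechanical check. The following sketch uses Python's standard `xml.etree.ElementTree` module; the element names `MultiC`, `MonoC` and `SegC` in the sample, the function name `check_w3c_attributes` and the wording of the messages are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml" prefix to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = f"{{{XML_NS}}}lang"
ID = f"{{{XML_NS}}}id"


def check_w3c_attributes(xml_text: str) -> list:
    """Return a list of problems found in an MLIF-like fragment."""
    root = ET.fromstring(xml_text)
    problems = []
    seen_ids = set()
    for elem in root.iter():
        # xml:lang is mandatory on every MonoC (element name assumed).
        if elem.tag == "MonoC" and not elem.get(LANG):
            problems.append("MonoC without xml:lang")
        # xml:id values must be unique within the document.
        ident = elem.get(ID)
        if ident is not None:
            if ident in seen_ids:
                problems.append(f"duplicate xml:id: {ident}")
            seen_ids.add(ident)
    return problems


# Hypothetical fragment with two deliberate errors in the second MonoC.
SAMPLE = """<MultiC xml:id="u1">
  <MonoC xml:lang="en" xml:id="m1"><SegC>The meal is nice.</SegC></MonoC>
  <MonoC xml:id="m1"><SegC>Le repas est bon.</SegC></MonoC>
</MultiC>"""

problems = check_w3c_attributes(SAMPLE)
```

A schema-based validation against the specifications in Annex G would be the more complete approach; this check only covers the two generic attributes.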
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC

The HistoC component is a generic component that traces modifications made on the component to which it is anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or enhancements to, the component to be recorded.
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment

Multilingual text documents are often only one stage in a complex workflow that involves external document sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the presentational features that have to be retained in a translated target document. To this end, MLIF-compliant applications should use the following elements, in relation to the element, that map onto similar subsets in TMX and XLIFF:





7.10 Recommended adornment for localization

All the following elements should be used to provide localization-related information:



7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization

The following elements should be used when textual content has to be conveyed (in written or spoken form) together with some constraints:



8 Relation with other standards

As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a metamodel that combines with selected data categories as a way of ensuring interoperability between several multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the translation relations between them. In each domain where MLIF is applicable, a specific granularity may be considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF [ISO 24615] and TMF for morphological description, syntactical annotation and terminological description respectively.

MLIF supports the construction and the interoperability of localization and translation memory resources, and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed list of description features. Rather, it provides a list of data categories that is much easier to update and extend. This list represents a point of reference for multilingual information in the context of various application scenarios.

However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and section). In addition, MLIF allows for external and internal links (annotations and references).

MLIF is designed to provide a common framework that facilitates interoperability with formats such as TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)

The main reason for recording lemma, part-of-speech and morphological features is to allow CAT tools based on translation memory to produce translations of new words and sentences that are not in the translation database.

For example, using a translation memory that contains the English sentence "The meal is nice." and its French translation "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench are not able to provide the predicted translation for the sentence "The meals are nice.", even though the lemmas of the words in "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:

<header
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
o-tmf="TW4Win 2.0 Format"
adminlang="EN-US"
srclang="EN-GB"
datatype="rtf"
creationdate="20100528T144322Z"
creationid="USER"/>



The meal is nice.


Le repas est bon.




SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this product.

To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following procedure:

Step-1 Represent the translation memory in MLIF and add linguistic properties to all the words within it.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French inflected form as follows:
"The meals are nice." => "Les repas sont bons."
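
The five steps can be illustrated with a toy implementation. Everything in the sketch below is an assumption made for the example: the three small dictionaries stand in for the annotated translation memory, the part-of-speech tagger, the bilingual lexicon and the French lexicon of inflected forms, and agreement is reduced to a single number feature shared by the whole sentence (gender is fixed to masculine).

```python
# Step-1 and Step-2 stand-in: surface form -> (lemma, number feature).
EN_ANALYSES = {
    "the": ("the", None),
    "meal": ("meal", "singular"),
    "meals": ("meal", "plural"),
    "is": ("be", "singular"),
    "are": ("be", "plural"),
    "nice": ("nice", None),
}

# Step-3 stand-in: English lemma -> French lemma.
EN_FR_LEXICON = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step-4 stand-in: (French lemma, number) -> inflected form.
FR_INFLECTED = {
    ("le", "singular"): "le", ("le", "plural"): "les",
    ("repas", "singular"): "repas", ("repas", "plural"): "repas",
    ("être", "singular"): "est", ("être", "plural"): "sont",
    ("bon", "singular"): "bon", ("bon", "plural"): "bons",
}


def translate(sentence: str) -> str:
    """Translate a short English sentence word by word via lemmas."""
    words = sentence.rstrip(".").lower().split()
    # Step-2: obtain lemma and morphological features for each word.
    analyses = [EN_ANALYSES[word] for word in words]
    # Simplification: the first word carrying a number feature decides
    # agreement for the whole sentence.
    number = next(n for _, n in analyses if n is not None)
    # Step-3 and Step-4: translate each lemma, then inflect it.
    french = [FR_INFLECTED[(EN_FR_LEXICON[lemma], number)]
              for lemma, _ in analyses]
    # Step-5: substitute word for word and restore capitalization.
    text = " ".join(french)
    return text[0].upper() + text[1:] + "."


print(translate("The meals are nice."))
```

Word-for-word substitution works here only because the English and French sentences share the same word order; a real tool would need alignment and agreement rules well beyond this sketch.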

The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word segmentation and tagset defined in MAF:










SEMMAR
20090922T140653Z

The meal is nice.


Le repas est bon.




The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.


class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.




Annex B
(informative)
Example: representing TMX data
B.1 Introduction

TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process. TMX, which has been on the market since 1998, is a certifiable standard format. It was developed, and is maintained, by OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF

TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to MLIF as follows:

- maps onto the element;
- maps onto the element;
- is a container for the element and maps onto the element;
- maps onto the element;
- maps onto the element;
- maps onto the element;
- of type term maps onto the element of type term.

Further TMX elements and attributes map onto MLIF elements as follows:

- The "creationtool" attribute maps onto the element;
- The "creationdate" attribute maps onto the element;
- The "tuid" attribute maps onto the element within MultiC;
- The element does not map onto any specific element as it represents a generic placeholder for application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
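
A structural conversion along these lines can be sketched with Python's standard `xml.etree.ElementTree` module. The TMX element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) come from the TMX format itself; the pairing with the MLIF components of Clause 5 (`MLDC`, `GI`, `GroupC`, `MultiC`, `MonoC`, `SegC`) is an assumption made for this example, as are the function and variable names. Header attributes and application-dependent data are deliberately left out.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed pairing of TMX macro-structure elements with MLIF components.
ASSUMED_MAPPING = {
    "tmx": "MLDC",
    "header": "GI",
    "body": "GroupC",
    "tu": "MultiC",
    "tuv": "MonoC",
    "seg": "SegC",
}


def tmx_to_mlif(elem):
    """Recursively convert a TMX element into an MLIF-like element."""
    target = ASSUMED_MAPPING.get(elem.tag)
    if target is None:
        return None  # unmapped elements are skipped in this sketch
    out = ET.Element(target)
    # The working language travels with the monolingual component.
    if elem.tag == "tuv" and elem.get(XML_LANG):
        out.set(XML_LANG, elem.get(XML_LANG))
    if elem.tag == "seg":
        # Keep the textual content; inline markup is flattened here.
        out.text = "".join(elem.itertext())
    else:
        for child in elem:
            converted = tmx_to_mlif(child)
            if converted is not None:
                out.append(converted)
    return out


# Minimal illustrative input, not a complete valid TMX document.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="demo" srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
      <tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv>
    </tu>
  </body>
</tmx>"""

mlif = tmx_to_mlif(ET.fromstring(SAMPLE_TMX))
```

An XSL style-sheet, as used in B.4, would be the more usual vehicle for such a transformation; Python is used here only to keep the example self-contained.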

B.3 Example of data

The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and does not translate all the details of the header.

<header
adminlang="en"
creationdate="20040731T164933Z"
creationtool="Heartsome TM Server"
creationtoolversion="1.0.1"
datatype="xml"
o-tmf="unknown"
segtype="block"
srclang="*all*"/>



Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.


El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.


Il suo metodo di controllo di qualità in 10 fasi risale a più

di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.


그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.




The corresponding default MLIF representation is as follows:


TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1



1091303313515
20020930T004233Z

Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.




B.4 Example of TMX and MLIF interaction

Figure B.1 illustrates the interaction between TMX and MLIF. This process involves subsequent steps of extraction, translation and merging. The process begins with a TMX document containing linguistic content in English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic information is stored. As most translators (human beings or automatic software modules) work with TMX software-oriented tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX document. This file does not contain any formatting information. Once the translator has added the appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the “Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
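
The extraction and merging steps can be pictured with a much simplified round trip. The sketch below is an analogy rather than the process of Figure B.1 itself: it uses regular expressions instead of XSL style-sheets, replaces the content of each unit rather than adding a new language variant, and the placeholder syntax `%%n%%` and both function names are hypothetical.

```python
import re


def extract(document: str):
    """Split a document into a skeleton and its list of text units."""
    units = []

    def stash(match):
        # Store the segment text and leave a numbered placeholder behind.
        units.append(match.group(1))
        return f"<seg>%%{len(units) - 1}%%</seg>"

    skeleton = re.sub(r"<seg>(.*?)</seg>", stash, document, flags=re.S)
    return skeleton, units


def merge(skeleton: str, units) -> str:
    """Put (possibly translated) text units back into the skeleton."""
    return re.sub(r"%%(\d+)%%", lambda m: units[int(m.group(1))], skeleton)


doc = '<tu><tuv xml:lang="en"><seg>Hello</seg></tuv></tu>'
skeleton, units = extract(doc)
restored = merge(skeleton, units)
```

The skeleton keeps everything that is not linguistic content, so merging unchanged units reproduces the original document exactly, while merging translated units yields the same structure with new text.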
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction

The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF

XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language for monolingual information. This is handled through the appropriate use of the data category in together with the language declarations ( and ) in .

The core elements of the XLIFF macro-structure map to MLIF as follows:

- maps onto the element;
- maps onto the element;
- is a container for the element and maps onto the element;
- the element maps onto the element;
- maps onto the element;
- maps onto the element and simultaneously sets the value of the element to . The corresponding textual content is placed in a element;
- maps onto the element and simultaneously sets the value of the element to . The corresponding textual content is placed in a element;
- maps onto the element and simultaneously sets the value of the element to alternate.
XLIFF further elements and attrib
...

SLOVENIAN STANDARD
SIST ISO 24616:2013
1 July 2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS: 01.020 Terminology (principles and coordination)
SIST ISO 24616:2013 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24616
First edition 2012-09-01
Language resources management — Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations multilingues
Reference number ISO 24616:2012(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2012

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Case postale 56, CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)

The MLIF specification complies with the modelling principles of UML as defined by the Object Management

Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.

4.2 Metamodel and adornment

In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that

is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization

Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML

serialization”, in line with Extensible Markup Language (XML) as defined in ISO 8879.

5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel

The MLIF metamodel is defined by the following seven "core components". These components are listed as follows, according to their XML serialization:

- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;

- <GI> (Global Information), which represents technical and administrative information applying to the entire multilingual data collection;

- <GroupC> (Grouping Component), which represents a sub-collection of multilingual data that have a common origin or purpose within a given project;

- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;

- <MonoC> (Monolingual Component), which groups together information related to one language and is part of a multilingual component (MultiC);

- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e. versioning);

- <SegC> (Segmentation Component), which allows any level of segmentation for textual information, possibly in a recursive manner.
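
As an informal illustration, the seven core components can be sketched as nested Python data classes. This is not part of the metamodel specification: the class names reuse the component abbreviations (MLDC is assumed here for the Multilingual Data Collection), the field names are hypothetical, and placing GroupC between the collection and its multilingual components is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoC:
    """History Component: one recorded modification (hypothetical field)."""
    note: str


@dataclass
class SegC:
    """Segmentation Component: a segment that may contain sub-segments."""
    text: str
    children: List["SegC"] = field(default_factory=list)


@dataclass
class MonoC:
    """Monolingual Component: information related to one language."""
    lang: str
    segments: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: all variants of a given textual content."""
    variants: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

    def languages(self) -> List[str]:
        # One entry per monolingual variant, in document order.
        return [variant.lang for variant in self.variants]


@dataclass
class GroupC:
    """Grouping Component: multilingual data with a common origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global Information: technical and administrative information."""
    info: Dict[str, str] = field(default_factory=dict)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MLDC:
    """Multilingual Data Collection: global information plus grouped units."""
    gi: GI
    groups: List[GroupC] = field(default_factory=list)


# Illustrative instance: one multilingual unit with two variants.
collection = MLDC(
    gi=GI(info={"source": "illustrative"}),
    groups=[GroupC(units=[MultiC(variants=[
        MonoC(lang="en", segments=[SegC("The meal is nice.")]),
        MonoC(lang="fr", segments=[SegC("Le repas est bon.")]),
    ])])],
)
```

The recursive `children` field on `SegC` mirrors the statement that segmentation may be applied at any level, and the `history` lists appear only on the components to which a history component may be anchored.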
6 MLIF compliance

Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:

- by fully implementing the MLIF metamodel starting at the level of <MLDC>;

- by specifically embedding MLIF-compliant information within another model, by implementing one of the

lower level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction

The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:

- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working language of any relevant element and, in particular, shall be used systematically for any implementation of MonoC;

- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier to an element of the MLIF metamodel.
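
The two requirements above lend themselves to a simple automated check. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name, the sample fragment and the use of `MonoC` as the serialized element name are assumptions made for illustration, not definitions taken from this International Standard.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is bound to this namespace by the XML specification.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = "{%s}lang" % XML_NS
XML_ID = "{%s}id" % XML_NS


def check_generic_attributes(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    seen_ids = set()
    for elem in root.iter():
        # xml:lang is expected on every MonoC (assumed element name).
        if elem.tag == "MonoC" and not elem.get(XML_LANG):
            problems.append("MonoC without xml:lang")
        # xml:id values must be unique within the document.
        ident = elem.get(XML_ID)
        if ident is not None:
            if ident in seen_ids:
                problems.append("duplicate xml:id: " + ident)
            seen_ids.add(ident)
    return problems


# Hypothetical fragment with two deliberate errors in the second MonoC.
SAMPLE = """<MultiC xml:id="m1">
  <MonoC xml:lang="en" xml:id="m1.en"><SegC>The meal is nice.</SegC></MonoC>
  <MonoC xml:id="m1.en"><SegC>Le repas est bon.</SegC></MonoC>
</MultiC>"""

problems = check_generic_attributes(SAMPLE)
```

For the sample fragment, `problems` reports the missing language attribute and the repeated identifier on the second monolingual component.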
7.3 Recommended adornment for GI















7.4 Recommended adornment for GroupC

7.5 Recommended adornment for MultiC









7.6 Recommended and mandatory adornment for MonoC
- att.lang


- att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC







- att.linguistic
- att.xlink
7.8 Recommended adornment for HistoC

The HistoC component is a generic component that traces modifications made on the component to which it is anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or enhancements to, the component to be recorded.
HistoC may be adorned by four elements:




7.9 Recommended online annotation adornment

Multilingual text documents are often only one stage in a complex workflow that involves external document

sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the

presentational features that have to be retained in a translated target document. To this end, MLIF-compliant

applications should use the following elements, in relation to the element, that map onto similar

subsets in TMX and XLIFF:





7.10 Recommended adornment for localization

All the following elements should be used to provide localization-related information:



7.11 Recommended adornment for internationalization

7.12 Recommended adornment for temporal synchronization

The following elements should be used when textual content has to be conveyed (in written or spoken form) together with some constraints:



8 Relation with other standards

As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a metamodel that combines with selected data categories as a way of ensuring interoperability between several multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the translation relations between them. In each domain where MLIF is applicable, a specific granularity may be considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF [ISO 24615] and TMF for morphological description, syntactical annotation and terminological description respectively.

MLIF supports the construction and the interoperability of localization and translation memory resources, and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed list of description features. Rather, it provides a list of data categories that is much easier to update and extend. This list represents a point of reference for multilingual information in the context of various application scenarios.

However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and section). In addition, MLIF allows for external and internal links (annotations and references).

MLIF is designed to provide a common framework that facilitates interoperability with formats such as TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)

The main reason for recording lemma, part-of-speech and morphological features is to allow CAT tools based on translation memory to produce translations of new words and sentences that are not in the translation database.

For example, using a translation memory that contains the English sentence "The meal is nice." and its translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench are not able to provide the predicted translation for the sentence "The meals are nice." even though the lemmas of the words in "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:

<header
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
o-tmf="TW4Win 2.0 Format"
adminlang="EN-US"
srclang="EN-GB"
datatype="rtf"
creationdate="20100528T144322Z"
creationid="USER"/>



The meal is nice.


Le repas est bon.




SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this product.

To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following procedure:

Step-1 Represent in MLIF and add linguistic properties to all the words within the translation memory.

Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word categories.

Step-3 Translate the lemmas using an English-to-French bilingual lexicon.

Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the lemma and morphological features.

Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French inflected form as follows:
"The meals are nice." => "Les repas sont bons."
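
The procedure above can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: a lookup table stands in for the part-of-speech tagger of Step-2, the two lexicons contain only the words of the example, number agreement is taken from the noun, and masculine gender is hard-coded in the French forms. It is not a description of how any particular CAT tool works.

```python
# Step-2 stand-in: word form -> (lemma, part of speech, number or None).
EN_ANALYSES = {
    "the": ("the", "definiteArticle", None),
    "meal": ("meal", "commonNoun", "singular"),
    "meals": ("meal", "commonNoun", "plural"),
    "is": ("be", "verb", "singular"),
    "are": ("be", "verb", "plural"),
    "nice": ("nice", "qualifierAdjective", None),
}

# Step-3: toy English-to-French bilingual lexicon of lemmas.
EN_FR_LEMMAS = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step-4: toy French lexicon of inflected forms, keyed by (lemma, number).
FR_FORMS = {
    ("le", "singular"): "le", ("le", "plural"): "les",
    ("repas", "singular"): "repas", ("repas", "plural"): "repas",
    ("être", "singular"): "est", ("être", "plural"): "sont",
    ("bon", "singular"): "bon", ("bon", "plural"): "bons",
}


def translate(sentence):
    """Translate word by word through lemmas and morphological features."""
    words = sentence.rstrip(".").split()
    analyses = [EN_ANALYSES[word.lower()] for word in words]
    # Words without their own number (article, adjective) agree with the noun.
    noun_number = next(n for (_, pos, n) in analyses if pos == "commonNoun")
    french = []
    for lemma, _pos, number in analyses:
        fr_lemma = EN_FR_LEMMAS[lemma]
        french.append(FR_FORMS[(fr_lemma, number or noun_number)])
    # Step-5: reassemble the sentence, restoring capitalization and the period.
    text = " ".join(french)
    return text[0].upper() + text[1:] + "."


result = translate("The meals are nice.")
```

With these toy resources, `result` is "Les repas sont bons.", although only the singular sentence is stored in the translation memory.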

The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word segmentation and tagset defined in MAF:










SEMMAR
20090922T140653Z

The meal is nice.


Le repas est bon.




The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.


class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.




Annex B
(informative)
Example: representing TMX data
B.1 Introduction

TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process. TMX, which has been on the market since 1998, is a certifiable standard format. It was developed, and is maintained, by OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF

TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to MLIF as follows:
 maps onto the element;
maps onto the element;

 is a container for the element and maps onto the element;

 maps onto the element;
 maps onto the element;
 maps onto the element;
 of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
 The "creationtool" attribute maps onto the element;
 The "creationdate" attribute maps onto the element;
 The "tuid" attribute maps onto the element within MultiC.

 The element does not map onto any specific element as it represents a generic placeholder for

application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF

elements or onto a standardized ISO/TC 37 data category as available from ISOCat.

B.3 Example of data

The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and does not translate all the details of the header.

<header
adminlang="en"
creationdate="20040731T164933Z"
creationtool="Heartsome TM Server"
creationtoolversion="1.0.1"
datatype="xml"
o-tmf="unknown"
segtype="block"
srclang="*all*"/>



Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.


El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.


Il suo metodo di controllo di qualità in 10 fasi risale a più

di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.


그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.




The corresponding MLIF default representation is as follows:


TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1



1091303313515
20020930T004233Z

Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.


His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.




B.4 Example of TMX and MLIF interaction

Figure B.1 illustrates the interaction between TMX and MLIF. This process involves successive steps of extraction, translation and merging. The process begins with a TMX document containing linguistic content in English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic information is stored. As most translators (human beings or automatic software modules) work with TMX-oriented software tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX document. This file does not contain any formatting information. Once the translator has added the appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the “Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
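
The extraction and merging steps can be approximated in a few lines of Python. This is a simplified sketch under several assumptions: a plain dictionary stands in for the MLIF linguistic content, the XSL transformations are omitted, the function names `extract` and `merge` are hypothetical, and the sample TMX document is reduced to the bare minimum (a real TMX header carries more attributes).

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract(tmx_root):
    """Split a TMX tree into a skeleton and its linguistic content."""
    skeleton = copy.deepcopy(tmx_root)
    content = {}
    for tu in list(skeleton.iter("tu")):
        variants = {}
        for tuv in tu.findall("tuv"):
            variants[tuv.get(XML_LANG)] = tuv.findtext("seg", default="")
            tu.remove(tuv)  # the skeleton keeps structure, not text
        content[tu.get("tuid")] = variants
    return skeleton, content


def merge(skeleton, content):
    """Rebuild a TMX tree from the skeleton and (updated) content."""
    merged = copy.deepcopy(skeleton)
    for tu in list(merged.iter("tu")):
        for lang, text in content[tu.get("tuid")].items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return merged


# Minimal illustrative document with English and German content.
TMX = """<tmx version="1.4"><header srclang="en"/><body>
<tu tuid="1">
<tuv xml:lang="en"><seg>Hello</seg></tuv>
<tuv xml:lang="de"><seg>Hallo</seg></tuv>
</tu>
</body></tmx>"""

skeleton, content = extract(ET.fromstring(TMX))
content["1"]["ja"] = "こんにちは"  # the translator adds Japanese
merged = merge(skeleton, content)
```

Keeping the skeleton separate means the translation step only ever sees language-keyed text, while header information and unit identifiers are restored unchanged at merge time.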
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction

The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF

XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language

for monolingual inform
...
