SIST ISO 24616:2013
Language resources management -- Multilingual information framework
ISO 24616:2012 provides a generic platform for modeling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modeling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).
Gestion des ressources langagières -- Plateforme d'informations multilingues
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
This International Standard provides a generic platform for modelling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and business modelling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, among others, XLIFF, TMX, smilText and ITS.
General Information
- Status: Published
- Publication Date: 11-Jun-2013
- Technical Committee: IDT - Information, documentation, language and terminology
- Current Stage: 6060 - National Implementation/Publication (Adopted Project)
- Start Date: 31-May-2013
- Due Date: 05-Aug-2013
- Completion Date: 12-Jun-2013
Overview
SIST ISO 24616:2013 - Multilingual information framework (MLIF) defines a generic platform for modelling and managing multilingual information across domains such as localization, translation, multimedia annotation, document management, digital libraries, and information or business modelling. MLIF provides a UML-specified metamodel plus a set of generic data categories (per ISO 12620:2009) and an XML serialization to enable consistent representation, linking and interoperability between formats such as XLIFF, TMX, smilText and ITS.
Key topics and technical requirements
- Metamodel (7 core components): MLDC (Multilingual Data Collection), GI (Global Information), GroupC (Grouping), MultiC (Multilingual Component), MonoC (Monolingual Component), HistoC (History), SegC (Segmentation).
- UML-based specification: The metamodel is defined using UML principles (subset relevant to MLIF).
- XML serialization: MLIF prescribes XML elements and attributes for serializing the metamodel, enabling machine-readable interchange.
- Data categories & adornment: MLIF uses ISO 12620 data categories to “adorn” model components (e.g., translationStatus, creationDate, matchQuality).
- Mandatory W3C attributes: xml:lang (mandatory on MonoC to indicate working language) and xml:id for unique identifiers.
- Segmentation & inline markup: SegC supports recursive segmentation and inline annotations (beginPairedTag, placeholder, genericPlaceholder) to preserve presentational features.
- Versioning & provenance: HistoC captures author, version, transaction and date for change tracking.
- Compliance modes: Implement MLIF fully, starting from the top-level MLDC component, or embed lower-level MLIF-compliant elements within other models.
Practical applications and who uses it
- Localization and translation tool vendors - to build interoperable translation memories and CAT tools that exchange TMX/XLIFF content reliably.
- Content managers and digital libraries - to manage multilingual collections with consistent metadata, provenance and segmentation.
- Multimedia and captioning teams - to synchronize text with audio/video (temporal synchronization: duration, begin, next).
- NLP and corpus engineers - to annotate corpora (morphology, POS, lemmas) and enhance translation prediction (see Annex A CAT example).
- Standards architects and integrators - to map or link domain models and ensure interoperability between language-resource formats.
Related standards
- ISO 12620:2009 - data category registry for language resources (used to adorn MLIF elements)
- ISO 24611 (MAF), ISO 24615 (SynAF), ISO 16642 (TMF) - complementary frameworks for morphological, syntactic and terminological description referenced for finer-grained annotation
- Interoperability targets: XLIFF, TMX, smilText, ITS
MLIF is an extensible, standards-based framework focused on interoperability and practical reuse of multilingual resources across localization, translation memory, multimedia annotation and digital content management workflows.
Frequently Asked Questions
SIST ISO 24616:2013 is a standard published by the Slovenian Institute for Standardization (SIST). Its full title is "Language resources management -- Multilingual information framework". This standard covers: ISO 24616:2012 provides a generic platform for modeling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modeling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).
SIST ISO 24616:2013 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination); 01.140.20 - Information sciences; 35.240.30 - IT applications in information, documentation and publishing. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
SLOVENIAN STANDARD
01-July-2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01
Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations
multilingues
Reference number
© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2012 – All rights reserved
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(E)
Language resources management — Multilingual information
framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009; Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems —Generalized Markup Language (SGML)
Extensible Markup Language. Fifth Edition, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with Extensible Markup Language (XML) as defined in ISO 8879.
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components", listed here according to their XML
serialization:
- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
- <GI> (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
- <GroupC> (Grouping components), which represents a sub-collection of multilingual data that have a
common origin or purpose within a given project;
- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;
- <MonoC> (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e.
versioning);
- <SegC> (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
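As a rough illustration of how the core components might nest in an XML serialization, the following Python sketch builds a tiny collection with the standard library. The containment order used here (MLDC containing GI and GroupC, then MultiC, MonoC and SegC) is an assumption, since Figure 1 is not reproduced in this text; the function name `build_mlif` and the `mc1`-style identifiers are hypothetical, and HistoC is left out for brevity.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def build_mlif(units):
    """Build an MLIF-like tree; units is a list of {language tag: text} dicts."""
    mldc = ET.Element("MLDC")           # whole multilingual data collection
    ET.SubElement(mldc, "GI")           # global information (left empty here)
    group = ET.SubElement(mldc, "GroupC")
    for i, variants in enumerate(units, start=1):
        # One MultiC per textual content, carrying a unique xml:id.
        multi = ET.SubElement(group, "MultiC", {f"{{{XML_NS}}}id": f"mc{i}"})
        for lang, text in variants.items():
            # One MonoC per language, with the working language in xml:lang.
            mono = ET.SubElement(multi, "MonoC", {f"{{{XML_NS}}}lang": lang})
            seg = ET.SubElement(mono, "SegC")
            seg.text = text
    return mldc

doc = build_mlif([{"en": "The meal is nice.", "fr": "Le repas est bon."}])
print(ET.tostring(doc, encoding="unicode"))
```

The sketch keeps one SegC per MonoC; recursive segmentation would simply nest further SegC elements inside it.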
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in one of two ways:
- by fully implementing the MLIF metamodel starting at the level of <MLDC>;
- by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower-level MLIF elements.
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
- the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
- the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
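The two attribute rules lend themselves to a simple automated check. The sketch below is one possible way to do it in Python, assuming the serialization uses an element literally named MonoC; the function name `check_w3c_attributes` and the message strings are hypothetical and not part of the standard.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = f"{{{XML_NS}}}lang"
ID = f"{{{XML_NS}}}id"

def check_w3c_attributes(root):
    """Return a list of problems: MonoC without xml:lang, repeated xml:id values."""
    problems = []
    seen = set()
    for elem in root.iter():
        if elem.tag == "MonoC" and not elem.get(LANG):
            problems.append("MonoC without xml:lang")
        ident = elem.get(ID)
        if ident is not None:
            if ident in seen:
                problems.append(f"duplicate xml:id: {ident}")
            seen.add(ident)
    return problems

sample = ET.fromstring(
    '<MultiC xml:id="m1">'
    '<MonoC xml:lang="en" xml:id="a"><SegC>Title</SegC></MonoC>'
    '<MonoC xml:id="a"><SegC>Titre</SegC></MonoC>'
    '</MultiC>'
)
print(check_w3c_attributes(sample))
```

The second MonoC in the sample triggers both checks: it has no xml:lang and it reuses the identifier "a".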
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
att.lang
att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC
att.linguistic
att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made on the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or
enhancements to, the component to be recorded.
HistoC may be adorned by four elements:
7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the
presentational features that have to be retained in a translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the <SegC> element, that map onto similar
subsets in TMX and XLIFF:
7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:
8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These two last processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactical annotation and terminological description
respectively.
MLIF supports the construction and the interoperability of localization and translation memories resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates the interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated
and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
database.
For example, using a translation memory that contains the English sentence "The meal is nice." and its
translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench 1)
are not able to provide the predicted translation for the sentence "The meals are nice." even though the word
lemmas of "The meal is nice." and "The meals are nice." are matching. This weakness is due to the fact that
these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:
<header
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
o-tmf="TW4Win 2.0 Format"
adminlang="EN-US"
srclang="EN-GB"
datatype="rtf"
creationdate="20100528T144322Z"
creationid="USER"/>
The meal is nice.
Le repas est bon.
To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following
procedure:
Step-1 Represent in MLIF and add linguistic properties to all the words within the translation memory.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the
lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French
inflected form as follows:
"The meals are nice." => "Les repas sont bons."
1) SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word
segmentation and tagset defined in MAF:
SEMMAR
20090922T140653Z
The meal is nice.
Le repas est bon.
The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.
class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.
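The five-step procedure described earlier in this annex can be sketched in a few lines of Python. This is only a toy under strong assumptions: the tagged input stands in for the output of a real part-of-speech tagger, both lexicons are hypothetical and cover just the four words of the example, only masculine French forms are listed, and the number feature of the adjective is copied from the subject noun by hand.

```python
# Step 2 (assumed tagger output): surface form, lemma, part of speech, number.
TAGGED_INPUT = [
    ("The", "the", "definiteArticle", "plural"),
    ("meals", "meal", "commonNoun", "plural"),
    ("are", "be", "verb", "plural"),
    ("nice", "nice", "qualifierAdjective", "plural"),  # number copied from the noun
]

# Step 3: hypothetical English-to-French bilingual lexicon of lemmas.
BILINGUAL = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step 4: hypothetical French lexicon of inflected forms (masculine only).
FRENCH_FORMS = {
    ("le", "singular"): "le", ("le", "plural"): "les",
    ("repas", "singular"): "repas", ("repas", "plural"): "repas",
    ("être", "singular"): "est", ("être", "plural"): "sont",
    ("bon", "singular"): "bon", ("bon", "plural"): "bons",
}

def translate(tagged):
    """Step 5: substitute each word with the inflected form of its translated lemma."""
    words = []
    for _surface, lemma, _pos, number in tagged:
        target_lemma = BILINGUAL[lemma]
        words.append(FRENCH_FORMS[(target_lemma, number)])
    sentence = " ".join(words) + "."
    return sentence[0].upper() + sentence[1:]

print(translate(TAGGED_INPUT))  # Les repas sont bons.
```

Word-by-word substitution works here because English and French share the same word order for this sentence; a real tool would need reordering and agreement rules.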
Annex B
(informative)
Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been on the market since
1998, is a certifiable standard format. It was developed, and is maintained by, OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to
MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
maps onto the element;
maps onto the element;
maps onto the element;
of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
The "creationtool" attribute maps onto the element;
The "creationdate" attribute maps onto the element;
The "tuid" attribute maps onto the element within MultiC.
The element does not map onto any specific element as it represents a generic placeholder for
application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
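A conversion along these lines could look like the Python sketch below. The element names on the MLIF side are not legible in the mapping list above, so the pairing used here (tmx to MLDC, header to GI, body to GroupC, tu to MultiC, tuv to MonoC, seg to SegC) is an assumption made for illustration, as is the function name `tmx_to_mlif`. Header attributes and inline markup are not carried over.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_mlif(tmx_root):
    """Convert the translation units of a TMX tree into an MLIF-like tree."""
    mldc = ET.Element("MLDC")            # assumed counterpart of <tmx>
    ET.SubElement(mldc, "GI")            # assumed counterpart of <header>
    group = ET.SubElement(mldc, "GroupC")  # assumed counterpart of <body>
    for tu in tmx_root.iter("tu"):
        multi = ET.SubElement(group, "MultiC")
        for tuv in tu.findall("tuv"):
            mono = ET.SubElement(multi, "MonoC", {XML_LANG: tuv.get(XML_LANG, "")})
            seg = ET.SubElement(mono, "SegC")
            # Flatten any inline markup inside <seg> to plain text.
            seg.text = "".join(tuv.find("seg").itertext())
    return mldc

tmx = ET.fromstring(
    '<tmx version="1.4"><header srclang="en"/><body><tu>'
    '<tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>'
    '<tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv>'
    '</tu></body></tmx>'
)
mlif = tmx_to_mlif(tmx)
print(ET.tostring(mlif, encoding="unicode"))
```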
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.
<header
adminlang="en"
creationdate="20040731T164933Z"
creationtool="Heartsome TM Server"
creationtoolversion="1.0.1"
datatype="xml"
o-tmf="unknown"
segtype="block"
srclang="*all*"/>
Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.
El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.
Il suo metodo di controllo di qualità in 10 fasi risale a più
di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.
그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
The corresponding representation in MLIF default representation is as follows:
TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1
1091303313515
20020930T004233Z
Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.
B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. This process involves subsequent steps of
extraction, translation and merging. The process begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM
formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with TMX
software-oriented tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX
document. This file does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX formatted document (5).
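The extraction and merging steps can be pictured with a small Python sketch. It is a simplification under stated assumptions: the skeleton is the original document with each segment body replaced by a placeholder, the `%%seg1%%` placeholder syntax and the function names `extract` and `merge` are hypothetical, and the linguistic content is held in a plain dictionary rather than in an MLIF document.

```python
import re

SEG_PATTERN = re.compile(r"<seg>(.*?)</seg>", re.S)

def extract(document):
    """Split a document into a skeleton with placeholders and its text content."""
    content = {}

    def stash(match):
        key = f"seg{len(content) + 1}"
        content[key] = match.group(1)          # linguistic content only
        return f"<seg>%%{key}%%</seg>"         # placeholder stays in the skeleton

    skeleton = SEG_PATTERN.sub(stash, document)
    return skeleton, content

def merge(skeleton, content):
    """Put (possibly updated) text content back into the skeleton."""
    return re.sub(r"%%(\w+)%%", lambda m: content[m.group(1)], skeleton)

document = (
    '<tu><tuv xml:lang="en"><seg>Title</seg></tuv>'
    '<tuv xml:lang="de"><seg>Titel</seg></tuv></tu>'
)
skeleton, content = extract(document)
print(skeleton)
print(content)
print(merge(skeleton, content) == document)
```

Because the skeleton keeps everything except the segment text, merging unchanged content reproduces the original document exactly.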
Figure B.1 — TMX and MLIF interaction
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of
localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF
XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language
for monolingual information. This is handled through the appropriate use of the data
category in together with the language declarations ( and ) in
.
The core elements of the XLIFF macro-structure map to MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
the element maps onto the element;
maps onto the element;
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to alternate.
Further XLIFF elements and attributes map onto MLIF elements as follows:
The XLIFF tool attribute maps onto the element.
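To make the source/target distinction concrete, the following Python sketch reads XLIFF 1.2 translation units and tags each text with a role value and the language declared on the enclosing file element. The role strings `sourceLanguage` and `targetLanguage` follow the values visible in the MLIF example of this annex; the record layout and the function name `trans_units_to_records` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

def trans_units_to_records(xliff_root):
    """Return one list of (role, language, text) tuples per trans-unit."""
    records = []
    for file_el in xliff_root.findall("x:file", NS):
        langs = {
            "source": file_el.get("source-language"),
            "target": file_el.get("target-language"),
        }
        for unit in file_el.iter(f"{{{XLIFF_NS}}}trans-unit"):
            variants = []
            for role in ("source", "target"):
                el = unit.find(f"x:{role}", NS)
                if el is not None:  # a target may be absent before translation
                    variants.append((f"{role}Language", langs[role], el.text))
            records.append(variants)
    return records

xliff = ET.fromstring(
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
    '<file source-language="en" target-language="fr" datatype="winres" '
    'original="Sample1.rc"><body>'
    '<trans-unit id="1"><source>Title</source><target>Titre</target></trans-unit>'
    '</body></file></xliff>'
)
print(trans_units_to_records(xliff))
```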
C.3 Example of data
The following example, based on XLIFF version 1.2, focuses on the bilingual part of an XLIFF document:
<xliff
  xmlns="urn:oasis:names:tc:xliff:document:1.2"
  version="1.2"
  xml:lang="en"
  xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-schema-1.2.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file
    source-language="en"
    target-language="fr"
    datatype="winres"
    original="Sample1.rc">
    <body>
      <group
        restype="dialog"
        resname="IDD_DIALOG1"
        coord="0;0;186;57"
        font="MS Sans Serif;8">
        <trans-unit id="1" restype="caption">
          <source xml:lang="en">Title</source>
          <target xml:lang="fr">Titre</target>
        </trans-unit>
        <trans-unit
          id="2"
          restype="label"
          resname="IDC_STATIC"
          coord="8;4;19;8">
          <source xml:lang="en">Path</source>
          <target xml:lang="fr">Chemin</target>
        </trans-unit>
        <trans-unit
          id="3"
          restype="check"
          resname="IDC_CHECK1"
          coord="8;40;41;10">
          <source xml:lang="en">Validate</source>
          <target xml:lang="fr">Valider</target>
        </trans-unit>
        <trans-unit
          id="4"
          restype="button"
          resname="IDOK"
          coord="129;7;50;14">
          <source xml:lang="en">OK</source>
          <target xml:lang="fr">OK</target>
        </trans-unit>
        <trans-unit
          id="5"
          restype="button"
          resname="IDCANCEL"
          coord="129;24;50;14">
          <source xml:lang="en">Cancel</source>
          <target>Annuler</target>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
The corresponding representation in MLIF default representation is as follows:
XLIFF
1.2
en
fr
file
body
sourceLanguage
Title
targetLanguage
Titre
sourceLanguage
Path
targetLanguage
Chemin
sourceLanguage
Validate
targetLanguage
Valider
sourceLanguage
OK
targetLanguage
OK
sourceLanguage
Cancel
targetLanguage
Annuler
Annex D
(informative)
Example: representing smilText data
D.1 Introduction
Within the SMIL 3.0 W3C recommendation (http://www.w3.org/TR/2008/REC-SMIL3-20081201), the smilText
modules provide a text container element with an explicit content model for defining timed text
(http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html). smilText has the potential to be an
important application context for MLIF as it associates and synchronizes multimedia and textual content.
D.2 Using generic SMIL attributes in MLIF
General timing mechanisms from the SMIL (Synchronized Multimedia Integration Language) recommendation
may be used in MLIF-compliant content to provide synchronization mechanisms for textual content. The
following SMIL elements are thus integrated in the overall MLIF specification: "begin", "next" and "dur".
D.3 Simplified mapping of monolingual content
The basic use case for articulating MLIF and SMIL involves producing a monolingual SMIL output from a
multilingual representation expressed in an MLIF-compliant format. This results from the selection of the
content corresponding to a selected language and its integration into one or several containers, for
instance embedded in a construct. When applicable, the existing timing information is propagated into
the SMIL representation.
In this context, the core mappings between MLIF and the smilText specification are as follows:
elements map onto elements, together with the corresponding attributes (in
particular, language);
elements map univocally onto elements, together with the corresponding descriptors (in
particular, temporal ones).
The actual embedding of multilingual content within a single SMIL representation is based on the
constructs within the following skeleton:
xml:id="TE30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="en"
its:translate="yes"> This is a sentence.
xml:id="TF30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="fr"
its:translate="yes">Ceci est une
phrase.
Other non-temporal attributes such as region are not covered by the MLIF specification, and should therefore
be created separately from the MLIF-compliant structure.
This mapping can be used conversely to generate MLIF-compliant content from a SMIL representation. The
associated use case is typically the preparation of an MLIF-compliant structure that will later contain further
translation(s).
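As a minimal sketch of the monolingual selection described in D.3, the following Python fragment picks one language out of a multilingual structure and wraps each segment in a timed text container. The element names MultiC, MonoC and SegC follow the component names used in this International Standard; the sample serialization, the dur child element, the choice of smilText as the output container and the function name to_smiltext are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # expanded name of xml:lang

# Hypothetical MLIF-like input; the exact serialization is an assumption.
MLIF_SAMPLE = """
<MultiC>
  <MonoC xml:lang="en"><dur>12s</dur><SegC>This is a sentence.</SegC></MonoC>
  <MonoC xml:lang="fr"><dur>12s</dur><SegC>Ceci est une phrase.</SegC></MonoC>
</MultiC>
"""

def to_smiltext(mlif_xml: str, lang: str) -> list:
    """Return one smilText element per segment of the selected language."""
    result = []
    for mono in ET.fromstring(mlif_xml).iter("MonoC"):
        if mono.get(LANG) != lang:
            continue  # keep only the requested language
        dur = mono.findtext("dur")
        for seg in mono.iter("SegC"):
            smil = ET.Element("smilText", {LANG: lang})
            if dur:
                smil.set("dur", dur)  # propagate existing timing information
            smil.text = (seg.text or "").strip()
            result.append(smil)
    return result

for element in to_smiltext(MLIF_SAMPLE, "fr"):
    print(ET.tostring(element, encoding="unicode"))
```

Attributes such as region are deliberately left out, since they are not covered by the MLIF specification and would have to be added separately.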
D.4 Mapping smilText to MLIF
The core elements of smilText map to MLIF elements as follows 2):
the <smilText> element functions as a logical and temporal structuring element that allows the inclusion
of inline text content in a SMIL presentation. smilText can also be used as an external, stand-alone timed
text format. This is achieved by using the SMIL 3.0 smilText profile;
the <tev> element defines a "temporal moment" within a block of smilText content; depending on the
values of the begin or next attributes, it determines a scheduling time during which the associated text
content (up to the following <tev> or <clear> element or the end of the smilText element) is rendered;
mapping to , the <clear> element defines a "temporal moment" within a block of smilText content
at which the full contents of the rendering area are cleared.
The following SMIL attributes also map as follows:
the "dur" attribute maps onto the element;
the "begin" attribute maps onto the element;
the "next" attribute maps onto the element.
2) These definitions are taken from the W3C SMIL recommendation:
http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html.
Annex E
(informative)
Example of MLIF usage for subtitles (captioning)
E.1 Introduction
Subtitles are textual versions of the dialog in films, television programs, video games, etc., and are usually
displayed at the bottom of the screen. They can be either a written rendering of the dialog in the same
language or a written rendering of the dialog in a language other than that of the dialog itself. Additional
information may be included in order to help viewers who are deaf or hard of hearing to follow the dialog
[SUB].
Professional subtitlers usually work with specialized computer software and hardware where the video is
stored digitally, making each individual frame instantly accessible. In addition to creating the subtitles, the
subtitler usually specifies the exact positions where each subtitle shall appear and disappear. For cinema film,
this task is traditionally done by separate technicians. The end result is a subtitle file containing the actual
subtitles as well as position markers indicating when each subtitle appears and disappears. These markers
are usually based on timecode if the work is for the electronic media (e.g. television, videos and DVDs) and on
film length (measured in feet and frames) if the subtitles are to be used for traditional cinema film.
E.2 Using MLIF to represent subtitling information
There are several formats that may be used for subtitles. Some of them are de jure standards (e.g.
[MPEG-4 TT]), while some others, although not being de jure standards, are currently being used by a large
number of people all over the world (e.g. SRT Format – SubRip). SRT is probably the most popular external
subtitle file format.
All subtitle formats have to provide a way of synchronizing video frames with subtitles. Synchronization
means associating temporal markers with textual information.
The following example is a small fragment of an SRT file:
Fragment-1:
00:00:20,000 --> 00:00:24,400
Subtitle number one…
00:00:24,600 --> 00:00:27,800
Subtitle number two…
This annex demonstrates how MLIF may be used for subtitles. Fragment-2 and Fragment-3 have been built in
compliance with the latest SMIL specification, in particular smilText.
The use of MLIF for dealing with multilingual subtitles is straightforward. It is easy to parse any of the
proposed MLIF documents in order to obtain SRT files.
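The conversion towards SRT can be sketched as follows. This is a minimal illustration, assuming that the timed segments have already been read from an MLIF document into (begin, end, text) tuples with times in milliseconds; the function names are hypothetical. SRT cue blocks normally start with a running counter line, which the sketch adds.

```python
def format_timestamp(ms: int) -> str:
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def to_srt(cues) -> str:
    """Serialize (begin_ms, end_ms, text) tuples as numbered SRT cue blocks."""
    blocks = []
    for index, (begin, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(begin)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues

cues = [
    (20_000, 24_400, "Subtitle number one…"),
    (24_600, 27_800, "Subtitle number two…"),
]
print(to_srt(cues))
```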
However, depending on the underlying scenario (or workflow), the subtitling information may be represented
in two different ways.
The first MLIF way (Fragment-2) defines a single element, and inside this element, two
elements are embedded as follows:
Fragment-2:
The second MLIF way (Fragment-3) defines two elements, each containing a single
element, with the corresponding outline:
Fragment-3:
The first approach may be more convenient for a pair-to-pair translation process, while the second may
be more convenient for filtering and selecting one language (for example, a monolingual block can easily be
isolated).
Other implementations may occur depending on how one wants to express temporal information associated with
the presentation of subtitles. For instance, the following examples use the SMIL attributes in two different
ways, with either an <end> (Fragment-4) or a <dur> (Fragment-5).
Fragment-4
00:12:28,928
00:12:32,515
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
00:12:05,270
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
Fragment-5
00:12:28,928
3.607
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
3.47
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
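The relationship between the two variants is simple arithmetic: the duration is the end time minus the begin time. The following sketch, with hypothetical function names, converts SRT-style timecodes into seconds and derives the duration; Decimal is used so that the result is not affected by binary floating-point rounding.

```python
from decimal import Decimal

def timecode_to_seconds(timecode: str) -> Decimal:
    """Convert 'HH:MM:SS,mmm' into seconds."""
    clock, millis = timecode.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return Decimal(hours * 3600 + minutes * 60 + seconds) + Decimal(millis) / 1000

def duration(begin: str, end: str) -> Decimal:
    """Return the duration in seconds between two timecodes."""
    return timecode_to_seconds(end) - timecode_to_seconds(begin)

# French entry: begin 00:12:01,800 and end 00:12:05,270
print(duration("00:12:01,800", "00:12:05,270"))  # 3.47, as in the French entry of Fragment-5
```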
E.3 Full example
E.3.1 Introduction
The following example associates an SRT representation with a compliant MLIF-based format.
E.3.2 SRT source files
E.3.2.1 English subtitles
The English subtitles are as follows:
00:00:32,560 --> 00:00:35,119
The world is changed.
00:00:35,640 --> 00:00:38,200
I see it in the water.
E.3.2.2 French subtitles
The French subtitles are as follows:
00:00:32,560 --> 00:00:35,119
Le monde a changé.
00:00:35,640 --> 00:00:38,200
J
...
SLOVENIAN STANDARD
01-July-2013
Upravljanje z jezikovnimi viri - Ogrodje za večjezične informacije
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
This Slovenian standard is identical to: ISO 24616:2012
ICS:
01.020 Terminology (principles and coordination)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01
Language resources management —
Multilingual information framework
Gestion des ressources langagières — Plateforme d'informations
multilingues
© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(E)
Language resources management — Multilingual information
framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual versions of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with Extensible Markup Language (XML) as defined in ISO 8879.
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components". These components are listed as
follows, according to their XML serialization:
<MLDC> (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
<GI> (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
<GroupC> (Grouping components), which represents a sub-collection of multilingual data that have a
common origin or purpose within a given project;
<MultiC> (Multilingual Component), which groups together all variants of a given textual content;
<MonoC> (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
<HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e.
versioning);
<SegC> (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
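The containment relations between the seven core components can be sketched as plain data classes. This is an illustration only: the class names follow the component abbreviations used in Clause 7 (with MLDC standing for the Multilingual Data Collection), while all field names and the exact containment choices are assumptions and are not taken from the normative specification in Annex G.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HistoC:                      # History Component: one recorded modification
    note: str

@dataclass
class SegC:                        # Segmentation Component, possibly recursive
    text: str = ""
    children: List["SegC"] = field(default_factory=list)

@dataclass
class MonoC:                       # Monolingual Component: content in one language
    lang: str
    segments: List[SegC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

@dataclass
class MultiC:                      # Multilingual Component: all variants of a content
    variants: List[MonoC] = field(default_factory=list)

@dataclass
class GroupC:                      # Grouping Component: a sub-collection of units
    units: List[MultiC] = field(default_factory=list)

@dataclass
class GI:                          # Global Information for the whole collection
    properties: dict = field(default_factory=dict)

@dataclass
class MLDC:                        # Multilingual Data Collection: the root
    info: GI = field(default_factory=GI)
    groups: List[GroupC] = field(default_factory=list)

unit = MultiC([
    MonoC("en", [SegC("The meal is nice.")]),
    MonoC("fr", [SegC("Le repas est bon.")]),
])
collection = MLDC(groups=[GroupC([unit])])
print([mono.lang for mono in collection.groups[0].units[0].variants])
```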
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
by fully implementing the MLIF metamodel starting at the level of <MLDC>;
by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower level MLIF elements, namely , or .
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
att.lang
att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
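The two requirements on W3C generic attributes (xml:lang on every MonoC, unique xml:id values) lend themselves to a simple automated check. The following sketch assumes an XML serialization in which the monolingual component is an element named MonoC; the function name and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # namespace of xml:lang and xml:id

def check_adornment(mlif_xml: str) -> list:
    """Return a list of problems found in an MLIF-like document."""
    problems, seen_ids = [], set()
    for element in ET.fromstring(mlif_xml).iter():
        # The language attribute is mandatory on MonoC.
        if element.tag == "MonoC" and not element.get(XML_NS + "lang"):
            problems.append("MonoC without xml:lang")
        # xml:id values have to be unique within the document.
        xml_id = element.get(XML_NS + "id")
        if xml_id is not None:
            if xml_id in seen_ids:
                problems.append(f"duplicate xml:id {xml_id!r}")
            seen_ids.add(xml_id)
    return problems

sample = '<MultiC><MonoC xml:id="a"/><MonoC xml:lang="fr" xml:id="a"/></MultiC>'
print(check_adornment(sample))
```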
7.7 Recommended adornment for SegC
att.linguistic
att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made on the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or
enhancements to, the component to be recorded.
HistoC may be adorned by four elements:
7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the
presentational features that have to be retained in a translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the element, that map onto similar
subsets in TMX and XLIFF:
7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:
8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactical annotation and terminological description
respectively.
MLIF supports the construction and the interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates the interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated
and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for recording lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
database.
For example, using a translation memory that contains the English sentence "The meal is nice." and its
translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench 1)
are not able to provide the predicted translation for the sentence "The meals are nice." even though the
lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that
these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:
<header
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
o-tmf="TW4Win 2.0 Format"
adminlang="EN-US"
srclang="EN-GB"
datatype="rtf"
creationdate="20100528T144322Z"
creationid="USER"/>
The meal is nice.
Le repas est bon.
To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following
procedure:
Step-1 Represent the translation memory in MLIF and add linguistic properties to all of its words.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the
lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French
inflected form as follows:
"The meals are nice." => "Les repas sont bons."
1) SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word
segmentation and tagset defined in MAF:
SEMMAR
20090922T140653Z
The meal is nice.
Le repas est bon.
The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.
class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.
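The procedure described in this annex (steps 2 to 5) can be sketched with toy resources. Everything in the following fragment is an assumption made for illustration: the three small dictionaries stand in for a part-of-speech tagger, a bilingual lemma lexicon and a French lexicon of inflected forms, and a single number feature is applied to the whole sentence, which is only adequate for this tiny example.

```python
# Toy resources; every entry is illustrative and not taken from a real lexicon.
EN_ANALYSIS = {            # Step 2: surface form -> (lemma, number)
    "The": ("the", None),
    "meals": ("meal", "plural"),
    "are": ("be", "plural"),
    "nice": ("nice", None),
}
EN_FR_LEMMAS = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}  # Step 3
FR_FORMS = {               # Step 4: (French lemma, number) -> inflected form
    ("le", "plural"): "les",
    ("repas", "plural"): "repas",
    ("être", "plural"): "sont",
    ("bon", "plural"): "bons",
}

def translate(sentence: str) -> str:
    """Translate word by word through lemmas and morphological features."""
    words = sentence.rstrip(".").split()
    analysed = [EN_ANALYSIS[word] for word in words]
    # Simplification: the first number feature found governs agreement.
    number = next(n for _, n in analysed if n)
    french = [FR_FORMS[(EN_FR_LEMMAS[lemma], number)] for lemma, _ in analysed]
    french[0] = french[0].capitalize()       # Step 5: assemble the sentence
    return " ".join(french) + "."

print(translate("The meals are nice."))  # Les repas sont bons.
```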
Annex B
(informative)
Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been on the market since
1998, is a certifiable standard format. It was developed, and is maintained by, OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to
MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
maps onto the element;
maps onto the element;
maps onto the element;
of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
The "creationtool" attribute maps onto the element;
The "creationdate" attribute maps onto the element;
The "tuid" attribute maps onto the element within MultiC.
The element does not map onto any specific element as it represents a generic placeholder for
application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
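Because TMX is nearly isomorphic to the MLIF metamodel, a conversion of the translation units can be sketched in a few lines. The correspondences used below (tu to MultiC, tuv to MonoC, seg to SegC, all units collected in one GroupC) are assumptions made for this illustration, as are the function name and the reduced TMX sample, which is not a complete valid TMX document.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_body_to_mlif(tmx_xml: str) -> ET.Element:
    """Build an MLIF-like GroupC element from the translation units of a TMX file."""
    group = ET.Element("GroupC")
    for tu in ET.fromstring(tmx_xml).iter("tu"):
        multi = ET.SubElement(group, "MultiC")          # assumed: tu -> MultiC
        for tuv in tu.iter("tuv"):
            mono = ET.SubElement(multi, "MonoC", {XML_LANG: tuv.get(XML_LANG, "")})
            seg = ET.SubElement(mono, "SegC")           # assumed: seg -> SegC
            seg.text = tuv.findtext("seg", default="")
    return group

TMX_SAMPLE = """<tmx version="1.4"><header srclang="en"/><body>
<tu><tuv xml:lang="en"><seg>The meal is nice.</seg></tuv>
<tuv xml:lang="fr"><seg>Le repas est bon.</seg></tuv></tu>
</body></tmx>"""

print(ET.tostring(tmx_body_to_mlif(TMX_SAMPLE), encoding="unicode"))
```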
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.
<header
adminlang="en"
creationdate="20040731T164933Z"
creationtool="Heartsome TM Server"
creationtoolversion="1.0.1"
datatype="xml"
o-tmf="unknown"
segtype="block"
srclang="*all*"/>
Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.
El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.
Il suo metodo di controllo di qualità in 10 fasi risale a più
di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.
그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
The corresponding default MLIF representation is as follows:
TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1
1091303313515
20020930T004233Z
Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.
B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. This process involves subsequent steps of
extraction, translation and merging. The process begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM
formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with TMX
software-oriented tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX
document. This file does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF is to define and promote the adoption of a specification for the interchange of
localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF
XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language
for monolingual information. This is handled through the appropriate use of the data
category in together with the language declarations ( and ) in
.
The core elements of the XLIFF macro-structure map to MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
the element maps onto the element;
maps onto the element;
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to alternate.
Further XLIFF elements and attributes map onto MLIF elements as follows:
The XLIFF tool attribute maps onto the element.
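The source/target distinction can be made explicit when reading an XLIFF 1.2 document. The following sketch extracts, for each translation unit, the role, the language and the text of the source and target content; the role values sourceLanguage and targetLanguage are those used in the MLIF representation of this annex, while the function name and the reduced sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def xliff_units(xliff_xml: str):
    """Yield (role, language, text) triples for every trans-unit."""
    for file_el in ET.fromstring(xliff_xml).iterfind("x:file", NS):
        source_lang = file_el.get("source-language")
        target_lang = file_el.get("target-language")
        for unit in file_el.iterfind(".//x:trans-unit", NS):
            yield ("sourceLanguage", source_lang, unit.findtext("x:source", "", NS))
            target = unit.find("x:target", NS)
            if target is not None:           # a unit may not be translated yet
                yield ("targetLanguage", target_lang, target.text or "")

XLIFF_SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file source-language="en" target-language="fr" datatype="winres" original="Sample1.rc"><body>
<trans-unit id="1"><source>Title</source><target>Titre</target></trans-unit>
</body></file></xliff>"""

for triple in xliff_units(XLIFF_SAMPLE):
    print(triple)
```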
C.3 Example of data
The following example, based on XLIFF version 1.2, focuses on the bilingual part of an XLIFF document:
<xliff
xmlns="urn:oasis:names:tc:xliff:document:1.2"
version="1.2"
xml:lang="en"
xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-schema-1.2.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<file
source-language="en"
target-language="fr"
datatype="winres"
original="Sample1.rc">
<body>
<group
restype="dialog"
resname="IDD_DIALOG1"
coord="0;0;186;57"
font="MS Sans Serif;8">
<trans-unit id="1" restype="caption">
<source xml:lang="en">Title</source>
<target xml:lang="fr">Titre</target>
</trans-unit>
<trans-unit
id="2"
restype="label"
resname="IDC_STATIC"
coord="8;4;19;8">
<source xml:lang="en">Path</source>
<target xml:lang="fr">Chemin</target>
</trans-unit>
<trans-unit
id="3"
restype="check"
resname="IDC_CHECK1"
coord="8;40;41;10">
<source xml:lang="en">Validate</source>
<target xml:lang="fr">Valider</target>
</trans-unit>
<trans-unit
id="4"
restype="button"
resname="IDOK"
coord="129;7;50;14">
<source xml:lang="en">OK</source>
<target xml:lang="fr">OK</target>
</trans-unit>
<trans-unit
id="5"
restype="button"
resname="IDCANCEL"
coord="129;24;50;14">
<source xml:lang="en">Cancel</source>
<target xml:lang="fr">Annuler</target>
</trans-unit>
</group>
</body>
</file>
</xliff>
The corresponding default MLIF representation is as follows:
XLIFF
1.2
en
fr
file
body
sourceLanguage
Title
targetLanguage
Titre
sourceLanguage
Path
targetLanguage
Chemin
sourceLanguage
Validate
targetLanguage
Valider
sourceLanguage
OK
targetLanguage
OK
sourceLanguage
Cancel
targetLanguage
Annuler
Other implementations may occur depending on how one wants to elicit temporal information associated with
the presentation of subtitles. For instance, the following examples use the SMIL attributes in two different
ways, with either an (Fragment-4) or a (Fragment-5).
Fragment-4
00:12:28,928
00:12:32,515
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
00:12:05,270
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
Fragment-5
00:12:28,928
3.607
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
3.47
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
E.3 Full example
E.3.1 Introduction
The following example associates an SRT representation with a compliant MLIF-based format.
E.3.2 SRT source files
E.3.2.1 English subtitles
The English subtitles are as follows:
00:00:32,560 --> 00:00:35,119
The world is changed.
00:00:35,640 --> 00:00:38,200
I see it in the water.
E.3.2.2 French subtitles
The French subtitles are as follows:
00:00:32,560 --> 00:00:35,119
Le monde a changé.
00:00:35,640 --> 00:00:38,200
Je le vois dans l'eau.
22 © ISO 2012 – All rights reserved
SIST ISO 24616:201
...
INTERNATIONAL STANDARD ISO 24616
First edition, 2012-09-01
Language resources management -- Multilingual information framework
Gestion des ressources langagières -- Plateforme d'informations multilingues
© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF compliance
7 Metamodel adornment
7.1 Introduction
7.2 General principles concerning the use of W3C generic attributes
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
7.7 Recommended adornment for SegC
7.8 Recommended adornment for HistoC
7.9 Recommended online annotation adornment
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
8 Relation with other standards
Annex A (informative) Example using MLIF for Computer-Assisted Translation (CAT)
Annex B (informative) Example: representing TMX data
Annex C (informative) Example of XLIFF data representation
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of MLIF usage for subtitles (captioning)
Annex F (informative) Using MLIF for MAF data
Annex G (normative) Detailed specification
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(E)
Language resources management — Multilingual information
framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual information in
various domains: localization, translation, multimedia annotation, document management, digital library
support, and information or business modelling applications. MLIF (multilingual information framework)
provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains.
MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to,
XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 12620:2009, Terminology and other language and content resources -- Specification of data categories
and management of a Data Category Registry for language resources
ISO 8879, Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)
Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau,
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
adornment
data category attached to a component of a metamodel
3.2
inline code
inline instructions inserted in a source document
Note to entry: Native code can, for instance, provide presentational instructions (e.g. HTML codes).
3.3
subtitle
textual version of the dialog in films, television programs, video games, etc., usually displayed at the bottom
of the screen
3.4
working language
language in which linguistic sequences are expressed
4 Specification principles
4.1 Key standard used in the specification: Unified Modeling Language (UML)
The MLIF specification complies with the modelling principles of UML as defined by the Object Management
Group (OMG) [UML]. The specification uses the UML subset that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In line with Terminological Markup Framework (TMF) as defined in ISO 16642, MLIF defines a metamodel that
is adorned by data categories, as defined in ISO 12620.
4.3 XML serialization
Associated with the metamodel and its adornment, MLIF proposes a representation in XML called “XML
serialization”, in line with Extensible Markup Language (XML) as defined in the W3C Recommendation cited in
Clause 2, XML itself being a profile of SGML (ISO 8879).
5 Metamodel specification
The MLIF metamodel is specified in the UML object diagram in Figure 1.
Figure 1 — MLIF metamodel
The MLIF metamodel is defined by the following seven "core components", listed here according to their XML
serialization:
- <MLDC> (Multilingual Data Collection), which represents a collection of data containing global information
and several multilingual units;
- <GI> (Global Information), which represents technical and administrative information applying to the
entire multilingual data collection;
- <GroupC> (Grouping Component), which represents a sub-collection of multilingual data that have a
common origin or purpose within a given project;
- <MultiC> (Multilingual Component), which groups together all variants of a given textual content;
- <MonoC> (Monolingual Component), which groups together information related to one language and is
part of a multilingual component (MultiC);
- <HistoC> (History Component), which traces modifications to the component to which it is anchored (i.e.
versioning);
- <SegC> (Segmentation Component), which allows any level of segmentation for textual information,
possibly in a recursive manner.
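The containment relations between the seven components can be sketched as plain data classes. This is an illustrative reading of the descriptions above, not the normative model of Figure 1: all field names (`info`, `units`, `variants`, `segments`, `children`, `action`, `date`) are hypothetical, and attaching SegC to MonoC is an assumption. The anchoring of HistoC to GI, MultiC and MonoC follows 7.8.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HistoC:
    """History Component: one recorded change (field names are hypothetical)."""
    action: str
    date: str


@dataclass
class SegC:
    """Segmentation Component: a segment that may contain sub-segments."""
    text: str = ""
    children: List["SegC"] = field(default_factory=list)


@dataclass
class MonoC:
    """Monolingual Component: everything related to one working language."""
    lang: str  # mandatory, mirrors xml:lang
    segments: List[SegC] = field(default_factory=list)  # assumed attachment point
    history: List[HistoC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: all language variants of one textual content."""
    variants: List[MonoC] = field(default_factory=list)
    history: List[HistoC] = field(default_factory=list)

    def variant(self, lang: str) -> Optional[MonoC]:
        """Return the variant for `lang`, or None if it is absent."""
        for mono in self.variants:
            if mono.lang == lang:
                return mono
        return None


@dataclass
class GI:
    """Global Information: technical and administrative data for the collection."""
    info: Dict[str, str] = field(default_factory=dict)
    history: List[HistoC] = field(default_factory=list)


@dataclass
class GroupC:
    """Grouping Component: multilingual units sharing an origin or purpose."""
    units: List[MultiC] = field(default_factory=list)


@dataclass
class MLDC:
    """Multilingual Data Collection: global information plus grouped units."""
    global_info: GI = field(default_factory=GI)
    groups: List[GroupC] = field(default_factory=list)


unit = MultiC([
    MonoC("en", [SegC("The world is changed.")]),
    MonoC("fr", [SegC("Le monde a changé.")]),
])
collection = MLDC(GI({"source": "demo"}), [GroupC([unit])])
```

Selecting one language is then a lookup on a single MultiC, for example `unit.variant("fr")`.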
6 MLIF compliance
Any format compliant with this International Standard may use the MLIF metamodel in two possible ways:
- by fully implementing the MLIF metamodel, starting at the level of the Multilingual Data Collection (<MLDC>);
- by specifically embedding MLIF-compliant information within another model, by implementing one of the
lower-level MLIF elements.
7 Metamodel adornment
7.1 Introduction
The MLIF XML serialization proposes a set of XML elements and XML attributes, which are described in the
following sections, where the characters “<” and “>” delimit the name of the element. Following the TEI
guidelines (http://www.tei-c.org), some attributes are specified by means of a class attribute, with the
convention that the name of the class attribute is prefixed by “att.” (e.g. “att.xlink”). The other XML attributes
are listed with the convention that two quotes delimit the name of the attribute (e.g. “xml:lang”). The
specifications in Annex G shall be applied.
7.2 General principles concerning the use of W3C generic attributes
The following W3C attributes are to be used by all MLIF-compliant applications:
the attribute xml:lang shall be used in accordance with W3C recommendations to represent the working
language of any relevant element and, in particular, shall be used systematically for any implementation
of MonoC;
the attribute xml:id shall be used in accordance with W3C recommendations to provide a unique identifier
to an element of the MLIF metamodel.
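The two rules above lend themselves to a mechanical check. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes that the serialized element for the monolingual component is named `MonoC`, and the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_LANG = "{%s}lang" % XML_NS
XML_ID = "{%s}id" % XML_NS


def check_generic_attributes(root):
    """Return a list of problems: duplicate xml:id values, MonoC without xml:lang."""
    problems = []
    seen_ids = set()
    for elem in root.iter():
        ident = elem.get(XML_ID)
        if ident is not None:
            if ident in seen_ids:
                problems.append("duplicate xml:id %r" % ident)
            seen_ids.add(ident)
        # "MonoC" as an element name is an assumption about the serialization.
        if elem.tag == "MonoC" and not elem.get(XML_LANG):
            problems.append("MonoC without xml:lang")
    return problems


# Invented sample: the second MonoC reuses an identifier and lacks xml:lang.
SAMPLE = """<MultiC xml:id="m1">
  <MonoC xml:lang="en" xml:id="m1-en"/>
  <MonoC xml:id="m1-en"/>
</MultiC>"""

problems = check_generic_attributes(ET.fromstring(SAMPLE))
```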
7.3 Recommended adornment for GI
7.4 Recommended adornment for GroupC
7.5 Recommended adornment for MultiC
7.6 Recommended and mandatory adornment for MonoC
att.lang
att.xlink
The language attribute is mandatory on MonoC. All other adornments are optional.
7.7 Recommended adornment for SegC
att.linguistic
att.xlink
7.8 Recommended adornment for HistoC
The HistoC component is a generic component that traces modifications made on the component to which it is
anchored (e.g. creation, modification and validation). In the MLIF metamodel, the HistoC component may be
anchored to the GI, MultiC or MonoC component. This makes it possible for all evolutions of, or
enhancements to, the component to be recorded.
HistoC may be adorned by four elements:
7.9 Recommended online annotation adornment
Multilingual text documents are often only one stage in a complex workflow that involves external document
sources in a wide variety of formats. From these, it is often necessary to keep inline markup indicating the
presentational features that have to be retained in a translated target document. To this end, MLIF-compliant
applications should use the following elements, in relation to the element, that map onto similar
subsets in TMX and XLIFF:
7.10 Recommended adornment for localization
All the following elements should be used to provide localization-related information:
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for temporal synchronization
The following elements should be used when textual content has to be conveyed (in written or spoken form)
together with some constraints:
8 Relation with other standards
As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF introduces a
metamodel that combines with selected data categories as a way of ensuring interoperability between several
multilingual applications and corpora. MLIF deals with multilingual corpora, multilingual fragments, and the
translation relations between them. In each domain where MLIF is applicable, a specific granularity may be
considered for segmentation and description. These last two processes may rely on MAF [ISO 24611], SynAF
[ISO 24615] and TMF for morphological description, syntactical annotation and terminological description
respectively.
MLIF supports the construction and the interoperability of localization and translation memory resources,
and also deals with the description of a metamodel for multilingual content. MLIF does not propose a closed
list of description features. Rather, it provides a list of data categories that is much easier to update and
extend. This list represents a point of reference for multilingual information in the context of various application
scenarios.
However, MLIF not only describes elementary linguistic segments (e.g. sentence, syntactical fragment, word
and part of speech), but may also be used to represent document structure (e.g. title, abstract, paragraph and
section). In addition, MLIF allows for external and internal links (annotations and references).
MLIF is designed to provide a common framework that facilitates the interoperability with formats such as
TMX (LISA OSCAR) and XLIFF (OASIS). MLIF can be seen as a parent of these formats, since both of them
deal with multilingual data expressed in the form of segments or text units. Both can be stored, manipulated
and translated in a similar manner.
Examples of using MLIF are given in Annexes A to F.
Annex A
(informative)
Example using MLIF for Computer-Assisted Translation (CAT)
The main reason for recording lemma, part-of-speech and morphological features is to allow CAT tools based on
translation memory to produce translations of new words and sentences that are not in the translation
database.
For example, using a translation memory that contains the English sentence "The meal is nice." and its
translation in French "Le repas est bon.", current CAT tools such as SDL TRADOS Translator's Workbench 1)
are not able to provide the predicted translation for the sentence "The meals are nice." even though the word
lemmas of "The meal is nice." and "The meals are nice." match. This weakness is due to the fact that
these tools use limited linguistic criteria during the translation process.
The data produced by TRADOS Translator's Workbench is as follows:
<header
  creationtool="TRADOS Translator's Workbench for Windows"
  creationtoolversion="Edition 8 Build 863"
  segtype="sentence"
  o-tmf="TW4Win 2.0 Format"
  adminlang="EN-US"
  srclang="EN-GB"
  datatype="rtf"
  creationdate="20100528T144322Z"
  creationid="USER"/>
The meal is nice.
Le repas est bon.
To translate the sentence "The meals are nice.", an MLIF-compliant tool should implement the following
procedure:
Step-1 Represent the translation memory in MLIF and add linguistic properties to all the words within it.
Step-2 Run a part-of-speech tagger on the sentence in order to obtain the right morphosyntactic word
categories.
Step-3 Translate the lemmas using an English-to-French bilingual lexicon.
Step-4 Consult a French lexicon of inflected forms in order to retrieve the correct inflected form using the
lemma and morphological features.
Step-5 Generate the translation of "The meals are nice." by substituting each English word with its French
inflected form as follows:
"The meals are nice." => "Les repas sont bons."
1) SDL TRADOS Translator's Workbench is an example of a suitable product available commercially. This information is
given for the convenience of users of this International Standard and does not constitute an endorsement by ISO of this
product.
The XML data will include a feature structure declaration defining a tagset (e.g. for "nS"), with a word
segmentation and tagset defined in MAF:
SEMMAR
20090922T140653Z
The meal is nice.
Le repas est bon.
The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.
class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.
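The five-step procedure of this annex can be sketched with toy resources. Everything in the tables below is a hypothetical stand-in for a real tagger and real lexicons: the dictionaries cover only the words of the example, French gender is fixed to masculine, and grammatical number is propagated from the first word that carries one (here the noun). The part-of-speech labels reuse those shown in the data above.

```python
# Step-2 stand-in: surface form -> (lemma, part of speech, number or None).
EN_ANALYSES = {
    "the": ("the", "definiteArticle", None),
    "meal": ("meal", "commonNoun", "singular"),
    "meals": ("meal", "commonNoun", "plural"),
    "is": ("be", "verb", "singular"),
    "are": ("be", "verb", "plural"),
    "nice": ("nice", "qualifierAdjective", None),
}

# Step-3 stand-in: English lemma -> French lemma.
EN_FR_LEMMAS = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step-4 stand-in: (French lemma, number) -> inflected masculine form.
FR_FORMS = {
    ("le", "singular"): "le", ("le", "plural"): "les",
    ("repas", "singular"): "repas", ("repas", "plural"): "repas",
    ("être", "singular"): "est", ("être", "plural"): "sont",
    ("bon", "singular"): "bon", ("bon", "plural"): "bons",
}


def translate(sentence):
    """Translate a toy English sentence word by word through lemmas and features."""
    tokens = sentence.rstrip(".").lower().split()
    analyses = [EN_ANALYSES[token] for token in tokens]
    # Agreement: reuse the number of the first word that is marked for number.
    number = next(n for _, _, n in analyses if n is not None)
    words = [FR_FORMS[(EN_FR_LEMMAS[lemma], number)] for lemma, _, _ in analyses]
    text = " ".join(words) + "."
    return text[0].upper() + text[1:]


plural = translate("The meals are nice.")
singular = translate("The meal is nice.")
```

With these tables, `plural` is "Les repas sont bons." and `singular` is "Le repas est bon.", which matches the pair stored in the translation memory.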
Annex B
(informative)
Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
Translation Memory (TM) data created by computer-aided translation (CAT) and localization tools. The
purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation
vendors with little or no loss of critical data during the process. TMX, which has been on the market since
1998, is a certifiable standard format. It was developed, and is maintained by, OSCAR (Open Standards for
Container/Content Allowing Re-use), a LISA Special Interest Group.
B.2 Mapping TMX to MLIF
TMX is nearly isomorphic to the MLIF metamodel. The core elements of the TMX macro-structure map to
MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
maps onto the element;
maps onto the element;
maps onto the element;
of type term maps onto the element of type term.
Further TMX elements and attributes map onto MLIF elements as follows:
The "creationtool" attribute maps onto the element;
The "creationdate" attribute maps onto the element;
The "tuid" attribute maps onto the element within MultiC.
The element does not map onto any specific element as it represents a generic placeholder for
application-dependent data. When applicable, a specific element is explicitly mapped onto MLIF
elements or onto a standardized ISO/TC 37 data category as available from ISOCat.
B.3 Example of data
The following example, based on TMX version 1.4, focuses on the multilingual units of a TMX document and
does not translate all the details of the header.
<header
  adminlang="en"
  creationdate="20040731T164933Z"
  creationtool="Heartsome TM Server"
  creationtoolversion="1.0.1"
  datatype="xml"
  o-tmf="unknown"
  segtype="block"
  srclang="*all*"/>
Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.
El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.
Il suo metodo di controllo di qualità in 10 fasi risale a più
di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.
그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
The corresponding data in the MLIF default representation are as follows:
TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1
1091303313515
20020930T004233Z
Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.
B.4 Example of TMX and MLIF interaction
Figure B.1 illustrates the interaction between TMX and MLIF. This process involves subsequent steps of
extraction, translation and merging. The process begins with a TMX document containing linguistic content in
English (en) and German (de). The extraction process (1) generates a “Skeleton File” (2) containing all TM
formatting information, and an MLIF Document Linguistic Content (3) in which only relevant linguistic
information is stored. As most translators (human beings or automatic software modules) work with TMX
software-oriented tools, an XSL style-sheet makes it possible to transform an MLIF document into a TMX
document. This file does not contain any formatting information. Once the translator has added the
appropriate Japanese (ja) translation, another XSL style-sheet transforms the TMX document into an MLIF
document (4). Finally, the new MLIF document (containing the Japanese translation) is merged with the
“Skeleton File” to produce a new TMX formatted document (5).
Figure B.1 — TMX and MLIF interaction
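The extraction and merging steps of B.4 can be sketched as follows. This is a simplified illustration, not the process of Figure B.1 itself: the linguistic content is held in a plain Python list standing in for the MLIF document, the XSL transformations are omitted, and the sample TMX document, the function names `extract` and `merge`, and the Japanese text are all invented for the example.

```python
import copy
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract(tmx_root):
    """Split a TMX tree into a skeleton (no <tuv>) and the linguistic content."""
    skeleton = copy.deepcopy(tmx_root)
    content = []
    for tu in list(skeleton.iter("tu")):
        unit = {}
        for tuv in list(tu.findall("tuv")):
            unit[tuv.get(XML_LANG)] = tuv.findtext("seg")
            tu.remove(tuv)  # the skeleton keeps only non-linguistic data
        content.append(unit)
    return skeleton, content


def merge(skeleton, content):
    """Rebuild a TMX tree by re-inserting every language variant into the skeleton."""
    merged = copy.deepcopy(skeleton)
    for tu, unit in zip(merged.iter("tu"), content):
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return merged


# Invented sample document with English and German content.
TMX = (
    '<tmx version="1.4"><header srclang="en"/><body>'
    '<tu tuid="1">'
    '<tuv xml:lang="en"><seg>Hello</seg></tuv>'
    '<tuv xml:lang="de"><seg>Hallo</seg></tuv>'
    '</tu></body></tmx>'
)

skeleton, content = extract(ET.fromstring(TMX))
content[0]["ja"] = "こんにちは"  # the translator adds the Japanese variant
merged = merge(skeleton, content)
languages = [tuv.get(XML_LANG) for tuv in merged.iter("tuv")]
```

After merging, the translation unit keeps its original `tuid` attribute from the skeleton and carries three language variants.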
Annex C
(informative)
Example of XLIFF data representation
C.1 Introduction
The purpose of XLIFF (XML Localization Interchange File Format) is to define and promote the adoption of a
specification for the interchange of localizable software- and document-based objects and related metadata.
C.2 Mapping XLIFF to MLIF
XLIFF differs from the MLIF metamodel in that it draws a clear distinction between source and target language
for monolingual information. This is handled through the appropriate use of the data
category in together with the language declarations ( and ) in
.
The core elements of the XLIFF macro-structure map to MLIF as follows:
maps onto the element;
is a container for the element and maps onto the element;
the element maps onto the element;
maps onto the element;
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to . The corresponding textual content is placed in a element;
maps onto the element and simultaneously sets the value of the
element to alternate.
Further XLIFF elements and attributes map onto MLIF elements as follows:
The XLIFF tool attribute maps onto the element.
C.3 Example of data
The following example, based on XLIFF version 1.2, focuses on the bilingual part of an XLIFF document:
<xliff
  xmlns="urn:oasis:names:tc:xliff:document:1.2"
  version="1.2"
  xml:lang="en"
  xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-schema-1.2.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file
    source-language="en"
    target-language="fr"
    datatype="winres"
    original="Sample1.rc">
    <body>
      <group
        restype="dialog"
        resname="IDD_DIALOG1"
        coord="0;0;186;57"
        font="MS Sans Serif;8">
        <trans-unit
          id="1" restype="caption">
          <source xml:lang="en">Title</source>
          <target xml:lang="fr">Titre</target>
        </trans-unit>
        <trans-unit
          id="2"
          restype="label"
          resname="IDC_STATIC"
          coord="8;4;19;8">
          <source xml:lang="en">Path</source>
          <target xml:lang="fr">Chemin</target>
        </trans-unit>
        <trans-unit
          id="3"
          restype="check"
          resname="IDC_CHECK1"
          coord="8;40;41;10">
          <source xml:lang="en">Validate</source>
          <target xml:lang="fr">Valider</target>
        </trans-unit>
        <trans-unit
          id="4"
          restype="button"
          resname="IDOK"
          coord="129;7;50;14">
          <source xml:lang="en">OK</source>
          <target xml:lang="fr">OK</target>
        </trans-unit>
        <trans-unit
          id="5"
          restype="button"
          resname="IDCANCEL"
          coord="129;24;50;14">
          <source xml:lang="en">Cancel</source>
          <target xml:lang="fr">Annuler</target>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
The corresponding data in the MLIF default representation are as follows:
XLIFF
1.2
en
fr
file
body
sourceLanguage
Title
targetLanguage
Titre
sourceLanguage
Path
targetLanguage
Chemin
sourceLanguage
Validate
targetLanguage
Valider
sourceLanguage
OK
targetLanguage
OK
sourceLanguage
Cancel
targetLanguage
Annuler
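The pairing visible in the listing above (a sourceLanguage value followed by a targetLanguage value for each unit) can be sketched as a small extraction routine over an XLIFF 1.2 document. The sketch uses Python's standard `xml.etree.ElementTree`; the function name `xliff_units`, the tuple layout and the shortened sample document are illustrative assumptions and do not reproduce the MLIF serialization itself.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}


def xliff_units(xliff_text):
    """Return one list of (role, language, text) tuples per XLIFF trans-unit."""
    root = ET.fromstring(xliff_text)
    units = []
    for file_el in root.findall("x:file", NS):
        src_lang = file_el.get("source-language")
        tgt_lang = file_el.get("target-language")
        for tu in file_el.iter("{%s}trans-unit" % XLIFF_NS):
            variants = []
            source = tu.find("x:source", NS)
            target = tu.find("x:target", NS)
            if source is not None:
                variants.append(("sourceLanguage", src_lang, source.text))
            if target is not None:
                variants.append(("targetLanguage", tgt_lang, target.text))
            units.append(variants)
    return units


# Shortened sample based on the first two units of the example in C.3.
SAMPLE = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
    '<file source-language="en" target-language="fr" datatype="winres" original="Sample1.rc">'
    '<body>'
    '<trans-unit id="1"><source>Title</source><target>Titre</target></trans-unit>'
    '<trans-unit id="2"><source>Path</source><target>Chemin</target></trans-unit>'
    '</body></file></xliff>'
)

units = xliff_units(SAMPLE)
```

Each inner list corresponds to one multilingual unit with one entry per language, so a unit that has not been translated yet simply lacks the targetLanguage entry.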
Annex D
(informative)
Example: representing smilText data
D.1 Introduction
Within the SMIL 3.0 W3C recommendation (http://www.w3.org/TR/2008/REC-SMIL3-20081201), the smilText
modules provide a text container element with an explicit content model for defining timed text
(http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html). smilText has the potential to be an
important application context for MLIF as it associates and synchronizes multimedia and textual content.
D.2 Using generic SMIL attributes in MLIF
General timing mechanisms from the SMIL (Synchronized Multimedia Integration Language) recommendation
may be used in MLIF-compliant content to provide synchronization mechanisms for textual content. The
following SMIL elements are thus integrated in the overall MLIF specification: "begin", "next" and "dur".
D.3 Simplified mapping of monolingual content
The basic use case for articulating MLIF and SMIL involves producing a monolingual SMIL output from a
multilingual representation expressed in an MLIF-compliant format. This results from the selection of the
content corresponding to a selected language and its integration into one or several containers, for
instance embedded in a construct. When applicable, the existing timing information is propagated into
the SMIL representation.
In this context, the core mappings between MLIF and the smilText specification are as follows:
elements map onto elements, together with the corresponding attributes (in
particular, language);
elements map univocally onto elements, together with the corresponding descriptors (in
particular, temporal ones).
The actual embedding of multilingual content within a single SMIL representation is based on the
constructs within the following skeleton:
xml:id="TE30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="en"
its:translate="yes"> This is a sentence.
xml:id="TF30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="fr"
its:translate="yes">Ceci est une
phrase.
Other non-temporal attributes such as region are not covered by the MLIF specification, and should therefore
be created separately from the MLIF-compliant structure.
This mapping can be used conversely to generate MLIF-compliant content from a SMIL representation. The
associated use case is typically the preparation of an MLIF-compliant structure that will later contain further
translation(s).
D.4 Mapping smilText to MLIF
The core elements of smilText map to MLIF elements as follows 2):
- the <smilText> element functions as a logical and temporal structuring element that allows the inclusion
of inline text content in a SMIL presentation. smilText can also be used as an external, stand-alone timed
text format. This is achieved by using the SMIL 3.0 smilText profile;
- the <tev> element defines a "temporal moment" within a block of smilText content; depending on the
values of the begin or next attributes, it determines a scheduling time during which the associated text
content (up to the following <tev> or <clear> element or the end of the <smilText> element) is rendered;
- the <clear> element defines a "temporal moment" within a block of smilText content
at which the full contents of the rendering area are cleared.
The following SMIL attributes also map as follows:
the "dur" attribute maps onto the element;
the "begin" attribute maps onto the element;
the "next" attribute maps onto the element.
2) These definitions are taken from the W3C SMIL recommendation:
http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html.
Annex E
(informative)
Example of MLIF usage for subtitles (captioning)
E.1 Introduction
Subtitles are textual versions of the dialog in films, television programs, video games, etc., and are usually
displayed at the bottom of the screen. They can be either a written rendering of the dialog in the same
language or a written rendering of the dialog in a language other than that of the dialog itself. Additional
information may be included in order to help viewers who are deaf or hard of hearing to follow the dialog
[SUB].
Professional subtitlers usually work with specialized computer software and hardware where the video is
stored digitally, making each individual frame instantly accessible. In addition to creating the subtitles, the
subtitler usually specifies the exact positions where each subtitle shall appear and disappear. For cinema film,
this task is traditionally done by separate technicians. The end result is a subtitle file containing the actual
subtitles as well as position markers indicating when each subtitle appears and disappears. These markers
are usually based on timecode if the work is for the electronic media (e.g. television, videos and DVDs) and on
film length (measured in feet and frames) if the subtitles are to be used for traditional cinema film.
E.2 Using MLIF to represent subtitling information
There are several formats that may be used for subtitles. Some of them are de jure standards (e.g.
[MPEG-4 TT]), while some others, although not being de jure standards, are currently being used by a large
number of people all over the world (e.g. SRT Format – SubRip). SRT is probably the most popular external
subtitle file format.
All subtitle formats have to provide a way of synchronizing video frames with subtitles. Synchronization
means associating temporal markers with textual information.
The following example is a short fragment of an SRT file:
Fragment-1:
00:00:20,000 --> 00:00:24,400
Subtitle number one…
00:00:24,600 --> 00:00:27,800
Subtitle number two…
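A fragment such as Fragment-1 can be read into timed cues with a few lines of code. The sketch below is a minimal reader written for this illustration, not a complete SRT parser: it recognizes the timing line, skips blank lines and purely numeric lines (the cue counters that SRT files normally contain), and therefore would also drop a subtitle line that consists only of digits. The dictionary keys `begin_ms`, `end_ms` and `text` are hypothetical names.

```python
import re

# One SRT timing line, e.g. "00:00:20,000 --> 00:00:24,400".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_cues(srt_text):
    """Return a list of cues with begin/end in milliseconds and their text lines."""
    cues = []
    for raw in srt_text.splitlines():
        line = raw.strip()
        match = TIMING.fullmatch(line)
        if match:
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
            cues.append({
                "begin_ms": ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1,
                "end_ms": ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2,
                "text": [],
            })
        elif line and not line.isdigit() and cues:
            cues[-1]["text"].append(line)
    return cues


FRAGMENT_1 = """00:00:20,000 --> 00:00:24,400
Subtitle number one…
00:00:24,600 --> 00:00:27,800
Subtitle number two…"""

cues = parse_cues(FRAGMENT_1)
```

Keeping times as integer milliseconds avoids rounding effects when durations are later computed from them.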
This annex demonstrates how MLIF may be used for subtitles. Fragment-2 and Fragment-3 have been built in
compliance with the latest SMIL specification, in particular smilText.
The use of MLIF for dealing with multilingual subtitles is straightforward. It is easy to parse any of the
proposed MLIF documents in order to obtain SRT files.
However, depending on the underlying scenario (or workflow), the subtitling information may be represented
in two different ways.
The first MLIF way (Fragment-2) defines a single element, and inside this element, two
elements are embedded as follows:
Fragment-2:
The second MLIF way (Fragment-3) defines two elements, each containing a single
element, with the corresponding outline:
Fragment-3:
The first approach may be more convenient for a pair-to-pair translation process, while the second way may
be more convenient for filtering and selecting of one language (for example, a monolingual block can easily be
isolated).
Other implementations are possible, depending on how the temporal information associated with the
presentation of subtitles is to be expressed. For instance, the following examples use the SMIL attributes in
two different ways: Fragment-4 gives a start time and an end time for each subtitle, whereas Fragment-5 gives
a start time and a duration.
Fragment-4
00:12:28,928
00:12:32,515
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
00:12:05,270
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
Fragment-5
00:12:28,928
3.607
- Good morning.
- Dr Lecter, my name is Clarice Starling.
00:12:01,800
3.47
- Bonjour.
- Dr Lecter, je m'appelle Clarice Starling.
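Converting between the two styles is simple arithmetic on the clock values. The sketch below is an illustration only; the function names are hypothetical, and times are handled as integer milliseconds to avoid rounding effects. It is applied to the French subtitle of Fragment-4 and Fragment-5.

```python
def to_ms(clock):
    """Convert an "hh:mm:ss,mmm" clock value into milliseconds."""
    hms, millis = clock.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def to_clock(total_ms):
    """Convert milliseconds back into an "hh:mm:ss,mmm" clock value."""
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def duration_seconds(begin, end):
    """Duration in seconds between two clock values (begin/end style to dur style)."""
    return (to_ms(end) - to_ms(begin)) / 1000


def end_clock(begin, dur_seconds):
    """End clock value from a begin value and a duration (dur style to begin/end style)."""
    return to_clock(to_ms(begin) + round(dur_seconds * 1000))


french_dur = duration_seconds("00:12:01,800", "00:12:05,270")
french_end = end_clock("00:12:01,800", 3.47)
```

Here `french_dur` evaluates to 3.47 and `french_end` to "00:12:05,270", so the two French fragments describe the same interval.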
E.3 Full example
E.3.1 Introduction
The following example associates an SRT representation with a compliant MLIF-based format.
E.3.2 SRT source files
E.3.2.1 English subtitles
The English subtitles are as follows:
00:00:32,560 --> 00:00:35,119
The world is changed.
00:00:35,640 --> 00:00:38,200
I see it in the water.
E.3.2.2 French subtitles
The French subtitles are as follows:
00:00:32,560 --> 00:00:35,119
Le monde a changé.
00:00:35,640 --> 00:00:38,200
Je le vois dans l'eau.
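Because both files share the same timings, the cues can be paired by their (begin, end) values before being written into a paired-sentence structure. The Python sketch below is one way to do this under that assumption; the function names and the dictionary layout are illustrative only and are not defined by MLIF.

```python
import re

# One timing line followed by one line of subtitle text.
CUE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n(.+)")


def parse_cues(srt_text):
    """Return {(begin, end): text} for every cue found in the SRT text."""
    return {(m.group(1), m.group(2)): m.group(3) for m in CUE.finditer(srt_text)}


def pair_by_timing(by_lang):
    """Group the texts of several languages under their shared (begin, end) key."""
    paired = {}
    for lang, srt_text in by_lang.items():
        for timing, text in parse_cues(srt_text).items():
            paired.setdefault(timing, {})[lang] = text
    return paired


english = """00:00:32,560 --> 00:00:35,119
The world is changed.
00:00:35,640 --> 00:00:38,200
I see it in the water.
"""
french = """00:00:32,560 --> 00:00:35,119
Le monde a changé.
00:00:35,640 --> 00:00:38,200
Je le vois dans l'eau.
"""

paired = pair_by_timing({"en": english, "fr": french})
for timing, texts in paired.items():
    print(timing, texts)
```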
E.4 MLIF representation - paired sentences
Based on Fragment-2 structure, the resulting MLIF data are as follows:
2008-11-30T17:31:57+01:00
Samuel CRUZ-LARA
0.1
00:00:32.560
00:00:35.119
The
world
is
changed
.
00:00:35.640
00:00:38.200
I
see
it
in
the
water
.
00:00:32.560
00:00:35.119
...
INTERNATIONAL STANDARD ISO 24616
First edition
2012-09-01
Language resources management -- Multilingual information framework
Responsibility for the preparation of the Russian version lies with GOST R
(Russian Federation) in accordance with Article 18.1 of the ISO Statutes.
COPYRIGHT PROTECTED DOCUMENT
© ISO 2012
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm, without permission in writing from either ISO at the address below or the ISO member
body in the country of the requester.
ISO copyright office:
Case postale 56 CH-1211 Geneva 20
Tel.: + 41 22 749 01 11
Fax: + 41 22 749 09 47
E-mail: copyright@iso.org
Web: www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Description principles
4.1 Underlying specification standard: Unified Modeling Language (UML)
4.2 Metamodel and adornment
4.3 XML serialization
5 Metamodel specification
6 MLIF applicability
7 Adornment of the metamodel
7.1 General remarks
7.2 General principles for the use of W3C generic attributes
7.3 Recommended adornment for the GI component
7.4 Recommended adornment for the GroupC component
7.5 Recommended adornment for the MultiC component
7.6 Recommended adornment for the MonoC component
7.7 Recommended adornment for the SegC component
7.8 Recommended adornment for the HistoC component
7.9 Recommended adornment for inline annotation
7.10 Recommended adornment for localization
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for time alignment
8 Relationship with other standards
Annex A (informative) Example of using MLIF for computer-aided translation
Annex B (informative) Example: representing TMX data
Annex C (informative) Example: representing XLIFF data
Annex D (informative) Example: representing smilText data
Annex E (informative) Example of using MLIF for subtitling
Annex F (informative) Using MLIF with MAF data
Annex G (informative) Detailed specification
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is normally
carried out through ISO technical committees. Each member body interested in a subject for which a
technical committee has been established has the right to be represented on that committee.
International organizations, governmental and non-governmental, in liaison with ISO, also take part
in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on
all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives,
Part 2.
The main task of technical committees is to prepare International Standards. Draft International
Standards adopted by the technical committees are circulated to the member bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the member bodies
casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24616 was prepared by Technical Committee ISO/TC 37, Terminology and other language and
content resources, Subcommittee SC 4, Language resource management.
INTERNATIONAL STANDARD ISO 24616:2012(R)
Language resources management -- Multilingual information framework
1 Scope
This International Standard provides a generic platform for modelling and managing multilingual
information in various domains: localization, translation, multimedia annotation, document
management, digital library support, and business modelling applications. MLIF (multilingual
information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009]
for various application domains. It also provides strategies for the interoperability and/or linking
of models including, in particular, the widely used XLIFF, TMX, smilText and ITS.
2 Normative references
The following referenced documents are indispensable for the application of this document. For
dated references, only the edition cited applies. For undated references, the latest edition of the
referenced document (including any amendments) applies:
ISO 12620:2009, Terminology and other language and content resources -- Specification of data
categories and management of a Data Category Registry for language resources
ISO 8879, Information processing -- Text and office systems -- Standard Generalized Markup
Language (SGML)
Extensible Markup Language. Fifth Edition, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau
Editors, W3C Recommendation, 26 November 2008, http://www.w3.org/TR/xml
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
adornment
data category attached to a component of the metamodel
3.2
inline code
inline commands embedded in the source document
Note to entry: within natural-language text these can be, in particular, presentation commands
(for example, HTML codes).
3.3
subtitle
textual equivalent of the dialogue in films, television programmes, video games and the like,
usually displayed at the bottom of the screen
3.4
working language
language used to express sequences of linguistic units
4 Description principles
4.1 Underlying specification standard: Unified Modeling Language (UML)
The MLIF specification is based on UML modelling principles as defined by the Object Management
Group (OMG). The specification uses a subset of UML that is relevant for the purposes of MLIF.
4.2 Metamodel and adornment
In the same way as the Terminological Markup Framework (TMF) defined in ISO 16642, MLIF defines a
metamodel that is adorned with data categories as specified in ISO 12620.
4.3 XML serialization
Together with the metamodel and its adornment, MLIF provides a representation of the information
in XML, called the "XML serialization". It relies on the Extensible Markup Language (XML), a
restricted form of SGML as defined in ISO 8879.
5 Metamodel specification
The MLIF metamodel is described by the UML object diagram shown in Figure 1. The model is defined
by the following seven "core components", listed in the order of their XML serialization:
- MLDC (Multilingual Data Collection), which represents a collection of data containing global information and several multilingual units;
- GI (Global Information), which represents technical and administrative information that applies to the entire multilingual data collection;
- GroupC (Grouping Component), which represents a subset of multilingual data that share a common origin or purpose within a given project;
- MultiC (Multilingual Component), which groups together all variants of a given textual content;
- MonoC (Monolingual Component), which groups the information that relates to one and the same language and is part of a MultiC multilingual component;
- HistoC (History Component), which allows the modifications of the component it is attached to to be tracked (i.e. versioning);
- SegC (Segmentation Component), which allows textual information to be segmented at any level, including recursively.
Figure 1: Schematic representation of the MLIF metamodel
6 MLIF applicability
The MLIF metamodel can be applied to any format compliant with this International Standard in
two ways:
- by fully implementing the MLIF metamodel, starting from the MLDC level;
- by embedding MLIF-compliant information in another model, in order to implement lower-level MLIF elements only.
7 Adornment of the metamodel
7.1 General remarks
The XML serialization of MLIF relies on a set of XML elements and attributes, which are described
in the following clauses of this International Standard and in which the characters "<" and ">"
delimit the element name. Following the TEI guidelines (http://www.tei-c.org), some attributes are
defined by reference to their class, in which case the class name is preceded by the prefix "att."
(for example "att.xlink"). Other XML attributes are defined by a list in which the attribute names
are given in quotation marks (for example "xml:lang"). The specifications given in Annex G of this
International Standard shall be used.
7.2 General principles for the use of W3C generic attributes
The following attributes defined by the W3C shall be used in all MLIF-compliant applications:
- the xml:lang attribute shall be used to indicate the working language of any relevant element and, in particular, shall be used systematically in any implementation of the MonoC component;
- the xml:id attribute shall be used in accordance with the W3C recommendations to provide a unique identifier for an element of the MLIF metamodel.
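As an illustration of these two attributes, the following Python sketch builds a monolingual component with the standard library. The element names MonoC and SegC are assumed here from the component names in Clause 5, and the helper name and identifier value are hypothetical; the normative element definitions are those of Annex G.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml" prefix is bound to (W3C Namespaces in XML).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def mono_component(lang: str, ident: str, text: str) -> ET.Element:
    """Build an assumed <MonoC> element carrying xml:lang and xml:id."""
    mono = ET.Element(
        "MonoC",
        {f"{{{XML_NS}}}lang": lang, f"{{{XML_NS}}}id": ident},
    )
    seg = ET.SubElement(mono, "SegC")  # assumed name of the segment element
    seg.text = text
    return mono


mono = mono_component("fr", "m1", "Le repas est bon.")
serialized = ET.tostring(mono, encoding="unicode")
print(serialized)
```

ElementTree writes the reserved namespace with the "xml" prefix, so the attributes appear as xml:lang and xml:id in the output.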
7.3 Recommended adornment for the GI component
7.4 Recommended adornment for the GroupC component
7.5 Recommended adornment for the MultiC component
7.6 Recommended adornment for the MonoC component
att.lang
att.xlink
For the MonoC component, the language attribute is mandatory; all other attributes are optional.
7.7 Recommended adornment for the SegC component
att.linguistic
att.xlink
7.8 Recommended adornment for the HistoC component
The HistoC component allows the modifications of the component it is attached to to be tracked
(for example, its creation, modification and validation). In the MLIF metamodel, HistoC can be
attached to a GI, MultiC or MonoC component, which makes it possible to record every change to, or
extension of, that component.
The HistoC component can be adorned with four elements:
7.9 Recommended adornment for inline annotation
Multilingual text documents often appear at only one stage of a complex workflow that involves
external document sources in a wide variety of formats. It is therefore often necessary to preserve
inline markup that records presentation characteristics which also have to be kept in the
target-language document. For this reason, MLIF-compliant applications shall use the following
elements, which map onto similar subsets of elements in TMX and XLIFF:
7.10 Recommended adornment for localization
The following elements shall be used to provide the necessary localization-related information:
7.11 Recommended adornment for internationalization
7.12 Recommended adornment for time alignment
When the textual content of a document is to be rendered (in written or spoken form) together with
certain accompanying constraints, the following elements shall be used:
8 Relationship with other standards
In the same way as the Terminological Markup Framework, TMF [ISO 16642], does for terminology,
MLIF provides a metamodel which, combined with selected data categories, forms a sound basis for
interoperability between several multilingual applications working with text corpora. MLIF deals
with multilingual corpora, multilingual fragments and the translation relations between languages.
In each domain where MLIF is applicable, a particular level of granularity can be chosen for
segmenting and describing text. In this respect, segmentation and description can rely on MAF
[ISO 24611], SynAF [ISO 24615] and TMF for morphological description, syntactic annotation and
terminological description, respectively.
MLIF supports the development and interoperability of translation memory resources and localization
processes, and the description of a metamodel for handling their multilingual content. MLIF does
not provide a closed list of description features; instead, it provides a list of data categories,
which is much easier to update and extend. This list is a starting point for handling multilingual
information in a variety of application scenarios.
MLIF not only describes elementary linguistic segments (for example, sentence, syntactic component,
word and part of speech) but can also be used to represent document structure (for example, title,
abstract, paragraph and section). In addition, MLIF allows external and internal links (annotations
and references) to be established.
MLIF is intended to provide a common basis that facilitates work with formats such as TMX (LISA
OSCAR) and XLIFF (OASIS). MLIF can be regarded as a parent of these formats, since both deal with
multilingual data expressed as segments or text units. Both formats can be stored, manipulated and
transformed in the same way.
Examples of the use of MLIF are given in Annexes A to F.
Annex A
(informative)
Example of using MLIF for computer-aided translation
The main purpose of using structures such as lemma, part of speech and morphological features is to
give computer-aided translation (CAT) tools based on translation memory the ability to translate
new words and sentences that are not stored in the translation memory database.
For example, a current translation memory system such as SDL TRADOS 1), in which the English
sentence "The meal is nice" and its French translation "Le repas est bon" have been stored, cannot
produce the obvious translation of the sentence "The meals are nice", even though the lemmas of
"The meal is nice" and "The meals are nice" are identical. The reason for this weakness is that the
system uses too few linguistic criteria in the translation process.
In this case, the data produced by TRADOS Translator's Workbench look as follows:
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 8 Build 863"
segtype="sentence"
o-tmf="TW4Win 2.0 Format"
adminlang="EN-US"
srclang="EN-GB"
datatype="rtf"
creationdate="20100528T144322Z"
creationid="USER"/>
The meal is nice.
Le repas est bon.
To translate the sentence "The meals are nice", an MLIF-compliant tool would have to carry out the
following procedure:
Step 1 Represent all the words stored in the translation memory in MLIF, adding their linguistic
features.
1) SDL TRADOS Translator's Workbench is an example of a suitable product available commercially.
This information is given for the convenience of users of this International Standard and does not
constitute an endorsement by ISO of this product.
Step 2 Run the sentence through a part-of-speech tagger to obtain the correct morphosyntactic
categories of its words.
Step 3 Translate the lemmas using a bilingual English-French dictionary.
Step 4 Consult a French dictionary of inflected forms to select the correct word form for the given
lemma and morphological features.
Step 5 Produce the translation of "The meals are nice" by replacing each English word with its
French inflected form, as follows:
"The meals are nice." => "Les repas sont bons."
The XML data shall include a feature structure declaration that defines the tag set (for example,
for "nS"), and a segmentation into words that uses the tag set defined in MAF:
SEMMAR
20090922T140653Z
The meal is nice.
Le repas est bon.
The
class="word"
lemma="meal"
pos="commonNoun"
tag="#nS">meal
class="word"
lemma="be"
pos="verb"
tag="#mP #p1 #nS">is
nice
.
class="word"
lemma="le"
pos="definiteArticle"
tag="#gM #nS">Le
class="word"
lemma="repas"
pos="commonNoun"
tag="#gM #nS">repas
class="word"
lemma="être"
pos="verb"
tag="#mP #p1 #nS">est
class="word"
lemma="bon"
pos="qualifierAdjective"
tag="#gM #nS">bon
.
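The five-step procedure of this annex can be illustrated with a toy pipeline. The Python sketch below is an assumption-laden illustration, not part of MLIF: the three small dictionaries stand in for the tagger, the bilingual dictionary and the French inflection dictionary, they cover only the words of the example, and gender is taken to be masculine throughout.

```python
# Step 2 stand-in: surface form -> (lemma, grammatical number or None).
TAGGER = {
    "the": ("the", None),
    "meal": ("meal", "singular"),
    "meals": ("meal", "plural"),
    "is": ("be", "singular"),
    "are": ("be", "plural"),
    "nice": ("nice", None),
}

# Step 3 stand-in: English lemma -> French lemma.
BILINGUAL = {"the": "le", "meal": "repas", "be": "être", "nice": "bon"}

# Step 4 stand-in: (French lemma, number) -> inflected form (masculine only).
INFLECTIONS = {
    ("le", "singular"): "le", ("le", "plural"): "les",
    ("repas", "singular"): "repas", ("repas", "plural"): "repas",
    ("être", "singular"): "est", ("être", "plural"): "sont",
    ("bon", "singular"): "bon", ("bon", "plural"): "bons",
}


def translate(sentence: str) -> str:
    """Translate a simple English sentence word by word through its lemmas."""
    words = sentence.rstrip(".").split()
    tagged = [TAGGER[word.lower()] for word in words]
    # Words with no number of their own (article, adjective) agree with the noun.
    sentence_number = next(number for _, number in tagged if number is not None)
    # Step 5: replace each word with the inflected French form.
    forms = [
        INFLECTIONS[(BILINGUAL[lemma], number or sentence_number)]
        for lemma, number in tagged
    ]
    text = " ".join(forms)
    return text[0].upper() + text[1:] + "."


print(translate("The meals are nice."))
```

A plain translation memory lookup would fail on the plural sentence, whereas the lemma-based lookup reuses the entries stored for the singular one.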
Annex B
(informative)
Example: representing TMX data
B.1 Introduction
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of
translation memory (TM) data created by computer-aided translation and localization tools. The
purpose of TMX is to make it easier to exchange translation memory data between tools and/or
translation vendors with little or no loss of critical data during the process.
TMX has been available as a certified standard format since 1998. It was developed and is
maintained by OSCAR (Open Standards for Container/Content Allowing Re-use), a working group of
LISA (Localisation Industry Standards Association).
B.2 Mapping TMX to MLIF
The structure of TMX is nearly isomorphic to that of the MLIF metamodel. The TMX macrostructure
maps onto MLIF as follows:
- the <tmx> element maps onto the <MLDC> component;
- the <header> element maps onto the <GI> component;
- the <body> element, the container of the <tu> elements, maps onto the <GroupC> component;
- the <tu> element maps onto the <MultiC> component;
- the <tuv> element maps onto the <MonoC> component;
- the <seg> element maps onto the <SegC> component;
- the element indicating the term type maps onto the MLIF element for the term type.
Other TMX elements and attributes map onto MLIF elements as follows:
- the "creationtool" attribute maps onto a corresponding MLIF element;
- the "creationdate" attribute maps onto a corresponding MLIF element;
- the "tuid" attribute maps onto an element within the MultiC component;
- the <prop> element does not map onto any particular element, because it is a generic placeholder for application-specific data; where it is used, the specific element is mapped explicitly onto MLIF elements or onto the standardized ISO/TC 37 data categories available in ISOcat.
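The macrostructure mapping can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the sample TMX document and its segment texts are invented for the example, the output is a plain dictionary rather than an MLIF document, and the keys GI, MultiC and MonoC simply reuse the component names of Clause 5 (header to GI, tu to MultiC, tuv to MonoC).

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, shortened TMX 1.4 document used only for illustration.
TMX = """<tmx version="1.4">
  <header creationtool="Heartsome TM Server" creationtoolversion="1.0.1"
          creationdate="20040731T164933Z" adminlang="en" datatype="xml"
          o-tmf="unknown" segtype="block" srclang="*all*"/>
  <body>
    <tu tuid="1091303313515">
      <tuv xml:lang="fr"><seg>Bonjour.</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_components(tmx_text: str) -> dict:
    """Collect TMX content under keys named after the assumed MLIF components."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    return {
        # Header attributes become global information.
        "GI": {
            "creationtool": header.get("creationtool"),
            "creationdate": header.get("creationdate"),
        },
        # One entry per translation unit, one text per language variant.
        "MultiC": [
            {
                "tuid": tu.get("tuid"),
                "MonoC": {
                    tuv.get(XML_LANG): tuv.findtext("seg")
                    for tuv in tu.findall("tuv")
                },
            }
            for tu in root.iter("tu")
        ],
    }


result = tmx_to_components(TMX)
print(result)
```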
B.3 Sample data
The following example, based on TMX version 1.4, covers the multilingual units of a TMX document
and does not reproduce all the details of the header.
adminlang="en"
creationdate="20040731T164933Z"
creationtool="Heartsome TM Server"
creationtoolversion="1.0.1"
datatype="xml"
o-tmf="unknown"
segtype="block"
srclang="*all*"/>
Le processus de contrôle de
qualité en dix étapes qu'il a créé il y a plus
de 1300 ans est beaucoup plus complet et précis que ceux
existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300 years
ago is far more thorough and exacting than any existing
today.
El proceso de control de
calidad en diez pasos que inició hace más de
1300 años es mucho más completo y preciso que los que
existen en la actualidad.
Il suo metodo di controllo di qualità in 10 fasi risale a più
di 1300 anni fa ed è molto più accurato e preciso di
qualsiasi metodo attuale.
그가 1300여년 전 시작한 10단계 품질
관리 방법은 현존하는 것보다 훨씬 더 철저하고 정확하다.
The corresponding standard MLIF representation is as follows:
TMX
1.4
20040731T164933Z
Heartsome TM Server
1.0.1
1091303313515
20020930T004233Z
Le processus de contrôle
de qualité en dix étapes qu'il a créé il y a
plus de 1300 ans est beaucoup plus complet et précis que
ceux existant aujourd'hui.
His 10-stage quality
control process initiated more than 1300
years ago is far more thorough and exacting than any
existing today.
B.4 Example of TMX and MLIF interoperability
Figure B.1 illustrates the interoperability process between the TMX and MLIF formats. The process
consists of successive extraction, translation and merging phases and starts by reading a TMX
document containing linguistic content in English (en) and German (de). The extraction phase (1)
produces a so-called "Skeleton File" (2), which holds all the TM formatting information, and the
linguistic content as an MLIF document (3), which stores only the relevant linguistic information.
Because most translators (human or automatic software modules) work with TMX-oriented tools, an XSL
style sheet makes it possible to transform the MLIF document into a TMX document. This file does
not contain any formatting information. Once the translator has added the corresponding Japanese
(ja) translation, another XSL style sheet transforms the TMX document into an MLIF document (4).
In the final phase, the new MLIF document (containing the Japanese translation) is merged with the
"Skeleton File" to create a new TMX document (5).
Figure B.1: TMX and MLIF interoperability
Annex C
(informative)
Example: representing XLIFF data
C.1 Introduction
The purpose of XLIFF is to define, and promote the use of, a specification for the interchange of
localizable software and document-based objects and their related metadata.
C.2 Mapping XLIFF to MLIF
XLIFF differs from the MLIF metamodel in that, within the monolingual information, it makes a clear
distinction between the source language and the target language. This is achieved through the
appropriate use of a dedicated data category at component level, together with source and target
language declarations.
The key elements of the XLIFF macrostructure map onto MLIF as follows:
- the element maps onto the component;
- the element
- the element (the container of the element) maps onto the component;
- the element maps onto the component;
- the element maps onto the component;
- the element
element in the component; the corresponding text content is placed in the element;
- the element maps onto the component and at the same time sets the value of the element; the corresponding text content is placed in the element;
- the element maps onto the component and at the same time sets the value of the element to alternate.
Other XLIFF elements and attributes map onto MLIF elements as follows:
- the XLIFF tool attribute maps onto the element.
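The source/target distinction can be illustrated with a short Python sketch. It reads a reduced XLIFF 1.2 document (the first translation unit of the example in C.3) and labels each text with the values sourceLanguage and targetLanguage that appear in the MLIF representation of C.3. The function name and the dictionary layout are hypothetical and are not defined by MLIF.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, as declared in the sample data of C.3.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file source-language="en" target-language="fr" datatype="winres"
        original="Sample1.rc">
    <body>
      <trans-unit id="1" restype="caption">
        <source xml:lang="en">Title</source>
        <target xml:lang="fr">Titre</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def units(xliff_text: str) -> list:
    """Return one record per trans-unit with its source and target texts."""
    root = ET.fromstring(xliff_text)
    records = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        records.append({
            "id": unit.get("id"),
            "sourceLanguage": unit.findtext("x:source", namespaces=NS),
            "targetLanguage": unit.findtext("x:target", namespaces=NS),
        })
    return records


print(units(XLIFF))
```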
C.3 Sample data
The following example is based on XLIFF version 1.2 and covers a bilingual part of an XLIFF
document:
xmlns="urn:oasis:names:tc:xliff:document:1.2"
version="1.2"
xml:lang="en"
xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-schema-1.2.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
source-language="en"
target-language="fr"
datatype="winres"
original="Sample1.rc">
restype="dialog"
resname="IDD_DIALOG1"
coord="0;0;186;57"
font="MS Sans Serif;8">
id="1" restype="caption">
xml:lang="en">Title
xml:lang="fr">Titre
id="2"
restype="label"
resname="IDC_STATIC"
coord="8;4;19;8">
xml:lang="en">Path
xml:lang="fr">Chemin
id="3"
restype="check"
resname="IDC_CHECK1"
coord="8;40;41;10">
xml:lang="en">Validate
xml:lang="fr">Valider
id="4"
restype="button"
resname="IDOK"
coord="129;7;50;14">
xml:lang="en">OK
xml:lang="fr">OK
id="5"
restype="button"
resname="IDCANCEL"
coord="129;24;50;14">
xml:lang="en">Cancel
Annuler
The corresponding standard MLIF representation is as follows:
XLIFF
1.2
en
fr
file
body
sourceLanguage
Title
targetLanguage
Titre
sourceLanguage
Path
targetLanguage
Chemin
sourceLanguage
Validate
targetLanguage
Valider
sourceLanguage
OK
targetLanguage
OK
sourceLanguage
Cancel
targetLanguage
Annuler
Annex D
(informative)
Example: representing smilText data
D.1 Introduction
Within the W3C Recommendation for SMIL 3.0 (http://www.w3.org/TR/2008/REC-SMIL3-20081201), the
smilText modules provide container elements for textual information, with an explicit content model
for defining timed text (http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html). smilText
plays an important role in the application context of MLIF because it links and synchronizes
multimedia content with textual content.
D.2 Using generic SMIL attributes within MLIF
In accordance with the W3C recommendation, the general synchronization mechanisms of SMIL
(Synchronized Multimedia Integration Language) can be used in MLIF-compliant content to provide
timing for textual content. To this end, SMIL timing attributes such as "begin", "next" and "dur"
are introduced into the general MLIF specification.
D.3 Simplified mapping of monolingual content
The basic example of using MLIF and SMIL together is the generation of monolingual SMIL output from
a multilingual representation in an MLIF-compliant format. The output is produced by selecting the
content for a particular language and integrating it into one or more smilText container elements,
for example by nesting them in a suitable SMIL construct. Where available, existing timing
information is carried over into the SMIL representation.
In this context, the following mappings between MLIF and smilText play a key role:
- the elements map onto the elements together with the corresponding attributes (in particular, language attributes);
- the elements map onto the elements together with the corresponding descriptors (in particular, timing descriptors).
Embedding multilingual content in a SMIL presentation relies on constructs of the kind outlined in
the skeleton below:
xml:id="TE30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="en"
its:translate="yes"> This is a sentence.
xml:id="TF30"
region="Contents"
dur="12s"
its:dir="ltr"
xml:lang="fr"
its:translate="yes">Ceci est une
phrase.
Other attributes that are not time-related, such as positioning, are not covered by the MLIF
specification and therefore have to be produced separately from the MLIF-compliant structure.
The mapping shown above can also be used in the opposite direction, to generate MLIF-compliant
content from a SMIL representation. A typical use is to build an MLIF-compliant structure that
will subsequently take in further translations.
D.4 Mapping smilText elements to MLIF elements
The main smilText elements map onto MLIF elements as follows 2):
- the <smilText> element acts as a logical and temporal structuring element that allows in-line text to be included in SMIL presentations. smilText can also be used as an external, stand-alone timed text format. This is achieved by using the smilText profile of SMIL 3.0;
- the <tev> element defines a "temporal moment" within the smilText content; depending on the values of its begin or next attributes, this element determines the scheduled time at which the associated text content is to be rendered (up to the next <tev> or <clear> element, or the end of the smilText element);
- the <clear> element defines a "temporal moment" within the smilText content block at which the entire content of the rendering region is erased.
Other SMIL attributes are mapped as follows:
- the "dur" attribute maps onto a corresponding MLIF element;
- the "begin" attribute maps onto a corresponding MLIF element;
- the "next" attribute maps onto a corresponding MLIF element.
2) These definitions are taken from the W3C SMIL recommendation:
http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-text.html.
Annex E
(informative)
Example of using MLIF for subtitling
E.1 Introduction
Subtitles are textual versions of the dialogue in films, television programmes, video games and
the like, usually displayed at the bottom of the screen. They can render the dialogue in the same
language or in another language, and can contain additional information to help viewers who are
deaf or hard of hearing to follow the dialogue [SUB].
Professional subtitlers usually work with specialized software and hardware in which the video is
stored digitally, giving instant access to each individual frame. Besides creating the subtitles,
the subtitler usually also specifies the exact positions at which each subtitle should appear and
disappear. For cinema film, this task is traditionally done by separate technicians. The end
result is a subtitle file containing the subtitles themselves as well as position markers
indicating when each subtitle appears and disappears. These markers are usually based on timecode
if the work is for electronic media (for example, television, video cassettes and DVDs), or on
film length (measured in feet and frames) if the subtitles are to be used for traditional cinema
film.
E.2 Using MLIF for representing subtitling information
Several formats can be used for subtitling. Some are de jure standards (for example, MPEG-4 TT),
whereas others, although not de jure standards, are widely used by many people around the world
(for example, the SRT format of SubRip). SRT is probably the most popular subtitle file format.
All subtitle formats have to provide a way of synchronizing video frames with subtitles, which
means associating temporal markers with textual information.
The following is a very short fragment of an SRT subtitle file:
Fragment-1:
00:00:20,000 --> 00:00:24,400
Subtitle number one…
00:00:24,600 --> 00:00:27,800
Subtitle number two…
This annex shows how MLIF can be used for subtitling. Fragment-2 and Fragment-3 below have been
built in compliance with the latest SMIL specification, in particular smilText.
Using MLIF to handle multilingual subtitles is straightforward. To obtain SRT files, it is enough
to parse any of the proposed MLIF documents.
However, depending on the scenario (or workflow
...





















