Information technology — System Process and Architecture for Multilingual Semantic Reverse Query Expansion

ISO/IEC TR 29127:2011 identifies an example of a system-based process to index, query, translate, and manage components used in querying and translating documents in multiple foreign languages, enabling learners in learning, education, and training areas to effectively find and share documents on a global scale.

Technologies de l'information — Processus système et architecture pour l'extension multilinguale des requêtes sémantiques inverses

General Information

Status: Withdrawn
Publication Date: 28-Jun-2011
Withdrawal Date: 28-Jun-2011
Current Stage: 9599 - Withdrawal of International Standard
Completion Date: 16-Sep-2021
Technical report
ISO/IEC TR 29127:2011 - Information technology -- System Process and Architecture for Multilingual Semantic Reverse Query Expansion
English language
31 pages

Standards Content (Sample)

TECHNICAL REPORT
ISO/IEC TR 29127
First edition
2011-07-01

Information technology — System
Process and Architecture for Multilingual
Semantic Reverse Query Expansion
Technologies de l'information — Processus système et architecture
pour l'extension multilinguale des requêtes sémantiques inverses




Reference number
ISO/IEC TR 29127:2011(E)
© ISO/IEC 2011


COPYRIGHT PROTECTED DOCUMENT


©  ISO/IEC 2011
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1  Scope
2  Terms and definitions
3  Example SRQE Implementation
3.1  Initialization of the User Interface
3.2  Select Query Parameters
3.3  Select Word Senses
3.4  Selecting and Translating Appropriate Terms
3.5  Selecting Appropriate Translations and Executing the Query
3.6  Query Returns
4  Components and Architecture of the SRQE Process
4.1  SRQE Process Flow
4.2  Repositories
4.3  Terms and Results
4.4  Translators
4.5  Entity Extraction
4.6  Terminology for Query Searches
Annex A (informative) Potential Linkage to Current and Future ISO/IEC JTC1 SC 36 Technology Areas
Annex B (informative) Patent Declaration Form for SRQE Process
Annex C (informative) Summary on the Issue of Language Equivalencies
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
In exceptional circumstances, when the joint technical committee has collected data of a different kind from
that which is normally published as an International Standard (“state of the art”, for example), it may decide to
publish a Technical Report. A Technical Report is entirely informative in nature and shall be subject to review
every five years in the same manner as an International Standard.
ISO/IEC TR 29127 was prepared by Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 36, Information technology for learning, education and training.

Introduction
Learning, Education and Training (LET) in the context of multilingual cultures on a local and global scale can
be problematic, especially when learners are proficient in only one language. One of the multilingual problems
in a LET environment is how to query LET materials when the requestor cannot understand or is not proficient
in the language of the material available.
For example, how does a person who is proficient in French search for, find, and readily understand digital
LET materials in Arabic, if the person is not proficient in Arabic? One solution can be found in a process called
the Semantic Reverse Query Expander (SRQE). Based on components such as language ontologies, the
SRQE process utilizes Java 2 Platform, Enterprise Edition (J2EE)¹ services that can take a term in
one language (source language), expand the term conceptually, translate the expanded terms (into a target
language), and perform a query on a targeted foreign language document set. Returns are translated into the
language of the requestor. This Technical Report identifies an existing process and architecture used to query
foreign language text files.
Technologies and ontologies (i.e. thesauri) for undertaking this kind of matching and expansion operation
have been available for some time (e.g. the work of CYC Corp, Global WordNet, Global WordGrid). Valuable
lessons have been learned about what such technologies can and cannot accomplish. This Technical Report
does not discuss these pre-existing technologies, or describe the improvement or change that the proposed
process presented might represent. A particular approach (theory and practice) to the difficulties
experienced with multilingual equivalencies and translation is presented in Annex C of this
Technical Report.
In Clause 3 of this Technical Report, an implementation of the SRQE process is described in a web
environment to help clarify the architecture described in Clause 4. Annex A contains possible linkages to
ISO/IEC JTC 1, SC 36 projects and future areas of study.
The International Organization for Standardization (ISO) and the International Electrotechnical Commission
(IEC) draw attention to the fact that it is claimed that the process described in this Technical Report may
involve the use of patents. ISO and IEC take no position concerning the evidence, validity and scope of these
patent rights.
The holders of these patent rights have assured ISO and the IEC that they are willing to grant a free of charge
license to an unrestricted number of applicants on a worldwide, non-discriminatory basis and under other
reasonable terms and conditions to make, use, and sell implementations of the process contained in this
Technical Report. In this respect, the statements of the holders of these patent rights are registered with ISO
and IEC. Information may be obtained from the companies listed below.

Raytheon Company
Phillip Berestecki
Intellectual Property and Licensing
870 Winter Street
Waltham, Massachusetts 02451-1449
USA

NOTE 1 This Technical Report refers to one particular process or approach for performing reverse semantic queries;
there are other approaches and processes that could be developed for these same purposes.
NOTE 2 The process is not dependent on particular database software, protocols, or data sets. Specific components
used in the process are an implementation decision.

1) A widely used platform for server programming in the Java programming language.
TECHNICAL REPORT ISO/IEC TR 29127:2011(E)

Information technology — System Process and Architecture for
Multilingual Semantic Reverse Query Expansion
1 Scope
This Technical Report identifies an example of a system-based process to index, query, translate, and
manage components used in querying and translating documents in multiple foreign languages, enabling
learners in learning, education, and training areas to effectively find and share documents on a global scale.
2 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
NOTE For this Technical Report, the following terms and definitions are not considered to be normative. They are
informative, and apply only within the context of this Technical Report.
2.1
coordinate term
words that have the same hypernym
EXAMPLE Boat, yacht, and shrimper all have the same hypernym, ship.
NOTE Adapted from ISO 1087-1:2000, definition 3.2.19.
2.2
entity extraction
process that seeks to locate, classify, and tag atomic elements in text into predefined categories
EXAMPLE Names of persons, organizations, locations, expressions of times, quantities, monetary values,
percentages, etc.
2.3
hypernym
superordinate concept
word that is more generic or broad than another given word
NOTE 1 Another term for a hypernym is a superordinate concept.
NOTE 2 Adapted from ISO 1087-1:2000, definition 3.2.13.
2.4
hyponym
subordinate concept
word that is more specific than another given term
NOTE 1 Another term for hyponym is a subordinate concept.
NOTE 2 Adapted from ISO 1087-1:2000, definition 3.2.14.
2.5
Java servlet
Java programming language object that dynamically processes requests and constructs responses
2.6
meronym
constituent part of, or a member of, something
EXAMPLE “Winchester Cathedral” is a meronym of “Church of England”.
2.7
nominalization
use of a verb or an adjective as a noun with or without morphological transformation, so that the word can now
act as the head of a noun phrase
2.8
word sense
linguistics one of the meanings of a word
NOTE A dictionary may have over 50 different meanings of the word “play”, with each of these having a different
meaning based on the context of the word usage in a sentence.
EXAMPLE We went to see the play Romeo and Juliet at the theater.
The children went out to play in the park.
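The relations defined in 2.1, 2.3, and 2.4 can be illustrated with a small in-memory taxonomy. The following is a minimal sketch only: the class name MiniTaxonomy and its methods are hypothetical and are not part of the SRQE implementation. The data reuses the example given in 2.1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy taxonomy; not part of the SRQE implementation.
class MiniTaxonomy {
    // Maps each word to its hypernym (the more generic word).
    private final Map<String, String> hypernymOf = new HashMap<>();

    void add(String word, String hypernym) {
        hypernymOf.put(word, hypernym);
    }

    String hypernym(String word) {
        return hypernymOf.get(word);
    }

    // Hyponyms: all words whose hypernym is the given word, sorted alphabetically.
    List<String> hyponyms(String word) {
        TreeSet<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : hypernymOf.entrySet()) {
            if (e.getValue().equals(word)) out.add(e.getKey());
        }
        return new ArrayList<>(out);
    }

    // Coordinate terms: the other words that share the same hypernym.
    List<String> coordinateTerms(String word) {
        String parent = hypernymOf.get(word);
        if (parent == null) return new ArrayList<>();
        List<String> out = hyponyms(parent);
        out.remove(word);
        return out;
    }

    // Data taken from the EXAMPLE in 2.1.
    static MiniTaxonomy example() {
        MiniTaxonomy t = new MiniTaxonomy();
        t.add("boat", "ship");
        t.add("yacht", "ship");
        t.add("shrimper", "ship");
        return t;
    }
}
```

With this data, the coordinate terms of "boat" are "shrimper" and "yacht", since all three share the hypernym "ship".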
3 Example SRQE Implementation
This clause provides an implemented example of the SRQE process. The simplified example illustrates a
sequence of actions between a learner and the system. It is based on a possible learning assignment in which
a learner produces a report on trucks using foreign language resources. The learner could use the system to
gather international information related to trucks for possible inclusion in the report. The system can
perform a cross-lingual query on documents in a number of languages, and translate the documents into the
learner’s native language. The learner can optionally access a map of the locations listed in the text files
returned by the query, for improved comprehension of where the article originates or where the subject of
the article can be found.
In this example, the SRQE process utilizes a web-based User Interface (UI) that interacts with Java
servlets. The SRQE process flow is illustrated below in Figure 1. The SRQE process is a
human-machine interactive process to perform cross-lingual queries. Human inputs are shown in the white
box above the green arrow. Java servlet functions are shown below the green arrow. A UML diagram showing
the Java servlets is in a green box linked to functions in the white box.

Figure 1 — SRQE Process Flow
3.1 Initialization of the User Interface
The user interface application is accessed by URL. The GetMenuBarDataService initializes the user interface
application, providing the available repositories and the languages of those repositories. The user
interface application is shown in Figure 2. The GetMenuBarDataService builds the menus and menu choices
in the user interface. These include 1. Enter word to be translated, 2. Select Class, 3. Select Language(s),
4. Select Sources, and 5. Execute Query, as well as the results section (blank) and the Map.

Figure 2 — Initialized User Interface
NOTE 1 In this example, an Adobe Flash plug-in is required for the use of the SRQE in a browser. The user interface
described in the example is based on Adobe Flex.
NOTE 2 The original example uses CaMel CaSe format in this instance and the instances that follow. This formatting is
retained for this reason.
3.2 Select Query Parameters
The learner enters a word in “1. Enter word to be translated”. The learner selects a class of the word in “2.
Select Class”. The learner selects a language in “3. Select Language(s)”. The learner selects a source in “4.
Select Source(s)”. In this example, the learner enters Truck in “1. Enter word to be translated”. The learner
clicks on Noun in “2. Select Class”. The learner selects Arabic in “3. Select Language(s)”. The learner selects
Linguistic Data Consortium as the source in “4. Select Sources”. Figure 3 shows menu items 1 through 4 filled
in by the learner.

Figure 3 — Learner Inputs Menu Items 1-4
The learner selects Define Term in the Item 5 “Execute query” menu. The GetSensesService retrieves the noun
word senses for Truck. In this example, the GetSensesService retrieves word senses from WordNet
utilizing the Java WordNet Library (JWNL). The word senses for Truck are displayed in the return section of
the interface. Figure 4 shows the noun word senses for Truck returned by the GetSensesService.

Figure 4 — Word Senses for Truck
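The sense lookup step can be pictured as a mapping from a word and its class to a list of glosses. The sketch below is an assumption-laden illustration: SenseInventory and its methods are hypothetical, the lookup is an in-memory map rather than a call to WordNet through JWNL, and the second gloss is an illustrative placeholder, not a quoted WordNet entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a sense lookup; a real service would query WordNet.
class SenseInventory {
    private final Map<String, List<String>> senses = new LinkedHashMap<>();

    // Case-insensitive key combining the word and its class (e.g. "truck#noun").
    private static String key(String word, String wordClass) {
        return word.toLowerCase(Locale.ROOT) + "#" + wordClass.toLowerCase(Locale.ROOT);
    }

    void addSense(String word, String wordClass, String gloss) {
        senses.computeIfAbsent(key(word, wordClass), k -> new ArrayList<>()).add(gloss);
    }

    // Returns the glosses for the word in the given class, or an empty list.
    List<String> getSenses(String word, String wordClass) {
        return senses.getOrDefault(key(word, wordClass), new ArrayList<>());
    }

    static SenseInventory example() {
        SenseInventory inv = new SenseInventory();
        // Gloss quoted in 3.3 of this Technical Report.
        inv.addSense("truck", "noun", "an automotive vehicle suitable for hauling");
        // Illustrative placeholder gloss, not taken from the source.
        inv.addSense("truck", "noun", "a handcart for moving heavy objects");
        return inv;
    }
}
```

Calling getSenses("Truck", "Noun") on the example inventory returns both glosses in insertion order, which mirrors the list presented to the learner for selection.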
3.3 Select Word Senses
The learner selects the relevant word sense for Truck. Figure 5 shows the word sense selected by the learner:
“an automotive vehicle suitable for hauling”.

Figure 5 — Word Sense Selected
The learner selects “Expand Term” at the far right of the word sense display. The GetNymService retrieves
coordinate terms, hyponyms, nominalizations, hypernyms, and meronyms for the word sense selected. In this
example, the GetNymService retrieves expanded terms from WordNet utilizing the JWNL. Figure 6 shows the
expanded terms returned from the GetNymService.

Figure 6 — Expanded Terms List
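The expanded terms returned for a word sense can be grouped by relation type. The following sketch assumes a simple container; Relation and ExpandedTerms are hypothetical names, and no JWNL calls are shown because the service internals are not specified in this Technical Report.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Relation types named in the text; the types themselves are a hypothetical sketch.
enum Relation { COORDINATE_TERM, HYPONYM, NOMINALIZATION, HYPERNYM, MERONYM }

class ExpandedTerms {
    private final Map<Relation, List<String>> byRelation = new EnumMap<>(Relation.class);

    void add(Relation relation, String term) {
        byRelation.computeIfAbsent(relation, r -> new ArrayList<>()).add(term);
    }

    // Terms for one relation, or an empty list when none were found.
    List<String> get(Relation relation) {
        return byRelation.getOrDefault(relation, new ArrayList<>());
    }

    // Flattens all relations into one candidate list, in enum declaration order.
    List<String> all() {
        List<String> out = new ArrayList<>();
        for (List<String> terms : byRelation.values()) out.addAll(terms);
        return out;
    }
}
```

Keeping the relation with each term lets a user interface present separate lists, for example a hyponym list and a meronym list, from which the learner picks terms of interest.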
3.4 Selecting and Translating Appropriate Terms
The learner selects the terms of interest for translation and reverse translation. Terms selected are shown in
Figure 7.

Figure 7 — Terms Selected
The learner selects “Translate Terms” above and to the right of the part meronyms list. The
GetTranslatedWordService sends the terms to the appropriate translator. In this example, the terms are sent
to an Arabic word translator (LanguageWeaver). The terms translated into Arabic are reverse translated into
the original language, in this example English. The translated and reverse translated terms are returned to the
GetTranslatedWordService for display to the learner. Figure 8 shows the translated and reverse translated
terms. The terms on the far left are the terms selected from the “Expanded Term” page. The Arabic terms in
the middle are the Arabic translations of the terms on the left. The terms on the right are the reverse
translations of the Arabic terms.

Figure 8 — Translated and Reverse Translated Terms
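The translate and reverse-translate step can be sketched as a round trip through two translators. This is a minimal illustration under stated assumptions: DictionaryTranslator and RoundTrip are hypothetical classes, and a lookup table stands in for the translation engine used in the example. The isConsistent check is an added convenience; in the process described here, the learner judges the reverse translations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dictionary-backed translator; a real system would call a translation engine.
class DictionaryTranslator {
    private final Map<String, String> table = new HashMap<>();

    DictionaryTranslator put(String source, String target) {
        table.put(source, target);
        return this;
    }

    // Returns the translation, or the input unchanged when no entry exists.
    String translate(String term) {
        return table.getOrDefault(term, term);
    }
}

// Holds a term, its translation, and the reverse translation of that translation.
class RoundTrip {
    final String original;
    final String translated;
    final String reverseTranslated;

    RoundTrip(String original, DictionaryTranslator forward, DictionaryTranslator backward) {
        this.original = original;
        this.translated = forward.translate(original);
        this.reverseTranslated = backward.translate(this.translated);
    }

    // True when the reverse translation matches the original term, ignoring case.
    boolean isConsistent() {
        return original.equalsIgnoreCase(reverseTranslated);
    }
}
```

A round trip whose reverse translation differs from the original term is a candidate for exclusion from the query, although a differing reverse translation can still be an acceptable synonym.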
3.5 Selecting Appropriate Translations and Executing the Query
The learner selects the appropriate terms to use in the query. Note that there are errors in the translation of
some of the terms. For example, “pickup”, a type of truck, was translated into Arabic as “taken”. This is an
obvious mistranslation of what was semantically intended by the learner. Likewise, the term “van” was
translated into Arabic as “the”, which is also not the semantic intent of the learner. The learner does not
select either term for use in the query, as each would return information not useful to the learner. Figure 9
shows the terms selected by the learner for the query.

Figure 9 — Terms Selected for Query
The learner executes the query by selecting “Search Now” to the right of the translated and reverse translated
terms. In this example, the SearchService places the Arabic terms into an SQL statement and sends the
query to the source, in this example, the Linguistic Data Consortium (LDC) source. The LDC source consists
of over 10,000 Arabic documents with a wide variety of topics.
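Placing the selected terms into an SQL statement can be sketched as follows. The helper below is hypothetical: the table and column names (documents, id, body) are assumptions, and a LIKE clause is used only for illustration. A repository built on a full-text engine would more likely use that engine's own query operator. Term values are bound as parameters rather than concatenated into the statement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds a parameterized SQL query for a list of terms.
// Table and column names (documents, id, body) are assumptions for illustration.
class TermQueryBuilder {
    static String buildSql(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term required");
        }
        // One "body LIKE ?" clause per term, joined with OR; values are bound separately.
        String where = String.join(" OR ", Collections.nCopies(terms.size(), "body LIKE ?"));
        return "SELECT id FROM documents WHERE " + where;
    }

    // Values to bind to the placeholders, wrapped in % for substring matching.
    static List<String> parameters(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String t : terms) out.add("%" + t + "%");
        return out;
    }
}
```

The statement and the parameter list would then be passed to a java.sql.PreparedStatement, which keeps translated terms from being interpreted as SQL.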
3.6 Query Returns
The query returns are sent to extractors where summary information is obtained and then translated into the
learner’s language, in this example, English. The query return summary information is returned by the
SearchService for display in the interface. Return summaries are displayed in the left side of the interface,
ranked by return relevancy. Figure 10 shows the query returns ranked by relevancy.

Figure 10 — Query Return Summaries
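Ranking the return summaries by relevancy amounts to a descending sort on a score. The sketch below assumes each summary carries a numeric relevance value; the Summary and SummaryRanker classes and their fields are hypothetical, and how the score is computed is not specified in this Technical Report.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical summary holder; field names are assumptions.
class Summary {
    final String title;
    final double relevance;

    Summary(String title, double relevance) {
        this.title = title;
        this.relevance = relevance;
    }
}

class SummaryRanker {
    // Returns a new list sorted by descending relevance; the input is not modified.
    static List<Summary> rank(List<Summary> summaries) {
        List<Summary> out = new ArrayList<>(summaries);
        out.sort(Comparator.comparingDouble((Summary s) -> s.relevance).reversed());
        return out;
    }
}
```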
The learner selects a summary to retrieve the entire document. When the learner selects the appropriate
summary, the GetSearchResultTextService retrieves the entire Arabic document from the source, translates
the document, and in this example, extracts additional information such as location data. The
GetSearchResultTextService returns the Arabic document, and in this example, the English translation for
display in the interface. The Arabic document the query was performed on is displayed on the left side of the
document return section. The English translation is displayed on the right side of the display. Figure 11 shows
the Arabic document and the English translation of the Arabic document.

Figure 11 — Arabic Document and English Translation
In this example, character based locations are extracted from the Arabic document. The locations are
converted to geolocation coordinates. In this example, the coordinates are sent to Yahoo maps for display
purposes. The learner can view the map and plotted locations by selecting “Show Map” located above the
English translation. Locations mentioned in the Arabic document are displayed on the map. The learner can
mouse over the plot points to view the location name. Figure 12 shows the map with locations from the Arabic
document plotted on the map. This facilitates improved comprehension of the article’s place of origin or the
location where the subject of the article can be found.

Figure 12 — Map With Plotted Locations
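Converting extracted location names to coordinates can be pictured as a gazetteer lookup. The sketch below is hypothetical: Gazetteer and its methods are illustrative names, the lookup table is in memory, and the "name,lat,lon" output format is an assumption rather than the format expected by any particular mapping service.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory gazetteer; a real system would use a geocoding service.
class Gazetteer {
    private final Map<String, double[]> coordinates = new HashMap<>();

    void add(String name, double latitude, double longitude) {
        coordinates.put(name, new double[] { latitude, longitude });
    }

    // Resolves known names to "name,lat,lon" plot points; unknown names are skipped.
    List<String> plotPoints(List<String> locationNames) {
        List<String> out = new ArrayList<>();
        for (String name : locationNames) {
            double[] c = coordinates.get(name);
            if (c != null) out.add(name + "," + c[0] + "," + c[1]);
        }
        return out;
    }
}
```

Skipping unresolved names keeps the map usable when only some of the extracted locations can be geocoded.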
4 Components and Architecture of the SRQE Process
This clause contains the basic process diagram of the SRQE process and the J2EE architecture with the Java
servlets and objects shown in a UML diagram format. The SRQE architecture is shown in Figure 13. Each of
the architecture components is described in the context of the SRQE process flow.

Figure 13 — SRQE Process Architecture
4.1 SRQE Process Flow
The SRQE process is a human-machine interactive process to perform cross-lingual queries. Figure 14
describes the SRQE process. Human inputs are shown in the white box above the green arrow. Java servlet
functions are shown below the green arrow. A UML diagram showing the Java servlets is in a green box
linked to functions in the white box.

Figure 14 — SRQE Process
4.2 Repositories
A repository in the SRQE prototype is a generic concept for any service that contains indexed data that can be
searched and retrieved by the SRQE prototype. A current example of a foreign language text document
repository is an Oracle database utilizing Oracle Text.
Repositories are accessed using the ArticleRepository interface. ArticleRepository objects are created in a
generic manner using the ArticleRepositoryFactory. Each repository provides access to any number of data
sources (data sources refer to the original source of the data before the repository indexed it) and also
provides the languages contained in those data sources. The ArticleRepository interface and the
ArticleRepositoryFactory are shown in Figure 15.

Figure 15 — ArticleRepository Interface and the ArticleRepositoryFactory
The ArticleRepository interface and the ArticleRepositoryFactory have two main functions. One function is to
provide the UI with the sources and languages available for query for display to the learner. The
GetMenuBarDataService uses the ArticleRepository interface and the ArticleRepositoryFactory. The
repositories also provide the functionality to search the repositories based on learner approved terms and
retrieve data contained in the repositories. The SearchService uses ArticleRepository interface and the
ArticleRepositoryFactory.
Each repository requires a specific ArticleRepository class to be written. A new repository is made
available by modifying the search type mapping file used by the ArticleRepositoryFactory to construct the
appropriate ArticleRepository object for the search type being performed.
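The repository abstraction described in this subclause follows a factory pattern. The sketch below is illustrative only: the names ArticleRepository and ArticleRepositoryFactory come from the text, but the method signatures, the InMemoryRepository class, and the use of an in-memory map in place of the search type mapping file are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Interface name comes from the text; its methods are assumptions.
interface ArticleRepository {
    List<String> getSources();
    List<String> getLanguages();
}

// Hypothetical repository used only to exercise the factory.
class InMemoryRepository implements ArticleRepository {
    public List<String> getSources() { return List.of("Linguistic Data Consortium"); }
    public List<String> getLanguages() { return List.of("Arabic"); }
}

// Factory name comes from the text; the map stands in for the search type mapping file.
class ArticleRepositoryFactory {
    private final Map<String, Supplier<ArticleRepository>> bySearchType = new HashMap<>();

    void register(String searchType, Supplier<ArticleRepository> constructor) {
        bySearchType.put(searchType, constructor);
    }

    // Constructs the repository mapped to the search type, or fails if none is mapped.
    ArticleRepository create(String searchType) {
        Supplier<ArticleRepository> constructor = bySearchType.get(searchType);
        if (constructor == null) {
            throw new IllegalArgumentException("no repository for search type: " + searchType);
        }
        return constructor.get();
    }
}
```

Under this arrangement, adding a repository means writing one class and adding one mapping entry; callers such as a menu service or a search service depend only on the interface.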
4.3 Terms and Results
SRQE searches are based on the premise that given a list of expanded and translated terms, a list of search
results can be obtained. The Term object encapsulates the original expanded term, the translation of that
term, and the reverse translation of that term. The Term object is used by the GetSensesService, GetNymsService,
GetTranslatedWordService, and the SearchService.
The search results can be of any SearchResult type, depending on the type of data that matches the search
terms in the repositories available to be searched. The SearchResult object is used by the SearchService and
the GetSearchResultTextService. The SearchResult object and the Term object are shown in Figure 16.

Figure 16 — SearchResult Object and Term Object
4.4 Translators
The concept of a translator is treated as a generic interface in the SRQE process architecture. Based on the
original language and the target language, the appropriate translator object is constructed by the
TranslatorFactory.
New translation tools can easily be added by writing a specific Translator class for the TranslatorFactory
object. The language pair mapping file used by TranslatorFactory should be modified to use the new
Translator class for all language pairs that are best handled by the new translation tool.
Both the TranslatorFactory object and the Translator object it creates are used by the
GetTranslatedWordService, SearchService, and GetSearchResultTextService. The
TranslatorFactory object and Translator object are shown in Figure 17.

Figure 17 — TranslatorFactory Object and Translator Object
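Selecting a translator by language pair can be sketched in the same style. The names Translator and TranslatorFactory come from the text; the method signatures, the "source->target" key format, and the in-memory map standing in for the language pair mapping file are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Interface and factory names come from the text; method signatures are assumptions.
interface Translator {
    String translate(String text);
}

class TranslatorFactory {
    // Stands in for the language pair mapping file: "source->target" to a Translator.
    private final Map<String, Translator> byLanguagePair = new HashMap<>();

    void register(String source, String target, Translator translator) {
        byLanguagePair.put(source + "->" + target, translator);
    }

    // Returns the translator mapped to the language pair, or fails if none is mapped.
    Translator create(String source, String target) {
        Translator t = byLanguagePair.get(source + "->" + target);
        if (t == null) {
            throw new IllegalArgumentException("no translator for " + source + "->" + target);
        }
        return t;
    }
}
```

Because the mapping is keyed by direction, the forward and reverse translations of a term can be served by different tools if one handles a given direction better.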
4.5 Entity Extraction
The SRQE process has the ability to incorporate any entity extraction tool that exposes an external API. The
EntityExtractor class returns extracted, normalized Entity objects, regardless of the specific tool used to
extract those entities. The entities are extracted from the native language text, if a tool is available to do so.
Otherwise the translated text is used for extraction. The translated entities are then returned to the client and
displayed to the learner. The GetSearchResultTextService uses the EntityExtractor object. The EntityExtractor
along with sample extractors used in a prototype is shown in Figure 18.

Figure 18 — EntityExtractor
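The idea of returning normalized entities regardless of the underlying tool can be sketched as an interface with interchangeable implementations. The name EntityExtractor comes from the text; the Entity fields, the extract signature, and the KnownPlaceExtractor class are hypothetical. The toy extractor matches a fixed list of place names, which is far simpler than a real extraction tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical normalized entity: a category plus the matched text.
class Entity {
    final String category;
    final String text;

    Entity(String category, String text) {
        this.category = category;
        this.text = text;
    }
}

// Name from the text; the method signature is an assumption.
interface EntityExtractor {
    List<Entity> extract(String text);
}

// Toy extractor that tags known place names found in the text.
class KnownPlaceExtractor implements EntityExtractor {
    private final List<String> knownPlaces;

    KnownPlaceExtractor(List<String> knownPlaces) {
        this.knownPlaces = knownPlaces;
    }

    public List<Entity> extract(String text) {
        List<Entity> out = new ArrayList<>();
        for (String place : knownPlaces) {
            if (text.contains(place)) out.add(new Entity("LOCATION", place));
        }
        return out;
    }
}
```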
4.6 Terminology for Query Searches
The SRQE process uses a terminology ontology, dictionary, knowledge base, etc. to expand and select terms
to include in a query. In cross-lingual situations, there is often no word-to-word translation from one
language to another. Additionally, there are often semantic ambiguities to be resolved. In both cases, term
expansion helps. The SRQE process has the ability to incorporate any terminology repository that exposes an
external API. For example, in Clause 3, an example implementation of the SRQE utilizing Princeton University
WordNet is shown. In this example implementation, the Java WordNet Library (JWNL) is used by the
GetSensesService and GetNymsService, using the JWNL API. The JWNL object is shown in Figure 19.

Figure 19 — JWNL Object
Annex A
(informative)

Potential Linkage to Current and Future
ISO/IEC JTC1 SC 36 Technology Areas
The System Process and Architecture for Multilingual Semantic Reverse Query Expansion for LET
identified in this Technical Report is used to query LET materials when the requestor does not
understand or is not proficient in the language of the material available. It might be useful to identify
other possible uses for the process and architecture identified in this Technical Report. The matrix
below contains a list of potential linkages to current and future ISO/IEC JTC1 technology areas.
This list is not meant to be exhaustive or definitive as information technology for LET purposes will
evolve over time. The list is meant to be used as potential usage for the existing System Process
and Architecture for Multilingual Semantic Reverse Query Expansion for LET as identified by this
Technical Report. The Technical Report only identifies the existing process and architecture. The
Technology Areas listed below are not required to utilize or incorporate the process or architecture
contained in this Technical Report.
Table A.1 — ISO/IEC JTC1 SC36 Related Technology Areas
SRQE Processes/Services SC36 Technology Areas Linkage
Entire
...
