Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information

This document specifies an annotation language for visual information, based on VoxML (visual object concept structure modelling language), a modelling language for the visualizations of concepts and actions denoted by natural language (NL) expressions in three dimensions (3D).  
The specification of the VoxML-based annotation scheme conforms to the requirements given in ISO 24617-1, ISO 24617-7 and ISO 24617-14. The adoption of VoxML, specified in ISO 24617-14 as a semantic basis, is necessary for the 3D simulation and visualization of actions and motions taken by both human and artificial agents in real-life situations.

Gestion des ressources linguistiques - Cadre d'annotation sémantique — Partie 10: informations visuelles (VoxML)

Upravljanje jezikovnih virov - Ogrodje za semantično označevanje (SemAF) - 10. del: Vizualne informacije

General Information

Status: Published
Publication Date: 15-May-2025
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 10-Apr-2025
Due Date: 15-Jun-2025
Completion Date: 16-May-2025

Overview

ISO 24617-10:2024 - Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information (VoxML) specifies an annotation language for visual information based on VoxML (visual object concept structure modelling language). The standard defines how to encode semantic knowledge for 3D visualizations of concepts, objects, actions and motions denoted by natural language (NL). It is intended to support robust 3D simulation and visualization of actions taken by human and artificial agents and to integrate with the broader SemAF family for time and spatial annotation.

Key topics and requirements

  • VoxML-based annotation scheme: A metamodel and concrete/abstract syntax for representing visual object concept structures (VoxML structures) in annotation formats.
  • Semantic building blocks: Formalization of objects, actions as programs, relations, properties (attributes), functions, and examples of voxemes (basic entries in the voxicon).
  • Basic semantic assumptions: Concepts such as habitats, affordances (Gibsonian and telic), qualia structures, and minimal embedding space (MES) are used to constrain model-theoretic interpretation.
  • Representation & syntax: Guidance on representation of VoxML structures, including concrete syntax for visual information markup (visML) and mapping to feature-structure representations (see ISO 24610-1).
  • Semantic interpretation: Specification of how VoxML enriches annotation semantics to enable 3D simulation, linking NL input to geometric and functional parameters.
  • Conformance: The VoxML annotation scheme conforms to requirements in ISO 24617-1, ISO 24617-7 and ISO 24617-14 for temporal and spatial semantics.

Applications

  • Multimodal simulation and visualization driven by natural language: generate 3D scenes from textual descriptions.
  • Human-computer interaction and conversational agents that require situated visual understanding.
  • AR/VR content generation and interactive training simulations where semantic behaviors and affordances must be modeled.
  • Robotics and embodied AI for planning and simulating object interactions (grasping, moving, using).
  • Corpus annotation and dataset creation for NLP, computer vision and multimodal research.
  • Tooling for annotation pipelines, visualization engines and semantic parsers that require standardized visual-semantic markup.

Who should use this standard

  • Computational linguists, NLP researchers and semantics engineers
  • Annotation tool and dataset developers
  • AR/VR/3D visualization and simulation platform developers
  • Robotics and embodied AI researchers integrating NL-driven behaviors
  • Standards bodies and interoperability architects working on multimodal systems

Related standards

  • ISO 24617-1 (SemAF - Time and events)
  • ISO 24617-7 (SemAF - Spatial information)
  • ISO 24617-14 (SemAF - Spatial semantics; specifies VoxML as a semantic basis)
  • ISO 24610-1 (Feature structure representation)

Keywords: ISO 24617-10:2024, SemAF, VoxML, visual information, semantic annotation, 3D simulation, voxeme, voxicon, affordance, habitat, natural language visualization.

Standard

SIST ISO 24617-10:2025
English language, 28 pages

ISO 24617-10:2024 - Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information
Released: 6 August 2024
English language, 23 pages

Frequently Asked Questions

SIST ISO 24617-10:2025 is a standard published by the Slovenian Institute for Standardization (SIST). Its full title is "Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information". This standard covers: This document specifies an annotation language for visual information, based on VoxML (visual object concept structure modelling language), a modelling language for the visualizations of concepts and actions denoted by natural language (NL) expressions in three dimensions (3D). The specification of the VoxML-based annotation scheme conforms to the requirements given in ISO 24617-1, ISO 24617-7 and ISO 24617-14. The adoption of VoxML, specified in ISO 24617-14 as a semantic basis, is necessary for the 3D simulation and visualization of actions and motions taken by both human and artificial agents in real-life situations.

SIST ISO 24617-10:2025 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination); 01.140.20 - Information sciences; 35.240.30 - IT applications in information, documentation and publishing. The ICS classification helps identify the subject area and facilitates finding related standards.


Standards Content (Sample)


SLOVENIAN STANDARD
01-June-2025
Upravljanje jezikovnih virov - Ogrodje za semantično označevanje (SemAF) - 10. del: Vizualne informacije
Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information
Gestion des ressources linguistiques - Cadre d'annotation sémantique - Partie 10: informations visuelles (VoxML)
This Slovenian standard is identical to: ISO 24617-10:2024
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

International Standard
ISO 24617-10
First edition
2024-08
Language resource management - Semantic annotation framework (SemAF) - Part 10: Visual information
Gestion des ressources linguistiques - Cadre d'annotation sémantique - Partie 10: informations visuelles (VoxML)
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Basic semantic assumptions - Habitats and affordances
6 VoxML specification
6.1 Metamodel and VoxML elements
6.2 Representation of VoxML structures
6.3 Objects
6.4 Actions as programs
6.5 Relations
6.5.1 General
6.5.2 Properties (Attributes)
6.5.3 Relations
6.5.4 Functions
7 Examples of voxemes
7.1 General
7.2 Objects
7.3 Eventualities as programs
7.4 Properties
7.5 Relations
7.6 Functions
8 Using VoxML for simulation modelling of language
9 VoxML-based annotation scheme
9.1 Overview
9.2 Annotation scheme
9.2.1 Abstract specification
9.2.2 Concrete syntax for the representation of annotation structures
9.3 Semantic representation and interpretation
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
This document standardizes the specification of a semantic annotation scheme for visual information, based
on a modelling language for constructing three-dimensional (3D) visualizations of concepts denoted by
natural language (NL) expressions. This modelling language serves as a semantic basis for interpreting the
semantic forms of annotation structures model-theoretically by constraining the models for interpretation.
This document focuses on the introduction of the modelling language as a semantic basis for interpretation,
since the syntactic specification of the annotation scheme for visual information is a simplified formulation
based on the abstract specification of the spatio-temporal annotation schemes, such as those specified in
ISO 24617-1, ISO 24617-7 and ISO 24617-14. These three standards lay a theoretical basis for this document,
which specifies ways of annotating visual information involving motions and actions that are spatio-
temporally characterized.
The modelling language, named “VoxML” (visual object concept structure modelling language), where “Vox”
abbreviates “visual object concept structure” (VOCS), can be used as the platform for creating multimodal
semantic simulations in the context of human-computer communication. VoxML encodes semantic knowledge
of real-world objects represented as 3D models, and of events and attributes related to and enacted over
these objects. VoxML is intended to overcome the limitations of existing 3D visual markup languages by
allowing for the encoding of a broad range of semantic knowledge that can be exploited by a variety of
systems and platforms, leading to multimodal simulations of real-world scenarios using conceptual objects
that represent their semantic values.
NOTE 1 The main content of this document is based on References [1] and [2]. Reference [1] was developed by the
Brandeis University Computer Science Department in the context of communicating with computers (CwC), a Defence
Advanced Research Projects Agency (DARPA) effort to identify and construct computational semantic elements, for
the purpose of carrying out joint plans between a human and computer through NL discourse.
NOTE 2 This document adopts VoxML as a semantic basis for enriching the model for interpreting the descriptions
of objects, actions and relations involving dynamic visual information.
This document outlines a specification:
a) to formulate the annotation scheme for visual information;
b) to represent semantic knowledge of real-world objects represented as 3D models.
It uses a combination of parameters that can be determined from the object’s geometrical properties as
well as lexical information from NL, with methods of correlating the two where applicable. This information
allows for visualization and simulation software to fill in information missing from the NL input and
allows the software to render a functional visualization of programs being run over objects in a robust and
extensible way. Currently, a voxicon, which is the structured repository of visual object concepts, contains
500 object (noun) voxemes, i.e. lexemes or entries of the voxicon, and 10 program (verb) voxemes.
NOTE 3 As this library of available voxemes continues to grow, the specification elements will operationalize an
increasingly large library of various and more complicated programs. A voxeme library and visualization software,
with which users can conduct visualizations of available behaviours driven by VoxML after parsing and
interpretation, are available from Reference [25].

International Standard ISO 24617-10:2024(en)
Language resource management — Semantic annotation
framework (SemAF) —
Part 10:
Visual information
1 Scope
This document specifies an annotation language for visual information, based on VoxML (visual object
concept structure modelling language), a modelling language for the visualizations of concepts and actions
denoted by natural language (NL) expressions in three dimensions (3D).
The specification of the VoxML-based annotation scheme conforms to the requirements given in ISO 24617-1,
ISO 24617-7 and ISO 24617-14. The adoption of VoxML, specified in ISO 24617-14 as a semantic basis, is
necessary for the 3D simulation and visualization of actions and motions taken by both human and artificial
agents in real-life situations.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 24610-1:2006, Language resource management — Feature structures — Part 1: Feature structure
representation
ISO 24617-1, Language resource management — Semantic annotation framework (SemAF) — Part 1: Time and
events (SemAF-Time, ISO-TimeML)
ISO 24617-7, Language resource management — Semantic annotation framework — Part 7: Spatial information
ISO 24617-14, Language resource management — Semantic annotation framework (SemAF) — Part 14: Spatial
semantics
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
- ISO Online browsing platform: available at https://www.iso.org/obp
- IEC Electropedia: available at https://www.electropedia.org/
3.1
affordance
affordance structure
set of specific actions, described along with the requisite conditions, that the object may take part in

3.1.1
Gibsonian affordance
GA
set of specific actions that an agent can perform with an object that is presented to the agent
EXAMPLE Hold, grasp, move.
3.1.2
telic affordance
set of goal-oriented or intentionally situated actions of an agent on an object presented to the agent
EXAMPLE An agent eating an apple when it is presented to the agent.
3.2
habitat
representation of an object situated within a partial minimal model
3.3
minimal embedding space
MES
three-dimensional (3D) region within which the state is configured, or the event unfolds
3.4
qualia
qualia structure
QS
relational forces or aspects of a lexical item or concept
3.5
telic
purpose or function qualia (3.4) of an object
3.6
voxeme
basic entries in voxicon (3.7)
3.7
voxicon
lexicon or list of basic visual object concepts of VoxML (visual object concept structure modelling language)
4 Abbreviated terms
3D          three-dimensional
A           agentive role
ARG         argument
AS          atomic structure
AS_visML    annotation scheme for visual information markup language
ASyn_visML  abstract syntax for visual information markup language
CSyn_visML  concrete syntax for visual information markup language
C           constitutive property
F           formal property
GA          Gibsonian affordance
ID          identifier
MES         minimal embedding space
NL          natural language
NLP         natural language processing
QS          qualia structure
T           telic role
Vox         visual object concept structure
VoxML       visual object concept structure modelling language
XML         extensible markup language
5 Basic semantic assumptions — Habitats and affordances
Before introducing the VoxML specification, this document reviews two basic assumptions regarding the
semantics underlying the model. Following the Generative Lexicon [3], lexical entries in the object language
are given a feature structure consisting of a word’s basic type, its parameter listing, its event typing and its
qualia structure. In accordance with ISO 24610-1:2006, each feature structure shall be typed, consisting of
pairs of features (attributes) and values, either atomic or complex. If a value is a variable, then it is bound
either universally, existentially, or by the lambda operator, as shown in Example 1.
The semantic structure of an object shall be analysed into the following four sub-structures:
a) atomic structure (formal): objects expressed as basic nominal types;
b) subatomic structure (constitutive): mereo-topological structure of objects;
c) event structure (telic) and (agentive): origin and functions associated with an object;
d) macro-object structure: how objects fit together in space and through coordinated activities.
Objects can be partially contextualized through their qualia structure. For example, a food item has a telic
value of “eat”; an instrument for writing has a telic value of “write”; a cup has a telic value of “hold”, etc. As a
further example, the lexical semantics for the noun “chair” carries a telic value of “sit_in”:
EXAMPLE 1
where
AS is an atomic structure;
QS is a qualia structure;
ARG1 is argument 1;
F is a formal property;
T is a telic role.
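Since the feature structure of Example 1 is not reproduced in this extract, the following minimal Python sketch shows one way such an entry could be encoded as nested feature-value pairs. The attribute names come from the legend above; the values given for ARG1, F and T, the dictionary layout and the helper `telic_value` are assumptions for illustration only and are not taken from this document.

```python
# Hypothetical encoding of the lexical entry for "chair" as a feature structure.
# Attribute names follow the legend of Example 1; all values are assumed.
chair = {
    "AS": {"ARG1": "x:e"},             # atomic structure with one argument
    "QS": {                            # qualia structure
        "F": "phys(x)",                # formal property (assumed value)
        "T": "λz,e[sit_in(e,z,x)]",    # telic role: the stated value "sit_in"
    },
}


def telic_value(entry):
    """Return the telic role (T) of an entry's qualia structure, or None."""
    return entry.get("QS", {}).get("T")


print(telic_value(chair))
```

Keeping the entry as plain nested mappings mirrors the pairs of features and values described above; a typed representation could be layered on top if required.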
While an artefact is designed for a specific purpose (its telic role), this can only be achieved under
specific circumstances. Reference [4] introduces the notion of an object’s “habitat”, which encodes these
circumstances. References [5] and [6] further define the notion of habitat and how it interacts with
affordances. It is assumed that for an artefact, x, given the appropriate context C, performing the action π
will result in the intended or desired resulting state, R, i.e. C → [π]R. That is, if a context C (a set of contextual

factors) is satisfied, then every time the activity of π is performed, the resulting state R will occur. It is
necessary to specify the precondition context C since this enables the local modality to be satisfied.
Using this notion, a habitat is defined as representing an object situated within a partial minimal model; it is
a directed enhancement of the qualia structure. Multi-dimensional affordances determine how habitats are
deployed and how they modify or augment the context, and compositional operations include procedural
(simulation) and operational (selection, specification, refinement) knowledge.
The habitat for an object is built by first placing it within an embedding space and then contextualizing it.
For example, to use a table, the top must be oriented upward, the surface must be accessible, etc. A chair
also must be oriented up, the seat must be free and accessible, it must be able to support the user, etc. The
resulting knowledge structure for the habitat of a chair is illustrated in Example 2.
EXAMPLE 2
where
F is a formal property;
C is a constitutive property;
T is a telic role;
A is an agentive role.
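The relation C → [π]R described above can be sketched in Python as a precondition check that gates an action. This is only an illustration under stated assumptions: the class `Habitat`, the state keys (`orientation`, `seat_occupied`, `max_load_kg`, `user_kg`) and the function `sit_in` are hypothetical names chosen to mirror the chair conditions listed in the text, not part of VoxML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Habitat:
    """Context C: named conditions that must hold before an action is run."""
    conditions: Dict[str, Callable[[State], bool]]

    def unmet(self, state: State) -> List[str]:
        """Return the names of the conditions that the state does not satisfy."""
        return [name for name, test in self.conditions.items() if not test(state)]


# Assumed conditions for a chair: oriented up, seat free, able to support the user.
chair_habitat = Habitat({
    "oriented_up": lambda s: s.get("orientation") == "up",
    "seat_clear": lambda s: s.get("seat_occupied") is False,
    "supports_user": lambda s: s.get("max_load_kg", 0) >= s.get("user_kg", 0),
})


def sit_in(state: State) -> State:
    """Action π: yields the resulting state R only when the context C holds."""
    missing = chair_habitat.unmet(state)
    if missing:
        raise ValueError(f"habitat not satisfied: {missing}")
    return {**state, "seat_occupied": True, "user_seated": True}


upright = {"orientation": "up", "seat_occupied": False,
           "max_load_kg": 120, "user_kg": 80}
print(sit_in(upright)["user_seated"])
```

Raising an error when the context fails is a design choice of this sketch; a simulator could instead select a different habitat or repair the context first.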
As described in more detail in 6.4, event or action simulations are constructed from the composition of
object habitats, along with some constraints imposed by the dynamic event structure inherent in the verb
itself, when interpreted as a program.
The final step in contextualizing the semantics of an object is to operationalize the telic value in its habitat.
This effectively means identifying the “affordance structure” [7][8] for the object. The affordance structure
available to an agent, when presented with an object, is the set of actions that can be performed with it.
These are referred to as “Gibsonian affordances” and they include “grasp”, “move”, “hold”, “turn”, etc.
This is to distinguish them from more goal-directed, intentionally situated activities, referred to as “telic
affordances”.
6 VoxML specification
6.1 Metamodel and VoxML elements
The spatio-temporal annotation schemes given in ISO 24617-1, ISO 24617-7 and ISO 24617-14 shall apply.
The metamodel, graphically depicted by Figure 1, represents a small world of basic elements modelled in
VoxML. These elements form a set of categories:
a) event (program);
b) entity (object);
c) relation over them.
Events, especially actions, work as programs while taking simple objects or spatio-temporally localized
objects as arguments. Entities as objects are individuals or groups that may behave as agents. Relations can
be divided into properties, often referred to as “attributes”, and functions as subcategories. Attributes and
relations evaluate to states, and functions evaluate to geometric regions. These elements can then compose
into visualizations of NL concepts and expressions.

The metamodel of VoxML, presented in Figure 1, has no regions or times. These are introduced by functions
such as loc and τ. The function loc, for instance, maps an object x to the region loc(x) to which it is anchored.
Likewise, τ(x) maps an event to an event time, the time of its occurrence. Similarly, the function seq or the
function vec maps a set of regions to a path or a vector. Thereby, the ontology of VoxML is enriched with
spatio-temporal entities and dynamic paths.
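As a rough illustration of these functions, the Python sketch below models loc, τ, seq and vec over simple data types. The representation is an assumption made for the example: regions are axis-aligned boxes, an event time is a (start, end) pair, and a path is an ordered list of region centres. The class names `Region`, `Obj` and `Event` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Region:
    """Axis-aligned box standing in for a geometric region (an assumption)."""
    lo: Point
    hi: Point

    def centre(self) -> Point:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))


@dataclass
class Obj:
    name: str
    region: Region


@dataclass
class Event:
    name: str
    start: float
    end: float


def loc(x: Obj) -> Region:
    """Map an object to the region to which it is anchored."""
    return x.region


def tau(e: Event) -> Tuple[float, float]:
    """Map an event to the time of its occurrence."""
    return (e.start, e.end)


def seq(regions: List[Region]) -> List[Point]:
    """Map ordered regions to a path, here the sequence of their centres."""
    return [r.centre() for r in regions]


def vec(regions: List[Region]) -> Point:
    """Map ordered regions to the vector from the first centre to the last."""
    path = seq(regions)
    return tuple(b - a for a, b in zip(path[0], path[-1]))


r1 = Region((0, 0, 0), (2, 2, 2))
r2 = Region((4, 0, 0), (6, 2, 2))
print(vec([r1, r2]))
```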
NOTE 1 The empty triangular head of an arrow represents a subcategorization relation. Each directed arrow with
a smaller filled-in arrowhead relates one element to one or more other elements, while its label specifies the
relation. An entity as an agent, for example, intentionally triggers an action, while the action is a subcategory of an
event, treated as a program.
NOTE 2 SOURCE: Reference [2], reproduced with the permission of the authors.
Figure 1 — Metamodel
6.2 Representation of VoxML structures
This document follows the convention of the current version of VoxML and Voxicon (see Reference [1]). Basic
VoxML structures called “voxemes” are conventionally represented as feature structures, each consisting of
a set of attribute-value specifications, conforming to ISO 24610-1. Voxemes are mostly formed by complex
feature structures, having at least one of their substructures embedded in them as a feature structure, as
illustrated in this clause.
NOTE 1 ISO 24610-1 avoids the use of the term “attribute-value”. Instead, it uses the term “feature-value”, thus
defining a feature structure as a function from a set of features to a set of values.
In the concrete syntax, adopted for representing these feature structures of VoxML in this document, the
names of its attributes are represented in all uppercase characters, while the names of elements start with
their first character in upper case (e.g. the attribute LEX for the element Object as in Figure 2).
NOTE 2 This document follows the convention of the current version of VoxML and Voxicon for representing
attribute names in upper case characters.

Figure 2 — Voxeme structure of a wall
6.3 Objects
The element Object in VoxML is used for modelling nouns. The current set of Object attributes is shown in
Table 1.
Table 1 — Object attributes
LEX Object’s lexical information
TYPE Object’s geometrical typing
HABITAT Object’s habitat for actions
AFFORD_STR Object’s affordance structure
EMBODIMENT Object’s agent-relative embodiment
The attribute LEX in Table 1 contains a substructure, specified by two attributes: PRED and TYPE. The
attribute PRED in the substructure specifies the predicate lexeme denoting the Object, and the attribute
TYPE in the substructure specifies the Object’s type according to the Generative Lexicon [3] (see Figure 2).
There are two different sorts of the attribute TYPE, as shown in Figure 2. The first sort refers to the attribute
TYPE of the element Object, which contains information to define the object geometry in terms of primitives.
In contrast, the second sort refers to the attribute TYPE of the substructure of the attribute LEX.
The attribute TYPE of the element Object has an attribute HEAD in its substructure, which specifies a primitive 3D shape that
roughly describes the object’s form (such as calling an apple an “ellipsoid”), or the form of the object’s most
semantically salient subpart. Possible values for the attribute HEAD are grounded in, for completeness,
mathematical formalism defining families of polyhedra [9], and, for the annotator’s ease, common primitives
found across the “corpus” of 3D artwork and 3D modelling software.
NOTE Mathematically curved surfaces such as spheres and cylinders are in fact represented, computed and
rendered as polyhedra by most modern 3D software [10].
Using common 3D modelling primitives as convenience definitions provides some built-in redundancy to
VoxML, as is found in an NL description of structural forms. For example, a “rectangular_prism” is the same
as a “parallelepiped” that has at least two defined planes of reflectional symmetry, meaning that an object
whose HEAD is a rectangular_prism can be defined in two ways, an association which a reasoner can unify
axiomatically. Possible values for the attribute HEAD are given in Table 2.

Table 2 — Possible values for the attribute HEAD
HEAD prismatoid, pyramid, wedge, parallelepiped, cupola, frustum, cylindroid, ellipsoid,
hemiellipsoid, bipyramid, rectangular_prism, toroid, sheet
These values are not intended to reflect the exact structure of a particular geometry, but rather a cognitive
approximation of its shape, as is used in some image-recognition work [11].
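The Object attributes of Table 1 and the HEAD values of Table 2 can be sketched as Python dataclasses, as below. This is a hedged illustration, not the concrete syntax of this document: the class names `Lex`, `ObjType` and `ObjectVoxeme`, the use of dictionaries for HABITAT, AFFORD_STR and EMBODIMENT, and the LEX type value `physobj` are assumptions. `HEAD_VALUES` copies Table 2, using the standard geometric spelling "bipyramid".

```python
from dataclasses import dataclass, field

# Possible values for the attribute HEAD, copied from Table 2.
HEAD_VALUES = {
    "prismatoid", "pyramid", "wedge", "parallelepiped", "cupola", "frustum",
    "cylindroid", "ellipsoid", "hemiellipsoid", "bipyramid",
    "rectangular_prism", "toroid", "sheet",
}


@dataclass
class Lex:
    """Substructure of LEX: predicate lexeme and Generative Lexicon type."""
    PRED: str
    TYPE: str


@dataclass
class ObjType:
    """Geometrical typing of an Object; HEAD names a primitive 3D shape."""
    HEAD: str

    def __post_init__(self):
        if self.HEAD not in HEAD_VALUES:
            raise ValueError(f"unknown HEAD value: {self.HEAD!r}")


@dataclass
class ObjectVoxeme:
    """Object voxeme with the five attributes listed in Table 1."""
    LEX: Lex
    TYPE: ObjType
    HABITAT: dict = field(default_factory=dict)
    AFFORD_STR: dict = field(default_factory=dict)
    EMBODIMENT: dict = field(default_factory=dict)


# "physobj" is an assumed type label; "ellipsoid" follows the apple example.
apple = ObjectVoxeme(LEX=Lex(PRED="apple", TYPE="physobj"),
                     TYPE=ObjType(HEAD="ellipsoid"))
print(apple.TYPE.HEAD)
```

Attribute names are kept in upper case to follow the convention stated in 6.2, even though this departs from the usual Python naming style.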
The substructures of an object are enumerated in its attribute COMPONENTS. In Figure 2, the attribute
COMPONENTS embedded in the attribute TYPE has its value nil. CONCAVITY can be concave, flat or convex and
refers to any concavity that deforms the HEAD shape. ROTATSYM, or rotational symmetry, defines any of the
world’s three orthogonal axes around which the object’s geometry may be rotated for an interval of less than
360° and retain identical form as the unrotated geometry. A sphere may be rotated at any interval around
any of the three axes and retain the same form. A rectangular prism may be rotated 180° around any of the
three axes and retain the same shape. An object such as a ceiling fan would only have rotational symmetry
around the y-axis. Reflectional symmetry, or REFLECTSYM, is defined similarly. If an object can be bisected
by a plane defined by two of the world’s three orthogonal axes and then reflected across that plane to obtain
the same geometric form as the original object, it is considered to have reflectional symmetry across that
plane. A sphere or rectangular prism has reflectional symmetry across the XY, XZ and YZ planes. A wine
bottle only has reflectional symmetry across the XY and YZ planes.
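The two symmetry tests can be illustrated on a set of vertices with a short Python sketch. It makes several simplifying assumptions: the geometry is a finite point set centred on the world origin, rotational symmetry is checked only for a 180° rotation (the definition above allows any interval of less than 360°), and the function names `reflectsym` and `rotatsym_180` are hypothetical rather than part of VoxML.

```python
from itertools import product

AXES = {"X": 0, "Y": 1, "Z": 2}


def reflect(points, plane):
    """Mirror points across a world plane such as "XY" (negates the third axis)."""
    normal = ({"X", "Y", "Z"} - set(plane)).pop()
    i = AXES[normal]
    return [tuple(-c if k == i else c for k, c in enumerate(p)) for p in points]


def rotate_180(points, axis):
    """Rotate points 180 degrees about a world axis (negates the other two axes)."""
    i = AXES[axis]
    return [tuple(c if k == i else -c for k, c in enumerate(p)) for p in points]


def reflectsym(points):
    """World planes across which the point set maps onto itself."""
    original = set(points)
    return [pl for pl in ("XY", "XZ", "YZ") if set(reflect(points, pl)) == original]


def rotatsym_180(points):
    """World axes about which a 180 degree rotation maps the set onto itself."""
    original = set(points)
    return [ax for ax in "XYZ" if set(rotate_180(points, ax)) == original]


# Vertices of a rectangular prism centred on the origin (2 x 4 x 6).
prism = list(product((-1, 1), (-2, 2), (-3, 3)))
# The same prism with one extra vertex on its +X side, breaking some symmetries.
lopsided = prism + [(2, 0, 0)]

print(reflectsym(prism), rotatsym_180(prism))
print(reflectsym(lopsided), rotatsym_180(lopsided))
```

For the prism, all three planes and all three axes are reported, in line with the rectangular prism described above; the extra vertex removes the symmetries that would move it to the opposite side.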
The possible values of ROTATSYM and REFLECTSYM are intended to be world-relative, not object-relative.
That is, because objects are only being discussed when situated in a minimal embedding space (MES), even
an otherwise empty one, wherein all coordinates are given Cartesian values, the axis of rotational symmetry
or plane of reflectional symmetry are those denoted in the world, not of the object. Thus, a tetrahedron
(which in isolation has seven axes of rotational symmetry, no two of which are orthogonal), when placed in
the MES such that it cognitively satisfies all “real-world” constraints, is situated with one base downward
(a tetrahedron placed any other way will fall over), which reduces the salient in-world axes of rotational
symmetry to one: the world’s y-axis. When the orientation of the object is ambiguous relative to the world,
the world is assumed to provide the grounding value.
The Habitat element defines habitats “intrinsic” to the object, regardless of what action it participates in,
such as intrinsic orientations or surfaces, as well as “extrinsic” habitats which must be satisfied for some
specified actions to take place. Intrinsic faces of an object can be defined in terms of its geometry and axes.
The model of a computer monitor, when axis-aligned according to 3D modelling convention, aligns the screen
with the world’s Z-axis facing the direction of increasing Z values. When discussing the object “computer
monitor”, the lexeme “front” singles out the screen of the monitor as opposed to any other p
...


International Standard
ISO 24617-10
First edition
2024-08
Language resource management — Semantic annotation framework (SemAF) — Part 10: Visual information
Gestion des ressources linguistiques - Cadre d'annotation sémantique — Partie 10: informations visuelles (VoxML)
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Basic semantic assumptions — Habitats and affordances
6 VoxML specification
6.1 Metamodel and VoxML elements
6.2 Representation of VoxML structures
6.3 Objects
6.4 Actions as programs
6.5 Relations
6.5.1 General
6.5.2 Properties (Attributes)
6.5.3 Relations
6.5.4 Functions
7 Examples of voxemes
7.1 General
7.2 Objects
7.3 Eventualities as programs
7.4 Properties
7.5 Relations
7.6 Functions
8 Using VoxML for simulation modelling of language
9 VoxML-based annotation scheme
9.1 Overview
9.2 Annotation scheme
9.2.1 Abstract specification
9.2.2 Concrete syntax for the representation of annotation structures
9.3 Semantic representation and interpretation
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
This document standardizes the specification of a semantic annotation scheme for visual information, based
on a modelling language for constructing three-dimensional (3D) visualizations of concepts denoted by
natural language (NL) expressions. This modelling language serves as a semantic basis for interpreting the
semantic forms of annotation structures model-theoretically by constraining the models for interpretation.
This document focuses on the introduction of the modelling language as a semantic basis for interpretation,
since the syntactic specification of the annotation scheme for visual information is a simplified formulation
based on the abstract specification of the spatio-temporal annotation schemes, such as those specified in
ISO 24617-1, ISO 24617-7 and ISO 24617-14. These three standards lay a theoretical basis for this document,
which specifies ways of annotating visual information involving motions and actions that are spatio-
temporally characterized.
The modelling language, named “VoxML” (visual object concept structure modelling language), where “Vox”
abbreviates “visual object concept structure” (VOCS), can be used as the platform for creating multimodal
semantic simulations in the context of human-computer communication. VoxML encodes semantic knowledge
of real-world objects represented as 3D models, and of events and attributes related to and enacted over
these objects. VoxML is intended to overcome the limitations of existing 3D visual markup languages by
allowing for the encoding of a broad range of semantic knowledge that can be exploited by a variety of
systems and platforms, leading to multimodal simulations of real-world scenarios using conceptual objects
that represent their semantic values.
NOTE 1 The main content of this document is based on References [1] and [2]. Reference [1] was developed by the
Brandeis University Computer Science Department in the context of Communicating with Computers (CwC), a Defense
Advanced Research Projects Agency (DARPA) effort to identify and construct computational semantic elements, for
the purpose of carrying out joint plans between a human and computer through NL discourse.
NOTE 2 This document adopts VoxML as a semantic basis for enriching the model for interpreting the descriptions
of objects, actions and relations involving dynamic visual information.
This document outlines a specification:
a) to formulate the annotation scheme for visual information;
b) to represent semantic knowledge of real-world objects represented as 3D models.
It uses a combination of parameters that can be determined from the object’s geometrical properties as
well as lexical information from NL, with methods of correlating the two where applicable. This information
allows for visualization and simulation software to fill in information missing from the NL input and
allows the software to render a functional visualization of programs being run over objects in a robust and
extensible way. Currently, a voxicon, which is the structured repository of visual object concepts, contains
500 object (noun) voxemes (the lexemes, or entries, of the voxicon) and 10 program (verb) voxemes.
NOTE 3 As this library of available voxemes continues to grow, the specification elements will operationalize an
increasingly large library of various and more complicated programs. A voxeme library and visualization software,
with which users can visualize the available behaviours driven by VoxML after parsing and interpretation, are
available from Reference [25].

International Standard ISO 24617-10:2024(en)
Language resource management — Semantic annotation
framework (SemAF) —
Part 10:
Visual information
1 Scope
This document specifies an annotation language for visual information, based on VoxML (visual object
concept structure modelling language), a modelling language for the visualizations of concepts and actions
denoted by natural language (NL) expressions in three dimensions (3D).
The specification of the VoxML-based annotation scheme conforms to the requirements given in ISO 24617-1,
ISO 24617-7 and ISO 24617-14. The adoption of VoxML, specified in ISO 24617-14 as a semantic basis, is
necessary for the 3D simulation and visualization of actions and motions taken by both human and artificial
agents in real-life situations.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 24610-1:2006, Language resource management — Feature structures — Part 1: Feature structure
representation
ISO 24617-1, Language resource management — Semantic annotation framework (SemAF) — Part 1: Time and
events (SemAF-Time, ISO-TimeML)
ISO 24617-7, Language resource management — Semantic annotation framework — Part 7: Spatial information
ISO 24617-14, Language resource management — Semantic annotation framework (SemAF) — Part 14: Spatial
semantics
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
affordance
affordance structure
set of specific actions, described along with the requisite conditions, that the object may take part in

3.1.1
Gibsonian affordance
GA
set of specific actions that an agent can perform with an object that is presented to the agent
EXAMPLE Hold, grasp, move.
3.1.2
telic affordance
set of goal-oriented or intentionally situated actions of an agent on an object presented to the agent
EXAMPLE An agent eating an apple when it is presented to the agent.
3.2
habitat
representation of an object situated within a partial minimal model
3.3
minimal embedding space
MES
three-dimensional (3D) region within which the state is configured, or the event unfolds
3.4
qualia
qualia structure
QS
relational forces or aspects of a lexical item or concept
3.5
telic
purpose or function qualia (3.4) of an object
3.6
voxeme
basic entry in the voxicon (3.7)
3.7
voxicon
lexicon or list of basic visual object concepts of VoxML (visual object concept structure modelling language)
4 Abbreviated terms
3D          three-dimensional
A           agentive role
ARG         argument
AS          atomic structure
AS_visML    annotation scheme for visual information markup language
ASyn_visML  abstract syntax for visual information markup language
CSyn_visML  concrete syntax for visual information markup language
C           constitutive property
F           formal property
GA          Gibsonian affordance
ID          identifier
MES         minimal embedding space
NL          natural language
NLP         natural language processing
QS          qualia structure
T           telic role
Vox         visual object concept structure
VoxML       visual object concept structure modelling language
XML         extensible markup language
5 Basic semantic assumptions — Habitats and affordances
Before introducing the VoxML specification, this document reviews two basic assumptions regarding the
semantics underlying the model. Following the Generative Lexicon [3], lexical entries in the object language
are given a feature structure consisting of a word’s basic type, its parameter listing, its event typing and its
qualia structure. In accordance with ISO 24610-1:2006, each feature structure shall be typed, consisting of
pairs of features (attributes) and values, either atomic or complex. If a value is a variable, then it is bound
either universally, existentially, or by the lambda operator, as shown in Example 1.
The semantic structure of an object shall be analysed into the following four sub-structures:
a) atomic structure (formal): objects expressed as basic nominal types;
b) subatomic structure (constitutive): mereo-topological structure of objects;
c) event structure (telic) and (agentive): origin and functions associated with an object;
d) macro-object structure: how objects fit together in space and through coordinated activities.
Objects can be partially contextualized through their qualia structure. For example, a food item has a telic
value of “eat”; an instrument for writing has a telic value of “write”; a cup has a telic value of “hold”, etc. As a
further example, the lexical semantics for the noun “chair” carries a telic value of “sit_in”:
EXAMPLE 1
where
AS is an atomic structure;
QS is a qualia structure;
ARG1 is argument 1;
F is a formal property;
T is a telic role.
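The following informative sketch shows one possible way to hold such a lexical feature structure in code. It is not a reconstruction of Example 1: the nested-dictionary encoding, the helper `get_path`, the type label `physobj` and the formal value `phys(x)` are assumptions made for illustration. Only the feature names AS, QS, ARG1, F and T and the telic value “sit_in” come from the text above.

```python
# Hypothetical encoding of a typed feature structure as nested dictionaries.
# Feature names (AS, QS, ARG1, F, T) follow the text; the values other than
# the telic predicate "sit_in" are illustrative assumptions.
chair_lexeme = {
    "PRED": "chair",
    "AS": {"ARG1": "x:physobj"},          # assumed argument typing
    "QS": {
        "F": "phys(x)",                   # assumed formal property
        "T": "sit_in",                    # telic value named in the text
    },
}


def get_path(feature_structure, path):
    """Follow a dot-separated feature path, e.g. "QS.T", and return its value."""
    node = feature_structure
    for feature in path.split("."):
        node = node[feature]
    return node


telic_value = get_path(chair_lexeme, "QS.T")
```

A path lookup of this kind mirrors the way a complex feature value is reached through its embedding features.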
While an artefact is designed for a specific purpose (its telic role), this can only be achieved under
specific circumstances. Reference [4] introduces the notion of an object’s “habitat”, which encodes these
circumstances. References [5] and [6] further define the notion of habitat and how it interacts with
affordances. It is assumed that for an artefact, x, given the appropriate context C, performing the action π
will result in the intended or desired resulting state, R, i.e. C → [π]R. That is, if a context C (a set of contextual
factors) is satisfied, then every time the activity of π is performed, the resulting state R will occur. It is
necessary to specify the precondition context C since this enables the local modality to be satisfied.
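The conditional C → [π]R can be sketched informally as a guarded program. The sketch below is an illustration under stated assumptions and not part of the VoxML specification: the function `perform`, the fact names `oriented_up`, `seat_free` and `agent_seated`, and the boolean-dictionary model of a state are all hypothetical.

```python
from typing import Callable, Dict, Optional

State = Dict[str, bool]


def perform(context: State,
            program: Callable[[State], State],
            state: State) -> Optional[State]:
    """Run `program` (the action pi) only if every fact in `context` (C) holds.

    Returns the resulting state R, or None when the context is not satisfied.
    """
    if all(state.get(fact) == value for fact, value in context.items()):
        return program(state)
    return None


def sit_in(state: State) -> State:
    """Hypothetical action: produces a new state in which the agent is seated."""
    result = dict(state)
    result["agent_seated"] = True
    return result


# Assumed precondition context for using a chair.
CHAIR_CONTEXT = {"oriented_up": True, "seat_free": True}

upright = perform(CHAIR_CONTEXT, sit_in, {"oriented_up": True, "seat_free": True})
toppled = perform(CHAIR_CONTEXT, sit_in, {"oriented_up": False, "seat_free": True})
```

In this sketch the action has its intended result only when the context holds; otherwise no resulting state is produced.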
Using this notion, a habitat is defined as representing an object situated within a partial minimal model; it is
a directed enhancement of the qualia structure. Multi-dimensional affordances determine how habitats are
deployed and how they modify or augment the context, and compositional operations include procedural
(simulation) and operational (selection, specification, refinement) knowledge.
The habitat for an object is built by first placing it within an embedding space and then contextualizing it.
For example, to use a table, the top must be oriented upward, the surface must be accessible, etc. A chair
also must be oriented up, the seat must be free and accessible, it must be able to support the user, etc. An
illustration of the resulting knowledge structure for the habitat of a chair is shown in Example 2.
EXAMPLE 2
where
F is a formal property;
C is a constitutive property;
T is a telic role;
A is an agentive role.
As described in more detail in 6.4, event or action simulations are constructed from the composition of
object habitats, along with some constraints imposed by the dynamic event structure inherent in the verb
itself, when interpreted as a program.
The final step in contextualizing the semantics of an object is to operationalize the telic value in its habitat.
This effectively means identifying the “affordance structure” for the object [7][8]. The affordance structure
available to an agent, when presented with an object, is the set of actions that can be performed with it.
These are referred to as “Gibsonian affordances” and they include “grasp”, “move”, “hold”, “turn”, etc.
This is to distinguish them from more goal-directed, intentionally situated activities, referred to as “telic
affordances”.
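The distinction between the two kinds of affordance can be illustrated with a small data structure. This is a minimal sketch, assuming a set-based encoding: the class `AffordanceStructure` and its method `affords` are hypothetical names, and the chair instance is an illustrative value. The Gibsonian actions and the telic value “sit_in” are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AffordanceStructure:
    """Hypothetical container separating Gibsonian from telic affordances."""
    gibsonian: Set[str] = field(default_factory=set)
    telic: Set[str] = field(default_factory=set)

    def affords(self, action: str) -> Optional[str]:
        """Return the kind of affordance for `action`, or None if not afforded."""
        if action in self.telic:
            return "telic"
        if action in self.gibsonian:
            return "gibsonian"
        return None


# Illustrative instance for a chair.
chair_affordances = AffordanceStructure(
    gibsonian={"grasp", "move", "hold", "turn"},
    telic={"sit_in"},
)
```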
6 VoxML specification
6.1 Metamodel and VoxML elements
The spatio-temporal annotation schemes given in ISO 24617-1, ISO 24617-7 and ISO 24617-14 shall apply.
The metamodel, graphically depicted by Figure 1, represents a small world of basic elements modelled in
VoxML. These elements form a set of categories:
a) event (program);
b) entity (object);
c) relation over them.
Events, especially actions, work as programs while taking simple objects or spatio-temporally localized
objects as arguments. Entities as objects are individuals or groups that may behave as agents. Relations can
be divided into properties, often referred to as “attributes”, and functions as subcategories. Attributes and
relations evaluate to states, and functions evaluate to geometric regions. These elements can then compose
into visualizations of NL concepts and expressions.

The metamodel of VoxML, presented in Figure 1, has no regions or times. These are introduced by functions
such as loc and τ. The function loc, for instance, maps an object x to the region loc(x) to which it is anchored.
Likewise, τ(x) maps an event to an event time, the time of its occurrence. Similarly, the function seq or the
function vec maps a set of regions to a path or a vector. Thereby, the ontology of VoxML is enriched with
spatio-temporal entities and dynamic paths.
NOTE 1 The empty triangular head of an arrow represents a subcategorization relation. Each directed arrow with
a smaller filled-in arrowhead relates one element to one or more other elements, while its label specifies the
relation. An entity as an agent, for example, intentionally triggers an action, while the action is a subcategory
of an event, treated as a program.
NOTE 2 SOURCE: Reference [2], reproduced with the permission of the authors.
Figure 1 — Metamodel
6.2 Representation of VoxML structures
This document follows the convention of the current version of VoxML and Voxicon (see Reference [1]). Basic
VoxML structures called “voxemes” are conventionally represented as feature structures, each consisting of
a set of attribute-value specifications, conforming to ISO 24610-1. Voxemes are mostly formed by complex
feature structures, having at least one of their substructures embedded in them as a feature structure, as
illustrated in this clause.
NOTE 1 ISO 24610-1 avoids the use of the term “attribute-value”. Instead, it uses the term “feature-value”, thus
defining a feature structure as a function from a set of features to a set of values.
In the concrete syntax, adopted for representing these feature structures of VoxML in this document, the
names of its attributes are represented in all uppercase characters, while the names of elements start with
their first character in upper case (e.g. the attribute LEX for the element Object as in Figure 2).
NOTE 2 This document follows the convention of the current version of VoxML and Voxicon for representing
attribute names in upper case characters.

Figure 2 — Voxeme structure of a wall
6.3 Objects
The element Object in VoxML is used for modelling nouns. The current set of Object attributes is shown in
Table 1.
Table 1 — Object attributes
LEX          Object’s lexical information
TYPE         Object’s geometrical typing
HABITAT      Object’s habitat for actions
AFFORD_STR   Object’s affordance structure
EMBODIMENT   Object’s agent-relative embodiment
The attribute LEX in Table 1 contains a substructure, specified by two attributes: PRED and TYPE. The
attribute PRED in the substructure specifies the predicate lexeme denoting the Object, and the attribute
TYPE in the substructure specifies the Object’s type according to the Generative Lexicon [3] (see Figure 2).
There are two different sorts of the attribute TYPE, as shown in Figure 2. The first sort is the attribute
TYPE of the element Object, which contains information to define the object geometry in terms of primitives.
In contrast, the second sort is the attribute TYPE in the substructure of the attribute LEX, described above.
The attribute TYPE of the element Object has an attribute HEAD in its substructure, which specifies a primitive
3D shape that roughly describes the object’s form (such as calling an apple an “ellipsoid”), or the form of the
object’s most semantically salient subpart. Possible values for the attribute HEAD are grounded in, for
completeness, mathematical formalism defining families of polyhedra [9], and, for the annotator’s ease, common
primitives found across the “corpus” of 3D artwork and 3D modelling software.
NOTE Mathematically curved surfaces such as spheres and cylinders are in fact represented, computed and
rendered as polyhedra by most modern 3D software [10].
Using common 3D modelling primitives as convenience definitions provides some built-in redundancy to
VoxML, as is found in an NL description of structural forms. For example, a “rectangular_prism” is the same
as a “parallelepiped” that has at least two defined planes of reflectional symmetry, meaning that an object
whose HEAD is a rectangular_prism can be defined in two ways, an association which a reasoner can unify
axiomatically. Possible values for the attribute HEAD are given in Table 2.

Table 2 — Possible values for the attribute HEAD
HEAD prismatoid, pyramid, wedge, parallelepiped, cupola, frustum, cylindroid, ellipsoid,
hemiellipsoid, bipyramid, rectangular_prism, toroid, sheet
These values are not intended to reflect the exact structure of a particular geometry, but rather a cognitive
approximation of its shape, as is used in some image-recognition work [11].
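As an informative illustration, the Object attributes of Table 1 and the two sorts of TYPE can be sketched as nested records. The class names `Lex`, `GeomType` and `ObjectVoxeme`, the placeholder container types for HABITAT, AFFORD_STR and EMBODIMENT, and the lexical type `physobj` are assumptions. The attribute names and the pairing of an apple with the HEAD value “ellipsoid” come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lex:
    """Substructure of LEX: predicate lexeme and Generative Lexicon type."""
    PRED: str
    TYPE: str


@dataclass
class GeomType:
    """Substructure of the Object-level TYPE: geometry in terms of primitives."""
    HEAD: str


@dataclass
class ObjectVoxeme:
    """Hypothetical record holding the Object attributes listed in Table 1."""
    LEX: Lex
    TYPE: GeomType
    HABITAT: Dict[str, str] = field(default_factory=dict)      # placeholder
    AFFORD_STR: List[str] = field(default_factory=list)        # placeholder
    EMBODIMENT: Dict[str, str] = field(default_factory=dict)   # placeholder


# Illustrative voxeme; "physobj" is an assumed lexical type.
apple = ObjectVoxeme(
    LEX=Lex(PRED="apple", TYPE="physobj"),
    TYPE=GeomType(HEAD="ellipsoid"),
)
```

Keeping the two TYPE attributes in separate record types makes it harder to confuse the lexical type with the geometrical type.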
The substructures of an object are enumerated in its attribute COMPONENTS. In Figure 2, the attribute
COMPONENTS embedded in the attribute TYPE has the value nil. CONCAVITY can be concave, flat or convex and
refers to any concavity that deforms the HEAD shape. ROTATSYM, or rotational symmetry, defines any of the
world’s three orthogonal axes around which the object’s geometry may be rotated for an interval of less than
360° and retain identical form as the unrotated geometry. A sphere may be rotated at any interval around
any of the three axes and retain the same form. A rectangular prism may be rotated 180° around any of the
three axes and retain the same shape. An object such as a ceiling fan would only have rotational symmetry
around the y-axis. Reflectional symmetry, or REFLECTSYM, is defined similarly. If an object can be bisected
by a plane defined by two of the world’s three orthogonal axes and then reflected across that plane to obtain
the same geometric form as the original object, it is considered to have reflectional symmetry across that
plane. A sphere or rectangular prism has reflectional symmetry across the XY, XZ and YZ planes. A wine
bottle only has reflectional symmetry across the XY and YZ planes.
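The world-plane test for reflectional symmetry described above can be sketched as a point-set check. This is a minimal illustration under stated assumptions: the object is approximated by a finite set of integer points centred on the world origin, the y-axis is vertical, and the names `reflect` and `reflect_sym` and both point sets are hypothetical.

```python
from itertools import product

# Reflecting across a world plane negates the coordinate normal to it:
# XY negates z (index 2), XZ negates y (index 1), YZ negates x (index 0).
PLANE_NORMAL_INDEX = {"XY": 2, "XZ": 1, "YZ": 0}


def reflect(point, plane):
    """Mirror a 3D point across the named world plane through the origin."""
    normal = PLANE_NORMAL_INDEX[plane]
    return tuple(-c if i == normal else c for i, c in enumerate(point))


def reflect_sym(points):
    """List the world planes across which the point set maps onto itself."""
    original = set(points)
    return [plane for plane in ("XY", "XZ", "YZ")
            if {reflect(p, plane) for p in original} == original]


# Corners of a rectangular prism centred on the origin.
prism = list(product((-1, 1), (-2, 2), (-3, 3)))

# Crude bottle: a wide base ring at y = 0 and a narrow neck ring at y = 3.
bottle = ([(x, 0, z) for x, z in product((-2, 2), repeat=2)]
          + [(x, 3, z) for x, z in product((-1, 1), repeat=2)])

prism_planes = reflect_sym(prism)
bottle_planes = reflect_sym(bottle)
```

Under these assumptions the prism is symmetric across all three planes, while the bottle is symmetric only across the two planes that contain the vertical axis.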
The possible values of ROTATSYM and REFLECTSYM are intended to be world-relative, not object-relative.
That is, because objects are only being discussed when situated in a minimal embedding space (MES), even
an otherwise empty one, wherein all coordinates are given Cartesian values, the axes of rotational symmetry
or planes of reflectional symmetry are those denoted in the world, not those of the object. Thus, a tetrahedron
(which in isolation has seven axes of rotational symmetry, no two of which are orthogonal), when placed in
the MES such that it cognitively satisfies all “real-world” constraints, is situated with one base downward
(a tetrahedron placed any other way will fall over). This reduces the salient in-world axes of rotational
symmetry to one: the world’s y-axis. When the orientation of the object is ambiguous relative to the world,
the world is assumed to provide the grounding value.
The Habitat element defines habitats “intrinsic” to the object, regardless of what action it participates in,
such as intrinsic orientations or surfaces, as well as “extrinsic” habitats which must be satisfied for some
specified actions to take place. Intrinsic faces of an object can be defined in terms of its geometry and axes.
The model of a computer monitor, when axis-aligned according to 3D modelling convention, aligns the screen
with the world’s Z-axis facing the direction of increasing Z values. When discussing the object “computer
monitor”, the lexeme “front” singles out the screen of the monitor as opposed to any other part. The lexeme
can therefore be correlated with the geometrical representation by establishing an intrinsic habitat of the
computer monitor of front(+Z). The terminology of “alignment” of an object dimension, d ∈ {x, y, z}, is adopted
with the dimension, d′, of its embedding space, Ԑ, as follows: align(d, Ԑ, d′).
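One possible reading of the alignment of an object dimension with a dimension of the embedding space is a parallelism test on axes. The sketch below is illustrative only and is not defined by VoxML: it assumes the object's orientation in the embedding space is given as a 3×3 rotation matrix (a tuple of rows), and the names `is_aligned`, `rotate`, `IDENTITY` and `QUARTER_TURN_Y` are hypothetical.

```python
AXES = {"x": (1, 0, 0), "y": (0, 1, 0), "z": (0, 0, 1)}


def rotate(matrix, vector):
    """Apply a 3x3 matrix (tuple of rows) to a 3D vector."""
    return tuple(sum(matrix[r][c] * vector[c] for c in range(3))
                 for r in range(3))


def is_aligned(d, orientation, d_world, tol=1e-9):
    """True if object axis `d`, once oriented, is parallel to world axis `d_world`.

    Assumes `orientation` is a rotation matrix, so the rotated axis has unit
    length and |dot product| == 1 exactly when the two axes are parallel.
    """
    rotated = rotate(orientation, AXES[d])
    dot = sum(a * b for a, b in zip(rotated, AXES[d_world]))
    return abs(abs(dot) - 1.0) <= tol


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# Quarter turn about the world y-axis: the object's z-axis lands on world x.
QUARTER_TURN_Y = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))
```

With the identity orientation each object axis is aligned with the world axis of the same name; after the quarter turn, the object's z-axis is aligned with the world's x-axis instead.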
The attribute AFFORD_STR describes the set of specific actions, along with the requisite conditions, that
the object can potentially take part in. There are low-level affordances, called “Gibsonian”, which involve
manipulation or manoeuv
...


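SIST ISO 24617-10:2025 is part of a standard on visual information and specifies an annotation language based on the Visual Object Concept Structure Modelling Language (VoxML). The standard provides an important framework for visualizing, in three dimensions (3D), the concepts and actions denoted by natural language expressions. The scope of the document emphasizes the importance of adopting VoxML and conforms to the requirements given in ISO 24617-1, ISO 24617-7 and ISO 24617-14. In particular, the adoption of VoxML as the semantic basis specified in ISO 24617-14 is essential for the 3D simulation and visualization of actions and motions by human and artificial agents in the real world. A strength of the standard is that linking visual information to concepts expressed in natural language increases expressive power across a wide range of applications. Its structured approach, which takes the need for 3D representation into account, is very useful to developers and researchers and promotes the effective use of visual information. In addition, because engineers and users work with a unified model, interoperability improves and information sharing across platforms becomes easier. SIST ISO 24617-10:2025 thus provides valuable guidance for managing visual information and is an indispensable standard for making the interaction between natural language processing and visualization in 3D environments more efficient.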

The SIST ISO 24617-10:2025 document presents a comprehensive standard that specifically addresses the needs of language resource management with a significant focus on visual information. The adoption of the VoxML (visual object concept structure modeling language) is a notable strength of this standard, as it provides a robust framework for annotating visual information in a manner that is directly aligned with natural language expressions. This not only enhances the understanding of complex 3D visualizations but also ensures that such annotations are semantically rich and meaningful. One of the primary strengths of the standard lies in its adherence to the requirements laid out in other relevant ISO standards such as ISO 24617-1, ISO 24617-7, and ISO 24617-14. This alignment ensures that the annotation framework is built upon a solid foundation, facilitating interoperability and compatibility with existing linguistic resources and technologies. By conforming to these established standards, SIST ISO 24617-10:2025 enhances its reliability and usability in diverse applications, from academia to industry, where accurate visual representation is critical. The standard's relevance is underscored by the growing demand for high-quality semantic annotation in various fields, including virtual reality, robotics, and automated understanding of visual contexts. As the demands for sophisticated 3D modeling and simulation grow, the importance of a standardized approach to annotation becomes increasingly clear. The VoxML-based scheme presents a forward-thinking solution that not only addresses current needs but also positions itself as a crucial tool for future innovations in visual information processing and representation. In summary, SIST ISO 24617-10:2025 is a pivotal document that significantly contributes to semantic annotation within the realm of visual information. 
Its strengths, encompassing a well-defined annotation language and strong alignment with other ISO standards, make it an invaluable resource for those working in the intersection of language, vision, and artificial intelligence.

The standard SIST ISO 24617-10:2025 provides an important foundation for the management of language resources and offers a precise framework for the semantic annotation of visual information. By introducing a specific annotation language based on VoxML, the standard enables a systematic and structured representation of concepts and actions that are expressed through natural language in three-dimensional (3D) environments. An outstanding feature of the standard is its conformity with other relevant standards, namely ISO 24617-1, ISO 24617-7 and ISO 24617-14. This harmonization ensures that the annotation schemes defined in the standard are consistent and comprehensive, which improves the quality and reliability of semantic annotation. In particular, the use of VoxML as a semantic basis is of great importance, since it supports the realistic simulation and visualization of actions and motions of both human and artificial agents in real situations. The strengths of the standard also extend to its applicability in various fields, including robotics, augmented and virtual reality, and artificial intelligence. Through its clear structure and comprehensive provisions, the standard promotes a uniform approach to handling visual information and thereby helps improve interoperability between different systems and applications. Overall, SIST ISO 24617-10:2025 is highly relevant for professionals who develop and implement technologies related to visual information, as well as for researchers working in language and image processing. The standard is an indispensable tool for optimizing communication between humans and machines and for standardizing the processing and representation of information in an increasingly visually oriented world.

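The SIST ISO 24617-10:2025 document defines an annotation language for visual information, built on VoxML (visual object concept structure modelling language). The scope of the standard clearly specifies the annotation language needed for three-dimensional (3D) visualization of concepts and actions denoted by natural language (NL) expressions. The document specifies a VoxML-based annotation scheme that conforms to the requirements established in ISO 24617-1, ISO 24617-7 and ISO 24617-14. This is essential in particular for the 3D simulation and visualization of actions and motions performed by human and artificial-intelligence agents in the real world. A strength of the standard is that it provides a scheme capable of handling complex visual information effectively. The adoption of VoxML uses a semantics-based approach so that concepts expressed in natural language can be visualized more clearly and precisely. This will considerably help users understand and use visual or behavioural information. SIST ISO 24617-10:2025 is expected to play an especially important role now that the importance of visual information management is growing in fields such as industry, education and research. The standard supports improved communication and information delivery through the symbolic representation of visual concepts, and it broadens the possibilities for use in a variety of applications.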

The document SIST ISO 24617-10:2025 focuses on language resource management through a semantic annotation framework intended specifically for visual information. The standard introduces an annotation language based on VoxML, which represents a significant advance in modelling the concepts and actions expressed by natural languages in three-dimensional (3D) environments. One of the main strengths of the standard is its alignment with the requirements established in ISO 24617-1, ISO 24617-7 and ISO 24617-14. By complying with these standards, the document lays a solid foundation for the use and integration of semantic annotations in various contexts. This conformity also ensures greater compatibility and interoperability between systems that use language resources and complex visualizations. The adoption of VoxML as a semantic basis is particularly relevant for applications requiring 3D simulation and visualization of actions performed by agents, whether human or artificial. By facilitating visual representations of movements and interactions in realistic environments, the standard meets growing needs in augmented reality, learning simulations and other emerging technologies. In conclusion, SIST ISO 24617-10:2025 positions itself as an essential tool for integrating semantic annotation into visual information systems, thereby reinforcing the relevance of language resource management in a world increasingly oriented toward three-dimensional visualization and dynamic interaction.