ISO/IEC 23093-1:2022
Information technology — Internet of media things — Part 1: Architecture
This document describes the architecture of systems for the internet of media things.
INTERNATIONAL STANDARD ISO/IEC 23093-1
Second edition
2022-03
Information technology — Internet of
media things —
Part 1:
Architecture
Technologies de l'information — Internet des objets media —
Partie 1: Architecture
Reference number: ISO/IEC 23093-1:2022(E)
© ISO/IEC 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Internet of media things terms
3.2 Internet of things terms
4 Architecture
5 Use cases
5.1 General
5.2 Smart spaces: Monitoring and control with network of audio-video cameras
5.2.1 General
5.2.2 Human tracking with multiple network cameras
5.2.3 Dangerous region surveillance system
5.2.4 Intelligent firefighting with IP surveillance cameras
5.2.5 Automatic security alert and title generation system using time, GPS and visual information
5.2.6 Networked digital signs for customized advertisement
5.2.7 Digital signage and second screen use
5.2.8 Self-adaptive quality of experience for multimedia applications
5.2.9 Ultra-wide viewing video composition
5.2.10 Face recognition to evoke sensorial actuations
5.2.11 Automatic video clip generation by detecting event information
5.2.12 Temporal synchronization of multiple videos for creating 360° or multiple view video
5.2.13 Intelligent similar content recommendations using information from IoMT devices
5.2.14 Safety equipment detection on construction sites
5.3 Smart spaces: Multi-modal guided navigation
5.3.1 General
5.3.2 Blind person assistant system
5.3.3 Elderly people assistance with consecutive vibration haptic devices
5.3.4 Personalized navigation by visual communication
5.3.5 Personalized tourist navigation with natural language functionalities
5.3.6 Smart identifier: Face recognition on smart glasses
5.3.7 Smart advertisement: QR code recognition on smart glasses
5.4 Smart audio/video environments in smart cities
5.4.1 General
5.4.2 Smart factory: Car maintenance assistance A/V system using smart glasses
5.4.3 Smart museum: Augmented visit using smart glasses
5.4.4 Smart house: Light control, vibrating subtitle, olfaction media content consumption, odour image recognizer
5.4.5 Smart car: Head-light adjustment and speed monitoring to provide automatic volume control
5.5 Smart multi-modal collaborative health
5.5.1 General
5.5.2 Increasing patient autonomy by remote control of left-ventricular assisted devices
5.5.3 Diabetic coma prevention by monitoring networks of in-body/near body sensors
5.5.4 Enhanced physical activity with smart fabrics networks
5.5.5 Medical assistance with smart glasses
5.5.6 Managing healthcare information for smart glasses
5.5.7 Indoor air quality prediction
5.6 Blockchain usage for IoMT transactions authentication and monetizing
5.6.1 General
5.6.2 Reward function in IoMT people counting by using blockchains
5.6.3 Content authentication with blockchains
Annex A (informative) Mapping of the components between IoMT and IoT reference architectures
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance
are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria
needed for the different types of document should be noted. This document was drafted in
accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 23093-1:2020), which has been
technically revised.
The main changes are as follows:
— use case description and the underlying technology.
A list of all parts in the ISO/IEC 23093 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
The ISO/IEC 23093 series provides an architecture and specifies application programming interfaces
(APIs) and compressed representation of data flowing between media things.
The APIs for the media things facilitate discovering other media things in the network, connecting
and efficiently exchanging data between media things. The APIs also provide means for supporting
transaction tokens in order to access valuable functionalities, resources, and data from media things.
Media things related information consists of characteristics and discovery data, setup information
from a system designer, raw and processed sensed data, and actuation information. The ISO/IEC 23093
series specifies data formats of input and output for media sensors, media actuators, media storages,
media analysers, etc. Sensed data from media sensors can be processed by media analysers to produce
analysed data, and the media analysers can be cascaded in order to extract semantic information.
This document does not specify how the process of sensing and analysing is carried out but specifies
the interfaces between the media things. This document describes the architecture of systems for the
internet of media things.
The International Organization for Standardization (ISO) and International Electrotechnical
Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may
involve the use of a patent.
ISO and IEC take no position concerning the evidence, validity and scope of this patent right.
The holder of this patent right has assured ISO and IEC that they are willing to negotiate licences under
reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this
respect, the statement of the holder of this patent right is registered with ISO and IEC. Information may
be obtained from the patent database available at www.iso.org/patents.
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights other than those in the patent database. ISO and IEC shall not be held responsible for
identifying any or all such patent rights.
INTERNATIONAL STANDARD ISO/IEC 23093-1:2022(E)
Information technology — Internet of media things —
Part 1:
Architecture
1 Scope
This document describes the architecture of systems for the internet of media things.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Internet of media things terms
3.1.1
audio
anything related to sound in terms of receiving, transmitting or reproducing it or of its specific
frequency
3.1.2
camera
special form of an image capture device (3.1.6) that senses and captures photo-optical signals
3.1.3
display
visual representation of the output of an electronic device or the portion of an electronic device that
shows this representation, as a screen, lens or reticle
3.1.4
gesture
movement or position of the hand, arm, body, head or face that is expressive of an idea, opinion, emotion,
etc.
3.1.5
haptics
input or output device that senses or actuates the body's movements by means of physical contact with
the user
3.1.6
image capture device
device which is capable of sensing and capturing acoustic, electrical or photo-optical signals of a
physical entity that can be converted into an image
3.1.7
internet of media things
IoMT
special subset of IoT (3.2.9) whose main functionalities are related to media processing
3.1.8
IoMT device
IoT (3.2.9) device that contains more than one MThing (3.1.12)
3.1.9
IoMT system
MSystem
IoT (3.2.9) system whose main functionality is related to media processing
3.1.10
loudspeaker
electroacoustic device, connected as a component in an audio system, generating audible acoustic waves
3.1.11
media
data that can be rendered, including audio, video, text, graphics, images, haptic and tactile information
Note 1 to entry: These data can be timed or non-timed.
3.1.12
media thing
MThing
thing (3.2.20) capable of sensing, acquiring, actuating, or processing of media or metadata
3.1.13
media token
virtual token for accessing functionalities, resources and data of media things
3.1.14
microphone
entity capable of capturing and transforming acoustic waves into changes in electric currents or voltage,
used in recording or transmitting sound
3.1.15
media wearable
MWearable
MThing (3.1.12) intended to be located near, on or in an organism
3.1.16
motion
action or process of changing place or position
3.1.17
natural user interface
NUI
system for human-computer interaction that the user operates through intuitive actions related to
natural, everyday human behaviour
3.1.18
presentation
act of producing human recognizable output of rendered media
3.2 Internet of things terms
3.2.1
actuator
component which conveys digital information to effect a change of some property of a physical entity
3.2.2
capability
characteristic or property of an entity that can be used to describe its state, appearance or other aspects
EXAMPLE An entity type, address information, telephone number, a privilege, a MAC address, a domain
name are possible attributes, see ISO/IEC 24760-1.
3.2.3
component
modular, deployable and replaceable part of a system that encapsulates implementations
Note 1 to entry: A component may expose or use interfaces (local or on a network) to interact with other entities,
see ISO 19104. A component which exposes or uses network interfaces is called an endpoint.
3.2.4
digital entity
any computational or data element of an IT-based system
Note 1 to entry: It may exist as a service based in a data centre or cloud, or a network element or a gateway.
3.2.5
discovery
service to find unknown resources/entities/services based on a rough specification of the desired
result
Note 1 to entry: It may be utilized by a human or another service; credentials for authorization are considered
when executing the discovery, see ISO/IEC 30141.
3.2.6
entity
anything (physical or non-physical) having a distinct existence
3.2.7
identifier
information that unambiguously distinguishes one entity (3.2.6) from another one in a given identity
context
3.2.8
identity
characteristics determining who or what a person or thing is
3.2.9
internet of things
IoT
infrastructure of interconnected objects, people, systems and information resources together with
intelligent services to allow them to process information of the physical and the virtual world and to
react
3.2.10
interface
shared boundary between two functional components, defined by various characteristics pertaining
to the functions, physical interconnections, signal exchanges, and other characteristics, as appropriate
Note 1 to entry: See ISO/IEC 13066-1.
3.2.11
IoT system
system that is comprised of functions that provide the system the capabilities for identification, sensing,
actuation, communication and management, and applications and services to a user
Note 1 to entry: See Bahga and Madisetti[1].
3.2.12
network
entity that connects endpoints, sources to destinations, and may itself act as a value-added element in
the IoT system or services
3.2.13
process
procedure to carry out operations on data
3.2.14
physical entity
thing (3.2.20) that is discrete, identifiable and observable, and that has material existence in real world
3.2.15
reference architecture
description of common features, common vocabulary, guidelines, interrelations and interactions among
the entities, and a template for an IoT architecture
3.2.16
resource
any element of a data processing system needed to perform required operations
Note 1 to entry: See ISO/IEC 2382.
3.2.17
sensor
device that observes and measures a physical property of a natural phenomenon or a human induced
process and converts that measurement into a signal
Note 1 to entry: A signal can be electrical, chemical, etc., see ISO/IEC 29182-2.
3.2.18
service
distinct part of the functionality that is provided by an entity through interfaces
3.2.19
storage
capacity of a digital entity to store information subject to recall or the components of a digital entity in
which such information is stored
3.2.20
thing
any entity that can communicate with other entities
3.2.21
user
human or any digital entity that is interested in interacting with a particular physical object
3.2.23
visual
any object perceptible by the sense of sight
4 Architecture
The global IoMT architecture is presented in Figure 1, which identifies a set of interfaces, protocols and
associated media-related information representations related to:
— user commands (setup information) between a system manager and an MThing, with reference to interface 1;
— user commands (setup information) forwarded by an MThing to another MThing, possibly in a modified form (e.g., a subset of interface 1), with reference to interface 1’;
— sensed data (raw or processed, compressed or semantically extracted) and actuation information, with reference to interface 2;
— wrapped interface 2 (e.g., for transmission), with reference to interface 2’;
— MThing characteristics and discovery, with reference to interface 3.
Figure 1 — IoMT architecture
This IoMT architecture can be mapped to the IoT reference architecture, see ISO/IEC 30141, as shown
in Annex A.
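The roles of these interfaces can be made concrete with a short, non-normative sketch. The following Python fragment is an illustration only; the class and field names are assumptions and do not correspond to the APIs or data formats specified in other parts of the ISO/IEC 23093 series.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class SetupInformation:              # user commands carried over interfaces 1 and 1'
        target_mthing: str
        parameters: dict

    @dataclass
    class MediaData:                     # sensed/actuation data carried over interfaces 2 and 2'
        source_mthing: str
        payload: bytes                   # raw, compressed or semantically extracted data
        descriptor: Optional[dict] = None

    @dataclass
    class MThingDescription:             # characteristics exposed for discovery over interface 3
        identifier: str
        capabilities: list = field(default_factory=list)

    class MThing:
        def __init__(self, identifier: str, capabilities: list):
            self.description = MThingDescription(identifier, list(capabilities))

        def describe(self) -> MThingDescription:
            # interface 3: MThing characteristics and discovery
            return self.description

        def configure(self, setup: SetupInformation) -> SetupInformation:
            # interface 1: apply setup information from the system manager and return a
            # possibly reduced setup to forward to another MThing (interface 1')
            forwarded = {k: v for k, v in setup.parameters.items()
                         if k in self.description.capabilities}
            return SetupInformation(setup.target_mthing, forwarded)

        def emit(self, payload: bytes, descriptor: Optional[dict] = None) -> MediaData:
            # interface 2: sensed data (raw or processed) and actuation information
            return MediaData(self.description.identifier, payload, descriptor)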
5 Use cases
5.1 General
MPEG identified 31 use-cases for IoMT; they are structured in the following five main categories:
a) Smart spaces: Monitoring and control with network of audio-video cameras (see 5.2)
— human tracking with multiple network cameras
— dangerous region surveillance system
— intelligent firefighting with IP surveillance cameras
— automatic security alert generation system using time, GPS and visual information
— networked digital signs for customized advertisement
— digital signage and second screen use
— self-adaptive quality of experience for multimedia applications
— ultra-wide viewing video composition
— face recognition to evoke sensorial actuations
— automatic video clip generation by detecting event information
— temporal synchronization of multiple videos for creating 360° or multiple view video
— intelligent similar content recommendations using information from IoMT devices
— safety equipment detection in construction sites
b) Smart spaces: Multi-modal guided navigation (see 5.3)
— blind person assistant system
— elderly people assistance with consecutive vibration haptic devices
— personalized navigation by visual communication
— personalized tourist navigation with natural language functionalities
— smart identifier: face recognition on smart glasses
— smart advertisement: QR code recognition on smart glasses
c) Smart audio/video environments in smart cities (see 5.4)
— smart factory: car maintenance assistance A/V system using smart glasses
— smart museum: augmented visit using smart glasses
— smart house: light control, vibrating subtitle, olfaction media content consumption
— smart car: head-light adjustment and speed monitoring to provide automatic volume control
d) Smart multi-modal collaborative health (see 5.5)
— increasing patient autonomy by remote control of left-ventricular assisted devices
— diabetic coma prevention by monitoring networks of in-body/near body sensors
— enhanced physical activity with smart fabrics networks
— medical assistance with smart glasses
— managing healthcare information for smart glass
— indoor air quality prediction
e) Blockchain usage for IoMT transactions authentication and monetizing (see 5.6)
— reward function in IoMT by using blockchains
— content authentication with blockchains
5.2 Smart spaces: Monitoring and control with network of audio-video cameras
5.2.1 General
The large variety of sensors, actuators, displays and computational elements acting in our day-to-day
professional and private spaces in order to provide us with better and more easily accessible services
leads to 13 use cases of interest for IoMT, mainly related to the processing of video information.
5.2.2 Human tracking with multiple network cameras
As urban growth is today accompanied by an increase in crime rates (e.g., theft, vandalism), many
local authorities consider surveillance systems as a possible tool to fight this phenomenon. A city video
surveillance system is an IoMT system that includes a set of IP surveillance cameras, a storage unit and
a human tracker unit.
A particular IP surveillance camera captures audio-video data and sends them to both the storage and
the human tracker unit. When the human tracker detects a person, it traces the person and extracts the
moving trajectory.
If the person gets out of the visual scope of the first IP camera but stays in the area protected by the
city video surveillance system, another IP camera from this system can take over control and keep
capturing A/V data of the corresponding person.
If the person gets out of the protected area, for example by entering a commercial centre, the
city system checks whether this commercial centre is also equipped with a video surveillance
system. Should this be the case, the city video surveillance system sets up a communication with the
commercial centre video surveillance system in order to allow another IP camera from the commercial
centre system to keep capturing A/V data of the corresponding person.
In both cases, the specific descriptors (e.g., moving trajectory information, appearance information,
media locations of detected moments) can be extracted and sent to the storage.
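A minimal sketch of the take-over logic described above is given below; the camera, storage and descriptor interfaces are hypothetical and are shown only to make the data flow concrete, not to define an API.

    def hand_over(person_descriptor, cameras):
        # return a camera (from this or a federated surveillance system) that
        # currently sees the tracked person, or None if the person has left
        return next((c for c in cameras if c.sees(person_descriptor)), None)

    def track_person(person_descriptor, cameras, storage):
        active = hand_over(person_descriptor, cameras)
        trajectory = []
        while active is not None:
            clip = active.capture()                        # A/V data
            storage.put(clip)                              # sent to the storage unit
            trajectory.append(active.locate(person_descriptor))
            if not active.sees(person_descriptor):         # person leaves this camera's scope
                active = hand_over(person_descriptor, cameras)
        # specific descriptors extracted by the tracker are also sent to storage
        storage.put({"descriptor": person_descriptor, "trajectory": trajectory})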
5.2.3 Dangerous region surveillance system
IoMT can serve as a basis for developing intelligent alerting services providing information and/or
alerts when a person approaches danger zones, for accident prevention. For instance, Figure 2 illustrates
the case of a home (private) environment where a child plays (cf. Figure 2.(1)). Heterogeneous IoMT
data (e.g. video, depth, audio, temperature) are analysed to automatically generate an alert if the child
approaches the dangerous area around a hot oven (cf. Figure 2.(2)).
(1) illustrates the case of a private environment
(2) illustrates the usage of IoMT for preventing dangerous situations
Figure 2 — Example use-case of dangerous area surveillance system operating in a private
(home) environment
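As a hedged illustration of how such heterogeneous data can be fused into an alert, the following sketch assumes a simple position/temperature rule; the sensor interfaces, coordinate model and thresholds are assumptions made for illustration and are not defined by this document.

    def check_danger(child_position, oven_position, oven_temperature_c, alarm):
        # child_position and oven_position are (x, y) coordinates in metres,
        # e.g. estimated from the video and depth data
        dx = child_position[0] - oven_position[0]
        dy = child_position[1] - oven_position[1]
        distance_m = (dx * dx + dy * dy) ** 0.5
        if oven_temperature_c > 60.0 and distance_m < 1.5:
            alarm.notify("Child approaching the hot oven")   # automatically generated alert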
5.2.4 Intelligent firefighting with IP surveillance cameras
Figure 3 illustrates an example use-case of intelligent firefighting with IP surveillance cameras. In
this case, the fire station and the security manager can rapidly receive the fire/smoke detection alert,
thereby averting a potential fire hazard. Unlike conventional security systems, the outdoor scene
captured by intelligent IP surveillance cameras is immediately analysed and the fire/smoke incident is
automatically alerted to the fire station based on the analysed results of the captured scene.
Figure 3 — Example use-case of intelligent firefighting
5.2.5 Automatic security alert and title generation system using time, GPS and visual information
In the sustainable smart city of Seoul, IoMT cameras (smart CCTV) are deployed around the city. These
cameras are continuously capturing video (24 hours/7 days). When unusual events such as a violent
scene, crowd scene, theft scene or busking scene occur, the title generator (event description generator)
generates a security alert for immediate intervention. Additionally, a title for the video clip with time
and place information is also generated in real-time. The generated title is stored with the video clip
in MStorage. As an example scenario, consider a CCTV capturing videos (visual data), with time and
GPS information. The title generator analyses the video stream, selects a keyframe and combines time,
GPS and keyframe to generate a formatted title. The captured video with the generated title is sent to
storage.
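A minimal sketch of the title generator is given below, assuming an already detected event label and a selected keyframe; the title format and the field names are illustrative assumptions, not a format specified by this document.

    from datetime import datetime

    def generate_title(event_label, keyframe_id, timestamp: datetime, gps):
        lat, lon = gps
        # combine time, GPS and keyframe information into a formatted title
        return (f"{event_label} at ({lat:.5f}, {lon:.5f}) "
                f"on {timestamp:%Y-%m-%d %H:%M} [keyframe {keyframe_id}]")

    # generate_title("Crowd scene", 1042, datetime(2022, 3, 1, 14, 30), (37.5665, 126.9780))
    # -> "Crowd scene at (37.56650, 126.97800) on 2022-03-01 14:30 [keyframe 1042]"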
5.2.6 Networked digital signs for customized advertisement
A camera can be either attached to or embedded in a digital screen displaying advertising content, so as
to be able to capture A/V data and send them to both a storage unit and a gaze tracking/ROI analysing
unit. When the gaze tracking/ROI analyser detects a person in front of the corresponding digital sign, it
starts to trace the eye position, calculates the corresponding region of interest on the currently played
advertisement, and deduces the person’s current interest (e.g., goods) on the advertisement. When the
person moves to another digital sign, that new sign starts playing a relevant advertisement according to
the person’s estimated interest data.
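The flow from gaze tracking to personalized playback could be sketched as below; the analyser and profile-store interfaces, and the way a viewer is re-identified at the next sign, are assumptions made only for illustration.

    def on_viewer_frame(frame, ad_layout, gaze_tracker, profile_store, viewer_id):
        gaze_point = gaze_tracker.estimate(frame)                   # traced eye position
        region = ad_layout.region_at(gaze_point)                    # region of interest on the ad
        if region is not None:
            profile_store.add_interest(viewer_id, region.product)   # deduced current interest

    def on_viewer_arrival(sign, profile_store, viewer_id):
        interests = profile_store.get(viewer_id)
        sign.play(sign.catalogue.best_match(interests))             # relevant advertisement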
5.2.7 Digital signage and second screen use
This use case addresses the pedestrians who want to get additional information (e.g., product
information, characters, places) of content displayed on digital signs with their mobile phones (i.e.,
second screens), as illustrated in Figure 4.
Figure 4 — Digital signage and second screen use-case
5.2.8 Self-adaptive quality of experience for multimedia applications
The self-adaptive multimedia application is an application running on a wearable device with
middleware providing optimal quality of service (QoS) performance for each application, according to
the static/dynamic status of the application and/or system resources.
The user initially starts the self-adaptive multimedia application and updates the initial setup to
guarantee the application’s performance quality on the wearable device. The self-adaptive application
needs the static/dynamic status information exchanged between the wearable device and the processing
unit. The application then runs normally on the wearable device until a status change/update event is
generated. Such an event occurs when a decrease in performance level is detected, at which point a
status information request is sent to the processing unit.
The processing unit can support heterogeneous types of wearable devices and includes a static/dynamic
system manager to optimize computing performance. The processing unit performs resource
management optimally, based on the performance requirements of the self-adaptive application.
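A hedged sketch of this adaptation loop is shown below; the performance metric, the threshold and the method names are assumptions for illustration, not part of the specified middleware.

    def adapt_quality(app, processing_unit, target_fps=24.0):
        if app.measured_fps() < target_fps:              # performance level decrease detected
            status = processing_unit.request_status()    # static/dynamic status information
            config = processing_unit.optimize(app.requirements(), status)
            app.apply(config)                            # e.g. reduce resolution or bitrate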
5.2.9 Ultra-wide viewing video composition
The ultra-wide viewing video composition is made possible by the videos captured from multiple
cameras equipped with multiple sensors (time, accelerometer, gyro, GPS, and compass) along with a video
composer, storages and display devices as MThings.
5.2.10 Face recognition to evoke sensorial actuations
An IP surveillance camera captures audio-video data and sends them to both a storage unit and a face
recognizer unit. When the face recognizer detects and recognizes the face of a pre-registered person,
it activates a scent generator to spray some specific scent. The specific descriptors (e.g., detected face
locations, face descriptors, media locations of detected moments) can be alternatively extracted and
sent to a storage unit. In this use case, the scent generator can be replaced by any type of actuator (e.g.,
light bulbs, displays, music players).
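A minimal sketch of this sensing-to-actuation chain follows; the recognizer, storage and actuator interfaces are hypothetical and only illustrate the data flow between the MThings.

    def on_camera_frame(frame, recognizer, actuator, storage, registered_faces):
        storage.put(frame)                                   # A/V data sent to the storage unit
        for face in recognizer.detect(frame):
            identity = recognizer.match(face, registered_faces)
            if identity is not None:
                # the scent generator could be any actuator (light bulb, display, music player)
                actuator.trigger(identity)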
5.2.11 Automatic video clip generation by detecting event information
This use case describes automatic video clip generation by detecting event information from audio/
video streaming feed from a video camera. Usually, family or friends hold many events such as birthday
parties, wedding anniversaries or pyjama parties. By using surveillance cameras, these events can be
detected and pictures or videos taken at the event can be used to make a time-lapse video.
5.2.12 Temporal synchronization of multiple videos for creating 360° or multiple view video
A new video can be created by using videos captured by multiple cameras. Each camera has its own
local clock and various sensors and can record the shooting time based on the local clock. As each
camera has a different timeline, when creating a new video (e.g. 360° video) using time information
(e.g., stitching) from two different devices, some errors are likely to occur. The time offset
between individual videos can be cancelled by performing temporal synchronization using visual and/
or audio information together with sensor data, thus obtaining a natural-looking video.
Moreover, if individual videos are transmitted through the network, people can watch the videos taken
from various viewpoints of the event. This means one person can watch just one video whilst another
watches multiple videos at the same time, and someone else can switch between videos.
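One common way to cancel such a time offset is to cross-correlate the audio tracks of the two recordings; the following sketch illustrates this approach and is not a technique mandated by this document.

    import numpy as np

    def estimate_offset_seconds(audio_a, audio_b, sample_rate):
        # estimate how far audio_b is shifted with respect to audio_a
        corr = np.correlate(audio_a, audio_b, mode="full")
        lag = int(corr.argmax()) - (len(audio_b) - 1)
        return lag / sample_rate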
5.2.13 Intelligent similar content recommendations using information from IoMT devices
Currently, video content taken by individuals with unprofessional cameras, smartphones, etc. is
commonly found on various internet resources, from social networks to video sharing systems. Such
content is very heterogeneous: concerts, sports games, unboxing videos of new products, etc. While
it is practically impossible for a person to provide precise and rich recommendations of video content,
intelligent similar content recommendation systems give users an easy choice among a large
variety of content related to the content they have already posted.
To recommend similar content, metadata of the video content is needed. This metadata is
generated using the position (GPS) at which a specific individual captured the video, together with the
visual, auditory and time information of that video.
5.2.14 Safety equipment detection on construction sites
Construction is a dynamic process that requires constant information support. As a result, organizing,
monitoring and implementing a construction project including its various safety, security, logistics,
inspection and other aspects can be very challenging.
Within this framework, the use of IoMT coupled with dedicated AI solutions can play a significant role in
keeping a safe environment on construction sites. On a construction site, wearing proper equipment
is essential for safety: hence, a network of IoMT cameras and analysers can be used for first detecting
persons in hazardous areas and then identifying whether the appropriate safety equipment (helmet,
gloves, boots, etc.) is worn correctly. If a safety concern arises, a notification is sent to the site
manager for immediate action.
Such a use case can be extended beyond construction sites, to various places where detection of wearing
safety equipment is important, such as on a ship, in a hospital, or on a manufacturing site.
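A non-normative sketch of the analyser’s decision step is given below; the required-equipment list and the detector and notifier interfaces are assumptions made for illustration.

    REQUIRED_EQUIPMENT = {"helmet", "gloves", "boots"}

    def check_safety(frame, detector, notifier):
        for person in detector.detect_persons(frame):
            if not person.in_hazardous_area:
                continue
            worn = detector.detect_equipment(person)          # e.g. {"helmet", "boots"}
            missing = REQUIRED_EQUIPMENT - worn
            if missing:
                notifier.alert_site_manager(person.id, sorted(missing))   # immediate action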
5.3 Smart spaces: Multi-modal guided navigation
5.3.1 General
This clause groups six use cases to illustrate the way in which multimodal information can be processed
and fused inside IoMT systems in order to provide the user with an enhanced navigation experience.
5.3.2 Blind person assistant system
Navigation in smart spaces can help blind and visually impaired persons in many ways, for instance
by providing them with information about possible collisions, guiding directions or the position of
local landmarks.
Collision warning: A blind person carries a smart cane, a vibration band, a smart phone and a
networked headphone. The smart cane equipped with distance sensors (e.g., an ultrasonic sensor,
an infrared sensor) can measure the distance between the cane and obstacles in front. A collision
coordinating unit receives the distance data and decides the actions to be taken. If the distance is
reasonably far, an alert text for the corresponding distance (e.g., “5 metres before colliding
obstacles ahead.”) is produced by the collision coordinator and sent to a text-to-speech generating unit.
The text-to-speech generator creates the corresponding audio file and sends its URL to a networked
headphone. The headphone plays the corresponding audio files to the blind person. If the distance is
really close, the collision coordinator activates either a wrist band to vibrate or the headphone to create
beeping sounds.
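The collision coordinator’s decision logic can be sketched as follows; the distance threshold and the device handles are assumptions made for illustration, not values defined by this document.

    def coordinate_collision(distance_m, tts, headphone, wrist_band):
        if distance_m < 1.0:
            # really close: immediate haptic and audible alarms
            wrist_band.vibrate()
            headphone.beep()
        else:
            # reasonably far: spoken warning generated by text-to-speech
            text = f"{distance_m:.0f} metres before colliding obstacles ahead."
            headphone.play(tts.synthesize(text))   # tts returns the URL of the audio file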
Guiding direction: Assume that a blind person travels to a destination. The global navigation can be
provided by any web service. However, the local navigation can be enhanced by RFID tags that contain
exact location coordinates. The RFID tags can be embedded in every street corner. The blind person
carries a smart cane, a smart phone and a networked headphone. The smart cane is equipped with an
RFID reader and some inertial sensors (e.g., a gyro, a compass). The RFID reader can read the RFID tags
embedded in every street corner. A direction guiding unit receives the RFID tag data and retrieves the
current location of the blind person. Combining this with the other inertial information, the direction guider
creates directional guidance (e.g., “turn left”, “turn left a little more”, “OK, go straight”) and sends it to
a text-to-speech generating unit. The text-to-speech generator creates the corresponding audio file and
sends its URL to a networked headphone. The headphone plays the corresponding audio files to the
blind person.
Informing local landmarks: Assume that a blind person arrives at a destination. The blind person
wears a smart glass equipped with a camera, a smart phone and a networked headphone. The camera
(MThing camera) takes an image shot in front of the person and sends it to a visual feature extracting
module. The visual feature extractor extracts feature data from the image and sends them to a landmark
matching unit. The landmark matcher compares the feature data against the database and retrieves the
name of the landmark the person is looking at. Based on the retrieved name, the landmark matcher creates
landmark name guidance (e.g., “you are in front of the burger restaurant”) and sends it to a text-to-
speech generating unit. The text-to-speech generator creates the corresponding audio file and sends its
URL to a networked headphone. The headphone plays the corresponding audio files to the blind person.
5.3.3 Elderly people assistance with consecutive vibration haptic devices
Elderly people suffer from declining audiovisual senses rather than from declining tactile senses. Hence,
information delivery via tactile senses would be a promising approach to enhance the day-to-day
comfort of such people. A smart vibration device able to convey rich information (thanks to advanced
consecutive vibrations) by stimulating human skin can be designed using MPEG technologies. If a senior
citizen wears a smart device combining various IoMT devices (e.g. video camera, depth camera, accelerometer),
as illustrated in Figure 5 a), then by analysing the information sensed, an orchestrated spatio-temporal
sequence of consecutive vibrations can be produced to inform that person about the trajectory to follow
as illustrated in Figure 5 b).
Figure 5 — Elderly people assistance with consecutive vibro-haptic devices
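An orchestrated spatio-temporal sequence of consecutive vibrations could be represented as below; the motor layout and the timings are illustrative assumptions, not values specified by this document.

    # a "turn right" cue encoded as (motor, start time in s, duration in s)
    TURN_RIGHT = [
        ("right", 0.0, 0.2),
        ("right", 0.4, 0.2),
        ("right", 0.8, 0.4),
    ]

    def play_sequence(vibration_band, sequence):
        for motor, start, duration in sequence:
            vibration_band.schedule(motor, start, duration)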
5.3.4 Personalized navigation by visual communication
Visual messages can improve the efficiency of the interaction between the user and the wearable device
when the display resources are very restricted. In visual communication, everyone can intuitively
understand a pictogram, so that people can easily express an implicit meaning or an ambiguous
emotion.
A user travelling abroad runs the rough map program on a wearable device. The user locates visual
objects such as characters, restaurants and attractions on a rough map and presses a button to obtain a
tourist route recommended by a processing unit. The wearable device transmits data (consisting of
information related to a visual object reflecting user intentions and context sensed by sensors, e.g.,
location, weather, time and temperature) to processing units or servers to request recommendations.
The processing unit makes a recommendation including a visual object, service information and a
tourist route, based on the data received from the wearable device, and sends the recommendation to
the wearable device. The wearable device displays the recommended tourist route using visual objects
according to the processed information.
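The data exchanged between the wearable device and the processing unit could look like the following; the field names are assumptions made for illustration, not a normative message format.

    request = {
        "visual_object": "restaurant",                 # object the user located on the rough map
        "context": {                                   # context sensed by the wearable's sensors
            "location": {"lat": 48.8584, "lon": 2.2945},
            "weather": "sunny",
            "time": "2022-03-01T10:00:00Z",
            "temperature_c": 18,
        },
    }
    # the processing unit answers with a recommended visual object, service
    # information and a tourist route, rendered on the wearable as pictograms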
5.3.5 Personalized tourist navigation with natural language functionalities
Natural language functionalities can serve as a valuable tool for improving the comfort of a tourist
travelling abroad. The present use case illustrates the usage of speech translation, question
answering and multimodal interaction.
Speech translation: Speech translation for people of different languages is a very convenient service
in a multi-cultural, multi-lingual society and in a global environment. Having evolved from delivery
on a PC, laptop or tablet to the smartphone, speech translation systems are becoming even more usable with
wearable devices. When a user speaks to the microphone embedded in the smart watch or headphone in
one language, an automatic translator can be activated to enable a conversation with a person speaking
a different language.
The result of the translation can be heard by the user of the target language through the wearable
device. The translation engine is either in the remote server (remote translation system) or in the
smartphone (stand-alone translation system) which is connected to the wearable device. With
the wearable translation service, the user is able to use their hands freely while the conversation is
translated. In a travelling situation, the wearable device can also be used to automatically find someone
who can speak one of the languages supported by the translation system.
Question-answering: QA is an advanced function that generates answers to the user’s questions in
natural language. More systems in the future are expected to be equipped with QA functions for an
advanced user experience. Consider the case of a user
...







