Health informatics - Omics Markup Language (OML) (ISO 21393:2021)

OML is a data exchange format designed to facilitate the exchange of omics data around the world without forcing changes to any database schema.
- From an informatics perspective, OML is a data exchange format based on XML. The data exchange format used in messaging and communication is in scope, but the database schema itself is out of scope.
- From a biological perspective, all kinds of omics are in scope, but genomic sequence variations and the whole genomic sequence are out of scope.
- On the other hand, annotations such as clinical concerns and relations with other omics concerns are in scope.
- Although omics exist in various biological species, the scope is limited to species associated with human health: humans, cell lines, and preclinical animals. Other biological species are out of scope.
- The clinical field is in scope, but basic research and other scientific fields are out of scope.
- Clinical trials, including drug discovery, are in scope. The intended application fields centre on human health, including clinical practice, preventive medicine, translational research, and clinical research.

Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO 21393:2021)

Informatique de santé — Langage de balisage Omics (OML) (ISO 21393:2021)

Zdravstvena informatika - Označevalski jezik OMICS (OML) (ISO 21393:2021)

General Information

Status: Published
Public Enquiry End Date: 19-Sep-2019
Publication Date: 08-Sep-2021
Technical Committee:
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 19-Aug-2021
Due Date: 24-Oct-2021
Completion Date: 09-Sep-2021

Standard
SIST EN ISO 21393:2021 - BARVE
English language
55 pages
Draft
oSIST prEN ISO 21393:2019 - BARVE
English language
93 pages

Standards Content (sample)

SLOVENIAN STANDARD (SLOVENSKI STANDARD)
SIST EN ISO 21393:2021
1 October 2021
Health informatics - Omics Markup Language (OML) (ISO 21393:2021)
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO 21393:2021)
Informatique de santé - Langage de balisage Omics (OML) (ISO 21393:2021)
This Slovenian standard is identical to: EN ISO 21393:2021
ICS:
35.060 Languages used in information technology
35.240.80 IT applications in health care technology
SIST EN ISO 21393:2021 en,fr,de

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

EUROPEAN STANDARD
NORME EUROPÉENNE
EUROPÄISCHE NORM
EN ISO 21393
August 2021
ICS 35.240.80
English Version
Genomics informatics - Omics Markup Language (OML) (ISO 21393:2021)

Informatique génomique - Langage de balisage Omics (OML) (ISO 21393:2021)
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO 21393:2021)

This European Standard was approved by CEN on 29 September 2020.

CEN members are bound to comply with the CEN/CENELEC Internal Regulations which stipulate the conditions for giving this European Standard the status of a national standard without any alteration. Up-to-date lists and bibliographical references concerning such national standards may be obtained on application to the CEN-CENELEC Management Centre or to any CEN member.

This European Standard exists in three official versions (English, French, German). A version in any other language made by translation under the responsibility of a CEN member into its own language and notified to the CEN-CENELEC Management Centre has the same status as the official versions.

CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Republic of North Macedonia, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
CEN-CENELEC Management Centre: Rue de la Science 23, B-1040 Brussels

© 2021 CEN. All rights of exploitation in any form and by any means reserved worldwide for CEN national Members. Ref. No. EN ISO 21393:2021 E
European foreword

This document (EN ISO 21393:2021) has been prepared by Technical Committee ISO/TC 215 "Health informatics" in collaboration with Technical Committee CEN/TC 251 "Health informatics", the secretariat of which is held by NEN.

This European Standard shall be given the status of a national standard, either by publication of an identical text or by endorsement, at the latest by February 2022, and conflicting national standards shall be withdrawn at the latest by February 2022.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. CEN shall not be held responsible for identifying any or all such patent rights.

Any feedback and questions on this document should be directed to the users' national standards body/national committee. A complete listing of these bodies can be found on the CEN websites.

According to the CEN-CENELEC Internal Regulations, the national standards organizations of the following countries are bound to implement this European Standard: Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Republic of North Macedonia, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and the United Kingdom.

Endorsement notice

The text of ISO 21393:2021 has been approved by CEN as EN ISO 21393:2021 without any modification.

INTERNATIONAL STANDARD
ISO 21393
First edition
2021-07
Genomics informatics - Omics Markup Language (OML)
Informatique génomique - Langage de balisage Omics (OML)
Reference number: ISO 21393:2021(E)
ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 OML specification
4.1 Specification requirements and OML positioning
4.2 OML Structure
4.3 OML DTD and XML Schema
5 OML development process
6 Figures
Annex A (informative) Reference works
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC 1, Genomics informatics, in collaboration with the European Committee for Standardization (CEN) Technical Committee CEN/TC 251, Health informatics, in accordance with the Agreement on technical cooperation between ISO and CEN (Vienna Agreement).

Any feedback or questions on this document should be directed to the user's national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

In this post-genomic era, the management of health-related data is becoming increasingly important to both omics research and omics-based medicine [1]. Informational approaches to the management of clinical, image and omics data are beginning to have as much worth as basic, bench top research. In the current electronic world, there are multiple different types of data for healthcare as shown in Figure 1. In addition, many kinds of omics data around the world are awaiting effective utilization for human health. The development of data format and message standards to support the interchange of clinical omics data is necessary. Omics data includes omics sequence, sequence variation and other expression data, proteomics data, molecular network, etc. As an entry point, this document focuses on the data exchange.

Omics is expected to be a key to understanding the human response to external stimuli such as alien invasions of any kind, therapies, and environmental interactions [2]. Bacterial infection is an example of alien invasion, and the responses to infection differ among individuals. Under therapy, the side effects of a drug differ among patients. These responses also differ across environments. As a result of the recent explosive growth of omics research, huge amounts of experimental data have been accumulating in many databases in various data formats. These data are waiting to be used in drug discovery, clinical diagnosis, and clinical research.

A markup language is a set of symbols and rules for their use when doing a markup of a document. The first standardized markup language was ISO 8879 [3] on Standard Generalized Markup Language (SGML) [4], which has strong similarities with the troff and nroff text layout languages supplied with Unix systems. Hypertext Markup Language (HTML) [5] is based on SGML. Extensible Markup Language (XML) is a pared-down version of SGML, designed especially for Web documents [6]. XML acts as the basis for Extensible HTML (XHTML) [7] and Wireless Markup Language (WML) [8] and for standardized definitions of system interaction such as Simple Object Access Protocol (SOAP) [9]. By contrast, text layout or semantics are often defined in a purely machine-interpretable form, as in most word processor file formats [10].

Markup languages for the biomedical field, based on XML, have been in development for several decades to enhance the exchange of data among researchers. Bioinformatic Sequence Markup Language (BSML) [11], Systems Biology Markup Language (SBML) [12], Cell Markup Language (Cell ML) [13], and Neuro Markup Language (Neuro-ML) [14] are examples of markup languages. Polymorphism Mining and Annotation Programs (PolyMAPr) [15] is centred on SNP and tries to achieve mining, annotation, and functional analysis of public databases such as dbSNP [16], CGAP [17], and JSNP [18] through programming. ISO 25720 Genomic Sequence Variation Markup Language (GSVML) is the first standardized ML for clinical genomic sequence variation data exchange.

The purpose of Omics Markup Language (OML) is to provide a standardized data exchange format for omics in human health.

The recent expansion in omics research has produced large quantities of data held in many databases with different formats. Standardization of data exchange is necessary for managing, analysing and utilizing these data. Considering that omics, especially transcriptomics, proteomics, signalomics and metabolomics, has significant meaning in molecular-based medicine and pharmacogenomics, the data exchange format is key to enhancing omics-based clinical research and omics-based medicine.

Recently, informational approaches have become more important to both omics research and omics-based medicine. The management of omics data is as critical as basic research data in this new era. There are many kinds of omics data around the world, and the time has come to effectively use this omics data for human health. To use this data effectively and efficiently, standards should be developed to permit the interoperable interchange of omics data globally. These standards should define the data format as well as the messages that would be used to interchange and share this data globally.

OML is a base frame for all kinds of clinical omics data. Each omics category will be introduced as a specific add-on component part. For instance, Whole Genome sequence Markup Language will be a specific add-on component part for whole genome sequence data, and Genomic Sequence Variation Markup Language will be a specific add-on component part for genomic sequence variation data.

To utilize the internationally accumulated omics data, standards for the interchange of omics data should be defined. These standards should define a data format and exchange messages. A markup language is a reasonable choice to address this need. As for omics data message handling, the Health Level Seven® 1) Clinical Genomics Work Group has summarized clinical use cases for general omics data [19]. The OML project has contributed to these efforts. Additionally, this work incorporated use cases based on the Japanese millennium project [20]. Based on these contexts and investigations, this document elucidates the needs and requirements for OML and then proposes the specification of OML for international standardization based on those needs and requirements.

1) Health Level Seven (HL7) is the registered trademark of Health Level Seven International. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO of the product named.
Genomics informatics — Omics Markup Language (OML)
1 Scope

This document is applicable to the data exchange format that is designed to facilitate exchanging omics data around the world without forcing changes of any database schema.

This document specifies the characteristics of OML from the following perspectives.

From an informatics perspective, OML defines the data exchange format based on XML. This document gives guidelines for the specifications of the data exchange format, but this document excludes the database schema itself.

From a molecular perspective, this document is applicable to all kinds of omics data, while this document excludes the details of the molecules (e.g., details of genomic sequence variations or whole genomic sequence). This document is also applicable to the molecular annotations including clinical concerns and relations with other omics concerns.

From an application perspective, this document is applicable to the clinical field including clinical practice, preventive medicine, translational research, and clinical research including drug discovery. This document does not apply to basic research and other scientific fields.

From a biological species perspective, this document is applicable to the human health-associated species such as human, preclinical animals, and cell lines. This document does not apply to other biological species.
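
The idea of exchanging data "without forcing changes of any database schema" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source row layouts, the mapping table, the element names `record`, `name` and `measurement`, and the sample values are all hypothetical and are not taken from the OML DTD or XML Schema. The sketch only shows that each database keeps its own column names while a per-source mapping produces one shared XML form.

```python
import xml.etree.ElementTree as ET

# Two hypothetical source databases holding the same kind of record
# under different column names. Neither schema is modified.
ROW_FROM_DB_A = {"gene_symbol": "TP53", "expr_level": 8.1}
ROW_FROM_DB_B = {"symbol": "TP53", "value": 8.1}

# Per-database mappings onto shared exchange names.
# These names are illustrative only, not OML element names.
MAPPINGS = {
    "db_a": {"gene_symbol": "name", "expr_level": "measurement"},
    "db_b": {"symbol": "name", "value": "measurement"},
}


def to_exchange_xml(source, row):
    """Serialize one database row into the shared XML exchange form."""
    record = ET.Element("record")
    for column, exchange_name in MAPPINGS[source].items():
        ET.SubElement(record, exchange_name).text = str(row[column])
    return ET.tostring(record, encoding="unicode")


xml_a = to_exchange_xml("db_a", ROW_FROM_DB_A)
xml_b = to_exchange_xml("db_b", ROW_FROM_DB_B)
print(xml_a)
print(xml_a == xml_b)
```

Both rows serialize to the same XML text, so a receiver only needs to understand the exchange format and never the sender's schema.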
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
actor
something or someone who supplies a stimulus to the system

Note 1 to entry: Actors include both humans and other quasi-autonomous things, such as machines, computer

tasks and systems.
[SOURCE: ISO 25720:2009, 4.1]
3.2
allele

gene that is found in one of two or more different forms in the same position in a chromosome

3.3
bioinformatic sequence markup language
BSML
extensible language specification and container for bioinformatic data
[SOURCE: ISO 25720:2009, 4.2]
3.4
cancer genome anatomy project
CGAP

genomic expression data collected for various tumorigenic tissues in both humans and mice

Note 1 to entry: CGAP also provides information on methods and reagents used in deriving the genomic data

[SOURCE: ISO 25720:2009, 4.4, modified]
3.5
codon

sequence of three nucleotides which together form a unit of genetic code in a DNA or RNA molecule

3.6
dbSNP

database of single nucleotide polymorphisms (3.29) provided by the US National Center for Biotechnology

Information (NCBI)
Note 1 to entry: Available at https://www.ncbi.nlm.nih.gov/SNP/.
[SOURCE: ISO/TS 20428:2017, 3.9]
3.7
digital imaging and communications in medicine
DICOM

standard in the field of medical informatics for exchanging digital information between medical

imaging equipment (such as radiological imaging) and other systems, ensuring interoperability

[SOURCE: ISO 25720:2009, 4.6]
3.8
DNA sequence variation
differences of DNA sequence among individuals in a population
Note 1 to entry: DNA sequence variation implies polymorphism (3.25).
[SOURCE: ISO 25720:2009, 4.8]
3.9
document type definition
DTD

document that contains formal definitions of all of the data elements in a particular type of hypertext markup language (3.13), standard generalized markup language (3.28), or extensible markup language (3.36) document
[SOURCE: ISO 25720:2009, 4.9]
3.10
entry point

reference point that designates the class(es) from which the messages begin for the domain

[SOURCE: ISO 25720:2009, 4.10, modified]
3.11
exon

part of a gene that will encode a part of the final mature RNA produced by that gene after introns (3.16)

have been removed by RNA splicing
3.12
genomic sequence variation markup language
GSVML
standard for data exchange of genomic sequence variation data
3.13
hypertext markup language
HTML

set of markup symbols or codes inserted in a file intended for display in a browser

[SOURCE: ISO 25720:2009, 4.12, modified]
3.14
international classification of diseases
ICD
diagnosis coding system for epidemiology, health management and clinical purposes
Note 1 to entry: ICD-10 is the 10th revision and ICD-11 is the 11th revision.
Note 2 to entry: Available at https://icd.who.int/.
3.15

clinical omics sub-information model for international classification of diseases
clinical omics sub-information model for ICD
iCOS
sub-information model aiming to enhance the representation ability of the ICD-11 contents model by covering omics information as an add-on part
Note 1 to entry: Add-on sub-information model to enhance the representation ability of the ICD-11 contents model to cover omics information.
3.16
intron

nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final RNA

product
3.17
joint photographic experts group
JPEG
compression technique for images
[SOURCE: ISO 25720:2009, 4.13]
3.18
markup language
set of symbols and rules for their uses when doing a markup of a document
[SOURCE: ISO 25720:2009, 4.15]
3.19
microarray gene expression markup language
MAGE-ML

data format for describing information about DNA-array based experiments and gene expression data

3.20
neuro markup language
neuro-ML
markup language (3.18) for describing models of neurons and networks of neurons
[SOURCE: ISO 25720:2009, 4.16]
3.21
nroff

Unix text-formatting program that is a predecessor of the Unix troff (3.33) document processing system

[SOURCE: ISO 25720:2009, 4.17]
3.22
omics
field of study in biology ending in -omics

Note 1 to entry: It includes, but is not limited to, genomics, proteomics, and metabolomics.

3.23
pharmacogenomics

branch of pharmaceutics aiming to develop rational means to optimize drug therapy, with respect to

the patient's genotype
3.24
polymorphism mining and annotation programs
PolyMAPr

programs for polymorphism (3.25) database mining, annotation, and functional analysis

[SOURCE: ISO 25720:2009, 4.19]
3.25
polymorphism
variation in the sequence of DNA among individuals

Note 1 to entry: Polymorphism implies single nucleotide polymorphism (3.29) and short tandem repeat polymorphism (3.32).
[SOURCE: ISO 25720:2009, 4.20]
3.26
RNA markup language
RNAML
data format for exchanging RNA information
3.27
systems biology markup language
SBML
markup language (3.18) for simulations in systems biology
[SOURCE: ISO 25720:2009, 4.21]
3.28
standard generalized markup language
SGML

markup language (3.18) for document representation that formalizes markup and frees it of system and

processing dependencies
[SOURCE: ISO 8879:1986, 4.305, modified]
3.29
single nucleotide polymorphism
SNP

single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population

[SOURCE: ISO 25720:2009, 4.23]
3.30
systematized nomenclature of medicine-clinical terms®
SNOMED-CT®

dynamic, scientifically validated clinical health care terminology and infrastructure

[SOURCE: ISO 25720:2009, 4.24]
3.31
simple object access protocol
SOAP

lightweight protocol for exchange of information in a decentralized, distributed environment

[SOURCE: ISO 25720:2009, 4.25]
3.32
short tandem repeat polymorphism
STRP
variable segments of DNA that are two to five bases long with numerous repeats
[SOURCE: ISO 25720:2009, 4.26]
3.33
troff

major component of a document processing system developed by AT&T for the Unix operating system

3.34
wireless markup language
WML

extensible markup language used to specify content and user interface for WAP (Wireless Application

Protocol) devices
[SOURCE: ISO 25720:2009, 4.29]
3.35
extensible HTML
XHTML

hybrid between hypertext markup language (3.13) and extensible markup language (3.36) specifically designed for net device displays
[SOURCE: ISO 25720:2009, 4.30]
3.36
extensible markup language
XML

pared-down version of standard generalized markup language (3.28), designed especially for web documents
[SOURCE: ISO 25720:2009, 4.31]

2) SNOMED CT is the registered trademark of International Health Terminology Standards Development Organisation. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO of the product named.
3.37
XML schema

language for describing the structure and constraining the contents of extensible markup language

documents
[SOURCE: ISO 25720:2009, 4.32]
4 OML specification
4.1 Specification requirements and OML positioning

In the current context, annotative information about omics is increasing, and that information is filling the information gaps. The omics data itself is also increasing but is stored in various databases. The pitfall of omics data handling is the lack of standardization of data formats for organized omics data. Historically, markup languages have been used, and programs have been developed to handle omics information. However, there have been no omics-centric markup languages so far. OML is the first omics-centric markup language and is human-health-centric. Considering that omics has a great impact especially on human health and response, it can be said that OML has the greatest potential to be the designated markup language for human healthcare. On the other hand, targeting practical human health applications means that OML shall handle direct or indirect annotations. Here the direct annotation shall indicate general annotative information such as associated information from other omics and experimental preparations. The indirect annotation shall indicate all of the omics data and clinical data that result from omics data. To understand the omics-based clinical situation of each patient, these kinds of additional information are required. Considering the requirement to add many kinds of additional information, the development and standardization of OML cannot stand alone and shall need harmonization with the documents of other international standardization organizations.

OML is intended to be used in data exchange messages related to human health. In the development and standardization of OML in this application domain, attention to patient safety, clinical efficiency, and medical costs shall always be required. For patient safety from an informational side, the conservation and protection of patient information shall be deemed important. For the enhancement of clinical efficiency, simplicity and ease of understanding shall be deemed important. For medical cost reduction, adaptability and ease of installation shall be deemed important.

OML tries to respond to these basic requirements by providing a sharable XML-based data exchange format. OML can be used for clinical omics data exchange among various types of data formats. In the greater framework of clinical data standardization, OML shall play the part of describing the omics data and its necessary information.
4.2 OML Structure

A valid OML expression shall be structured in accordance with the following. The outline structure of OML is shown in Figure 2.

OML shall consist of three data criteria:
— omics data;
— direct annotation;
— indirect annotation.
The omics data criterion shall describe, for each omics, the straightforward omics data as:
— type;
— position;
— length;
— region;
— etc.
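
As a rough illustration of the three data criteria and the omics data items listed above, the following Python sketch builds a skeleton document. The element names (OML, omics_data, direct_annotation, indirect_annotation, type, position, length, region) and the sample values are assumptions made for this example only; the normative names are those defined by the OML XML Schema, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_oml_skeleton(omics_type, position, length, region):
    """Build a skeleton with one child element per data criterion.

    All element names are hypothetical placeholders, not taken from
    the OML XML Schema.
    """
    root = ET.Element("OML")

    # Criterion 1: the straightforward omics data.
    omics = ET.SubElement(root, "omics_data")
    ET.SubElement(omics, "type").text = omics_type
    ET.SubElement(omics, "position").text = str(position)
    ET.SubElement(omics, "length").text = str(length)
    ET.SubElement(omics, "region").text = region

    # Criteria 2 and 3: left empty in this sketch.
    ET.SubElement(root, "direct_annotation")
    ET.SubElement(root, "indirect_annotation")
    return root


# Sample values are invented for illustration.
doc = build_oml_skeleton("transcriptomics", 1200, 350, "exon")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

The sketch only mirrors the outline structure; validation against the normative XML Schema would be a separate step.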
The direct annotation criterion shall describe, for each omics, the data attached to the omics data, such as:
— experiment analys
...

SLOVENIAN STANDARD
oSIST prEN ISO 21393:2019
01-September-2019
Zdravstvena informatika - Označevalski jezik OMICS (OML) (ISO/DIS 21393:2019)
Health informatics - Omics Markup Language (OML) (ISO/DIS 21393:2019)
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO/DIS 21393:2019)
Informatique de santé — Langage de balisage Omics (OML) (ISO/DIS 21393:2019)
This Slovenian standard is identical to: prEN ISO 21393
ICS:
35.240.80 IT applications in health care technology
oSIST prEN ISO 21393:2019 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

DRAFT INTERNATIONAL STANDARD
ISO/DIS 21393
ISO/TC 215 Secretariat: ANSI
Voting begins on: 2019-07-16
Voting terminates on: 2019-10-08
Health informatics — Omics Markup Language (OML)
Informatique de santé — Langage de balisage Omics (OML)
ICS: 35.240.80
Reference number: ISO/DIS 21393:2019(E)
© ISO 2019

This document is circulated as received from the committee secretariat.

ISO/CEN PARALLEL PROCESSING

THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.

IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 OML specification
4.1 Specification requirements and OML positioning (informative)
4.2 OML Structure (normative)
4.3 OML DTD (informative) and XML Schema (normative)
5 OML development process (informative)
6 Figures
7 Tables
Annex A (informative) Reference Works
A.1 Introduction
A.2 Use case analysis
A.2.1 Overview
A.2.2 Use case of SNP analysis as an example of Omics analysis
A.2.3 UML example of SNP analysis as an example of Omics analysis
A.2.4 Use case of database integration
A.2.5 Use case and required elements
A.3 Diversity of SNP databases
A.3.1 Diversity of databases
A.3.2 Diversity of data representation
A.3.3 Diversity of sequence variation data representation
A.4 Markup language comparison
A.4.1 Mapping of each markup language to the data categories
A.4.2 OML originated needs and its specifications
A.5 Interface analysis to Health Level Seven
A.5.1 Comparison with HL7 genomics model
A.5.2 Information Model of Genotype in HL7
A.6 Interface analysis to CEN EN ISO 13606
A.7 Interface analysis to SNOMED-CT
A.8 Interface analysis to WHO-ICD iCOS
Bibliography

COPYRIGHT PROTECTED DOCUMENT
© ISO 2019

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO

collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any

patent rights identified during the development of the document will be in the Introduction and/or on

the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the World

Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see

www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC

1, Clinical Genomics.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

In this next-generation post-genomic era, the management of health-related data is becoming increasingly

important to both omics research and omics-based medicine [1]. Informational approaches to the

management of clinical, image and omics data are beginning to have as much worth as basic, bench top

research. Nowadays there are many kinds of omics data around the world awaiting effective utilization

for human health. The hurdle that must be overcome to achieve this goal is the development of data

format and message standards to support the interchange of clinical omics data. Omics data includes

omics sequence, sequence variation and other expression data, proteomics data, molecular network, etc.

As an entry point, this standard focuses on the data exchange.

Omics is currently expected to be a key to understanding the human response to external stimuli such as alien invasions, therapies, and environmental interactions [2]. Bacterial infection is an example of alien invasion, and responses to infection differ among individuals. Likewise, the side effects of a drug therapy differ among patients. These responses also differ across environments. As a result of the recent rapid growth of omics research, huge amounts of experimental data have been accumulating in many databases in various types of data formats. These data are waiting to be used in drug discovery, clinical diagnosis, and clinical research.

A markup language is a set of symbols and rules for their use when marking up a document [3].

The first standardized markup language was Standard Generalized Markup Language (SGML) [4] which

has strong similarities with troff and nroff text layout languages supplied with Unix systems. Hypertext

Markup Language (HTML) is based on SGML [5]. Extensible Markup Language (XML) is a pared-down

version of SGML, designed especially for Web documents [6]. XML acts as the basis for Extensible HTML

(XHTML) [7] and Wireless Markup Language (WML) [8] and for standardized definitions of system

interaction such as Simple Object Access Protocol (SOAP) [9]. By contrast, text layout or semantics are

often defined in a purely machine-interpretable form, as in most word processor file formats [10].

Markup languages for the biomedical field, based on XML, have been in development for several decades to enhance the exchange of data among researchers. Bioinformatic Sequence Markup Language (BSML) [11], Systems Biology Markup Language (SBML) [12], Cell Markup Language (Cell ML) [13], and Neuro Markup Language (Neuro-ML) [14] are examples of such markup languages. Polymorphism Mining and Annotation Programs (PolyMAPr) [15] focuses on SNPs and aims to achieve mining, annotation, and functional analysis of public databases such as dbSNP [16], CGAP [17], and JSNP [18] through programming. ISO 25720 Genomic Sequence Variation Markup Language (GSVML) is the first standardized markup language for clinical genomic sequence variation data exchange.

The purpose of Omics Markup Language (OML) is to provide a standardized data exchange format for

omics in human health.

The recent expansion in omics research has produced large quantities of data held in many databases

with different formats. Standardization of data exchange is necessary for managing, analysing and

utilizing these data. Considering that omics, especially transcriptomics, proteomics, signalomics and

metabolomics, has significant meaning in molecular-based medicine and pharmacogenomics, the data

exchange format is key to enhancing omics-based clinical research and omics-based medicine.

Recently, informational approaches have become more important to both omics research and omics-

based medicine. The management of omics data is as critical as basic research data in this new era. There

are many kinds of omics data around the world, and the time has come to effectively use this omics data

for human health. To use this data effectively and efficiently, standards must be developed to permit the

interoperable interchange of omics data globally. These standards must define the data format as well as

the messages to be used to interchange and share this data globally. This standard addresses those

requirements, using a markup language.

OML is a base framework for all kinds of clinical omics data. Each omics category will be introduced as a specific add-on component. For instance, Whole Genome sequence Markup Language will be a specific add-on component for whole genome sequence data, and Genomic Sequence Variation Markup Language will be a specific add-on component for genomic sequence variation data.

To utilize the accumulated omics data among many facilities around the world, standards for the

interchange of omics data must be defined. The required standards include defining a data format and

exchange messages. A markup language is a reasonable choice to address this need. As for omics data

message handling, Health Level Seven Clinical Genomics Work Group [19] has summarized clinical use

cases for general omics data. The OML project has contributed to these efforts. Additionally, this work

incorporated use cases based on the Japanese millennium project [20]. Based on these contexts and

investigations, this document elucidates the needs and the requirements for OML and then proposes the

specification of OML for the international standardization.

A list of references related to this part of ISO/DIS 21393 is given in the bibliography.

Health informatics — Omics Markup Language
1 Scope

OML is a data exchange format designed to facilitate exchanging omics data around the world without

forcing changes to existing databases.

From an informatics perspective, OML is an XML-based data exchange format. The data exchange format

(e.g., XML schema and DTD) is in scope. The structure of the systems and databases sending or receiving

the information, including their schemas, is out of scope.
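
The separation described above, where only the exchange format is shared and local databases stay as they are, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the two site record layouts, the mapping tables, and the message element names (omics_data, type, position, length) are all hypothetical and are not taken from the OML XML Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical local records at two sites; their layouts differ and stay unchanged.
SITE_A_ROW = {"omics_kind": "proteomics", "start": 42, "len": 10}
SITE_B_ROW = {"category": "proteomics", "pos": 42, "size": 10}

# Each site keeps its own mapping from local column names to the
# shared (hypothetical) message element names.
SITE_A_MAP = {"omics_kind": "type", "start": "position", "len": "length"}
SITE_B_MAP = {"category": "type", "pos": "position", "size": "length"}


def to_message(row, field_map):
    """Serialize one local record as a shared XML exchange message."""
    msg = ET.Element("omics_data")
    for local_name, shared_name in field_map.items():
        ET.SubElement(msg, shared_name).text = str(row[local_name])
    return ET.tostring(msg, encoding="unicode")


message_a = to_message(SITE_A_ROW, SITE_A_MAP)
message_b = to_message(SITE_B_ROW, SITE_B_MAP)
print(message_a == message_b)
```

Both sites emit the same message for the same underlying data, while neither local layout is altered; only the per-site mapping is added.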

From a biological perspective, all kinds of omics are in scope, but the details (e.g., details of genomic

sequence variations or whole genomic sequence) are out of the scope. Annotations including clinical

concerns and relations with other omics concerns are in scope.

The application focus is human health including clinical practice, preventive medicine, translational

research, and clinical research including drug discovery. The scope includes health-associated species,

including human and preclinical animals, and associated cell lines. Other species, basic research, and

other scientific fields are out of scope.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 25720:2009, Health informatics -- Genomic Sequence Variation Markup Language (GSVML)

ISO/HL7 21731:2006, Health informatics – HL7 version 3 – Reference information model – Release 1

CEN EN 13606, Health informatics -- Electronic Healthcare Record Communication
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
actor
something or someone who supplies a stimulus to the system

Note to entry: Actors include both humans and other quasi-autonomous things, such as machines, computer tasks

and systems.
[SOURCE: ISO 25720:2009(en), 4.1]
3.2
allele

a gene that is found in one of two or more different forms in the same position in a chromosome

3.3
BSML
bioinformatic sequence markup language
extensible language specification and container for bioinformatic data
[SOURCE: ISO 25720:2009(en), 4.2]
3.4
Cell ML
cell markup language
a standard for representing and exchanging computer-based biological models
[SOURCE: ISO 25720:2009(en), 4.3]
3.5
CGAP
Cancer Gene Anatomy Project

genomic expression data collected for various tumorigenic tissues in both humans and mice.

Note to entry: CGAP also provides information on methods and reagents used in deriving the genomic data

[SOURCE: ISO 25720:2009(en), 4.4]
3.6
codon

a sequence of three nucleotides which together form a unit of genetic code in a DNA or RNA molecule.

3.7
dbSNP

database of SNPs (3.34) provided by the US National Center for Biotechnology Information (NCBI)

Note to entry: available at https://www.ncbi.nlm.nih.gov/SNP/
[SOURCE: ISO/TS 20428:2017(en), 3.9]
3.8
DICOM
digital imaging and communications in medicine

a standard in the field of medical informatics for exchanging digital information between medical imaging

equipment (such as radiological imaging) and other systems, ensuring interoperability

[SOURCE: ISO 25720:2009(en), 4.6]
3.9
DNA
deoxyribonucleic acid
a molecule that encodes genetic information in the nucleus of cells
[SOURCE: ISO 25720:2009(en), 4.7]
3.10
DNA sequence variation
differences of DNA (3.9) sequence among individuals in a population
Note to entry: DNA sequence variation implies polymorphism (3.29)
[SOURCE: ISO 25720:2009(en), 4.8]
3.11
DTD
document type definition

a document that contains formal definitions of all of the data elements in a particular type of HTML (4.15),

SGML (4.28), or XML (4.38) document
[SOURCE: ISO 25720:2009(en), 4.9]
3.12
entry point

reference point that designates the class(es) from which the messages begin for the domain

[SOURCE: ISO 25720:2009(en), 4.10]
3.13
exon

any part of a gene that will encode a part of the final mature RNA produced by that gene after introns

have been removed by RNA splicing.
3.14
gene-based medicine
medicine based on genes or genetic science
[SOURCE: ISO 25720:2009(en), 4.11]
3.15
GSVML
genomic sequence variation markup language
a standard for data exchange of genomic sequence variation data
[SOURCE: ISO 25720:2009(en)]
3.16
HTML
Hypertext Markup Language

a set of markup symbols or codes inserted in a file intended for display in a browser

[SOURCE: ISO 25720:2009(en), 4.12]
3.17
ICD-11
international classification of diseases, 11th revision

a standard diagnostic tool for epidemiology, health management and clinical purposes

Note to entry: available at https://icd.who.int/
3.18
iCOS
clinical omics sub-information model for ICD

Note to entry: Add-on sub-information model to enhance the representation ability of the ICD-11 content model to

cover omics information.
3.19
intron

any nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final

RNA product
3.20
JPEG
joint photographic experts group
compression technique for images
[SOURCE: ISO 25720:2009(en), 4.13]
3.21
JSNP
Japanese single nucleotide polymorphisms
database of Japanese Single Nucleotide Polymorphisms
[SOURCE: ISO 25720:2009(en), 4.14]
3.22
markup language
a set of symbols and rules for their uses when doing a markup of a document
[SOURCE: ISO 25720:2009(en), 4.15]
3.23
microarray gene expression markup language
MAGE-ML

a data format for describing information about DNA-array based experiments and gene expression data

3.24
neuro-ML
Neuro Markup Language
markup language (3.22) for describing models of neurons and networks of neurons
[SOURCE: ISO 25720:2009(en), 4.16]
3.25
nroff
text-formatting program on Unix and Unix-like systems
[SOURCE: https://en.wikipedia.org/wiki/Nroff]
3.26
omics
a field of study in biology ending in -omics

Note to entry: includes, but is not limited to, genomics, proteomics, and metabolomics.

3.27
pharmacogenomics

a branch of pharmaceutics aiming to develop rational means to optimize drug therapy, with respect to

the patient's genotype
3.28
PolyMAPr
polymorphism mining and annotation programs
programs for polymorphism database mining, annotation, and functional analysis
[SOURCE: ISO 25720:2009(en), 4.19]
3.29
polymorphism
variation in the sequence of DNA (3.9) among individuals
Note to entry: polymorphism implies SNP (4.29) and STRP (4.32)
[SOURCE: ISO 25720:2009(en), 4.20]
3.30
RNA
ribonucleic acid

polymer of ribonucleotides occurring in a double-stranded or single-stranded form

[SOURCE: ISO 22174:2005, 3.1.3]
3.31
RNAML
a data format for exchanging RNA information
3.32
SBML
systems biology markup language
markup language (3.22) for simulations in systems biology
[SOURCE: ISO 25720:2009(en), 4.21]
3.33
SGML
standard generalized markup language

markup language (3.22) for document representation that formalizes markup and frees it of system and

processing dependencies
[SOURCE: ISO 8879:1986, 4.305]
3.34
SNP
single nucleotide polymorphism

single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population

[SOURCE: ISO 25720:2009(en), 4.23]
3.35
SNOMED-CT
systematized nomenclature of medicine - Clinical Terms

dynamic, scientifically validated clinical health care terminology and infrastructure

[SOURCE: ISO 25720:2009(en), 4.24]
3.36
SOAP
simple object access protocol
lightweight protocol for exchange of information in a decen
...
