SIST EN ISO 21393:2021
Health informatics - Omics Markup Language (OML) (ISO 21393:2021)
OML is a data exchange format designed to facilitate the exchange of omics data around the world
without forcing changes to any database schema.
- From an informatics point of view, OML is a data exchange format based on XML. The data
exchange format used in messaging and communication is in the scope of this document, but the
database schema itself is out of scope.
- From a biological point of view, all kinds of omics are in the scope of this document, but
genomic sequence variations and the whole genomic sequence themselves are out of scope.
- Annotations, including clinical concerns and relations with other omics concerns, are in the
scope of this document.
- Although omics exists in many biological species, the scope of this document is limited to
species associated with human health: humans, cell lines, and preclinical animals. Other
biological species are out of scope.
- The clinical field is in the scope of this document, but basic research and other scientific
fields are out of scope.
- Clinical trials, including drug discovery, are in the scope of this document. The main intended
application field is human health, including clinical practice, preventive medicine,
translational research, and clinical research.
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO 21393:2021)
Informatique de santé — Langage de balisage Omics (OML) (ISO 21393:2021)
This document is applicable to the data exchange format that is designed to facilitate the exchange of omics data around the world without forcing any change of database schema.
This document specifies the characteristics of OML from the following perspectives.
From an informatics point of view, OML defines the data exchange format based on XML. This document gives guidelines for the specification of the data exchange format, but it excludes the database schema itself.
From a molecular point of view, this document is applicable to all kinds of omics data, although it excludes the details of the molecules (for example, details of genomic sequence variations or the whole genomic sequence). This document is also applicable to molecular annotations, including clinical concerns and relations with other omics concerns.
From an application point of view, this document is applicable to human health, including clinical practice, preventive medicine, translational research, and clinical research, including drug discovery. This document does not apply to basic research or other scientific fields.
From a biological species point of view, this document is applicable to species associated with human health, such as humans, preclinical animals, and cell lines. This document does not apply to other biological species.
Zdravstvena informatika - Označevalski jezik OMICS (OML) (ISO 21393:2021)
SLOVENIAN STANDARD
SIST EN ISO 21393:2021
1 October 2021
This Slovenian standard is identical to: EN ISO 21393:2021
ICS:
35.060 Languages used in information technology
35.240.80 IT applications in health care technology
SIST EN ISO 21393:2021 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of all or part of this standard is not permitted.
EN ISO 21393
EUROPEAN STANDARD
NORME EUROPÉENNE
EUROPÄISCHE NORM
August 2021
ICS 35.240.80
English Version
Genomics informatics - Omics Markup Language (OML) (ISO 21393:2021)
Informatique génomique - Langage de balisage Omics (OML) (ISO 21393:2021)
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO 21393:2021)
This European Standard was approved by CEN on 29 September 2020.
CEN members are bound to comply with the CEN/CENELEC Internal Regulations which stipulate the conditions for giving this
European Standard the status of a national standard without any alteration. Up-to-date lists and bibliographical references
concerning such national standards may be obtained on application to the CEN-CENELEC Management Centre or to any CEN
member.
This European Standard exists in three official versions (English, French, German). A version in any other language made by
translation under the responsibility of a CEN member into its own language and notified to the CEN-CENELEC Management
Centre has the same status as the official versions.
CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia,
Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway,
Poland, Portugal, Republic of North Macedonia, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and
United Kingdom.
EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
CEN-CENELEC Management Centre: Rue de la Science 23, B-1040 Brussels
© 2021 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.
Ref. No. EN ISO 21393:2021 E
European foreword
This document (EN ISO 21393:2021) has been prepared by Technical Committee ISO/TC 215 "Health
informatics" in collaboration with Technical Committee CEN/TC 251 "Health informatics", the
secretariat of which is held by NEN.
This European Standard shall be given the status of a national standard, either by publication of an
identical text or by endorsement, at the latest by February 2022, and conflicting national standards
shall be withdrawn at the latest by February 2022.
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. CEN shall not be held responsible for identifying any or all such patent rights.
Any feedback and questions on this document should be directed to the users’ national standards
body/national committee. A complete listing of these bodies can be found on the CEN websites.
According to the CEN-CENELEC Internal Regulations, the national standards organizations of the
following countries are bound to implement this European Standard: Austria, Belgium, Bulgaria,
Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland,
Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Republic of
North Macedonia, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and the
United Kingdom.
Endorsement notice
The text of ISO 21393:2021 has been approved by CEN as EN ISO 21393:2021 without any modification.
INTERNATIONAL STANDARD
ISO 21393
First edition
2021-07
Genomics informatics — Omics
Markup Language (OML)
Informatique génomique — Langage de balisage Omics (OML)
Reference number ISO 21393:2021(E)
© ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 OML specification
4.1 Specification requirements and OML positioning
4.2 OML Structure
4.3 OML DTD and XML Schema
5 OML development process
6 Figures
Annex A (informative) Reference works
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee
SC 1, Genomics informatics, in collaboration with the European Committee for Standardization (CEN)
Technical Committee CEN/TC 251, Health informatics, in accordance with the Agreement on technical
cooperation between ISO and CEN (Vienna Agreement).
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
In this post-genomic era, the management of health-related data is becoming increasingly important
to both omics research and omics-based medicine [1]. Informational approaches to the management of
clinical, image and omics data are beginning to have as much worth as basic, bench-top research. In the
current electronic world, there are multiple different types of data for healthcare, as shown in Figure 1.
In addition, many kinds of omics data around the world await effective utilization for human health.
The development of data format and message standards to support the interchange of clinical omics
data is necessary. Omics data includes omics sequence, sequence variation and other expression data,
proteomics data, molecular network, etc. As an entry point, this document focuses on the data exchange.
In the present circumstances, omics is expected to be a key to understanding human responses to
external stimuli such as any kind of alien invasion, therapies, and environmental interactions [2].
Bacterial infection is an example of alien invasion, and the responses to infection differ among
individuals. With respect to therapy, the side effects of a drug differ among patients. These
responses also differ across environments. As a result of the recent explosive growth of omics
research, huge amounts of experimental data have been accumulating in many databases in various
data formats. These data are waiting to be used in drug discovery, clinical diagnosis, and clinical
research.
A markup language is a set of symbols and rules for their use when marking up a document.
The first standardized markup language was ISO 8879 [3], Standard Generalized Markup Language
(SGML) [4], which has strong similarities with the troff and nroff text layout languages supplied
with Unix systems. Hypertext Markup Language (HTML) [5] is based on SGML. Extensible Markup
Language (XML) is a pared-down version of SGML, designed especially for Web documents [6]. XML acts
as the basis for Extensible HTML (XHTML) [7] and Wireless Markup Language (WML) [8] and for
standardized definitions of system interaction such as Simple Object Access Protocol (SOAP) [9].
By contrast, text layout or semantics are often defined in a purely machine-interpretable form, as
in most word processor file formats [10].
Markup languages for the biomedical field, based on XML, have been in development for several
decades to enhance the exchange of data among researchers. Bioinformatic Sequence Markup Language
(BSML) [11], Systems Biology Markup Language (SBML) [12], Cell Markup Language (Cell ML) [13], and
Neuro Markup Language (Neuro-ML) [14] are examples of markup languages. Polymorphism Mining
and Annotation Programs (PolyMAPr) [15] is centred on SNP and tries to achieve mining, annotation,
and functional analysis of public databases such as dbSNP [16], CGAP [17], and JSNP [18] through
programming. ISO 25720 Genomic Sequence Variation Markup Language (GSVML) is the first
standardized ML for clinical genomic sequence variation data exchange.
The purpose of Omics Markup Language (OML) is to provide a standardized data exchange format for
omics in human health.
The recent expansion in omics research has produced large quantities of data held in many databases
with different formats. Standardization of data exchange is necessary for managing, analysing and
utilizing these data. Considering that omics, especially transcriptomics, proteomics, signalomics and
metabolomics, has significant meaning in molecular-based medicine and pharmacogenomics, the data
exchange format is key to enhancing omics-based clinical research and omics-based medicine.
Recently, informational approaches have become more important to both omics research and omics-
based medicine. The management of omics data is as critical as basic research data in this new era.
There are many kinds of omics data around the world, and the time has come to effectively use this
omics data for human health. To use this data effectively and efficiently, standards should be developed
to permit the interoperable interchange of omics data globally. These standards should define the data
format as well as the messages that would be used to interchange and share this data globally.
OML is a base frame for all kinds of clinical omics data. Each omics category will be introduced as a
specific add-on component part. For instance, Whole Genome sequence Markup Language will be
a specific add-on component part for whole genome sequence data, and Genomic Sequence Variation
Markup Language will be a specific add-on component part for genomic sequence variation data.
To utilize the internationally accumulated omics data, standards for the interchange of omics data
should be defined. These standards should define a data format and exchange messages. A markup
language is a reasonable choice to address this need. As for omics data message handling, the Health
Level Seven® 1) Clinical Genomics Work Group has summarized clinical use cases for general omics
data [19]. The OML project has contributed to these efforts. Additionally, this work incorporated use
cases based on the Japanese millennium project [20]. Based on these contexts and investigations, this
document elucidates the needs and requirements for OML and then proposes the specification of OML
for international standardization based on those needs and requirements.
1) Health Level Seven (HL7) is the registered trademark of Health Level Seven International. This information is
given for the convenience of users of this document and does not constitute an endorsement by ISO of the product
named.
Genomics informatics — Omics Markup Language (OML)
1 Scope
This document is applicable to the data exchange format that is designed to facilitate exchanging omics
data around the world without forcing changes of any database schema.
This document specifies the characteristics of OML from the following perspectives.
From an informatics perspective, OML defines the data exchange format based on XML. This document
gives guidelines for the specifications of the data exchange format, but this document excludes the
database schema itself.
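As a rough illustration of the informatics perspective above, the following Python sketch exports rows from a local table into a generic XML message while leaving the table definition untouched. The table name, column names, element names (exchange, record) and values are invented for this example; they are not taken from the OML DTD or XML Schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_exchange_xml(conn, query, root_tag="exchange", row_tag="record"):
    """Serialize query results to XML; the source schema is only read."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # One child element per column; NULL becomes an empty element.
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical local database with its own, unchanged schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE local_variants (gene TEXT, pos INTEGER)")
conn.execute("INSERT INTO local_variants VALUES ('GENE_A', 12345)")

xml_text = rows_to_exchange_xml(conn, "SELECT gene, pos FROM local_variants")
print(xml_text)
```

The point of the design is that sender and receiver only need to agree on the message format; each side keeps whatever internal tables it already has.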
From a molecular point of view, this document is applicable to all kinds of omics data, while it
excludes the details of the molecules (e.g., details of genomic sequence variations or the whole
genomic sequence). This document is also applicable to molecular annotations, including clinical
concerns and relations with other omics concerns.
From an application point of view, this document is applicable to the clinical field, including clinical
practice, preventive medicine, translational research, and clinical research including drug discovery.
This document does not apply to basic research and other scientific fields.
From a biological species point of view, this document is applicable to human health-associated
species such as humans, preclinical animals, and cell lines. This document does not apply to other
biological species.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
actor
something or someone who supplies a stimulus to the system
Note 1 to entry: Actors include both humans and other quasi-autonomous things, such as machines, computer
tasks and systems.
[SOURCE: ISO 25720:2009, 4.1]
3.2
allele
gene that is found in one of two or more different forms in the same position in a chromosome
3.3
bioinformatic sequence markup language
BSML
extensible language specification and container for bioinformatic data
[SOURCE: ISO 25720:2009, 4.2]
3.4
cancer genome anatomy project
CGAP
genomic expression data collected for various tumorigenic tissues in both humans and mice
Note 1 to entry: CGAP also provides information on methods and reagents used in deriving the genomic data.
[SOURCE: ISO 25720:2009, 4.4, modified]
3.5
codon
sequence of three nucleotides which together form a unit of genetic code in a DNA or RNA molecule
3.6
dbSNP
database of single nucleotide polymorphisms (3.29) provided by the US National Center for Biotechnology
Information (NCBI)
Note 1 to entry: Available at https://www.ncbi.nlm.nih.gov/SNP/.
[SOURCE: ISO/TS 20428:2017, 3.9]
3.7
digital imaging and communications in medicine
DICOM
standard in the field of medical informatics for exchanging digital information between medical
imaging equipment (such as radiological imaging) and other systems, ensuring interoperability
[SOURCE: ISO 25720:2009, 4.6]
3.8
DNA sequence variation
differences of DNA sequence among individuals in a population
Note 1 to entry: DNA sequence variation implies polymorphism (3.25).
[SOURCE: ISO 25720:2009, 4.8]
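To make the definition concrete, the sketch below compares a sample sequence with a reference sequence of the same length and lists the positions where they differ. It is a minimal illustration only: it assumes the two sequences are already aligned, handles substitutions but not insertions or deletions, and the function name and example sequences are invented.

```python
def sequence_differences(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    ref, smp = reference.upper(), sample.upper()
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(ref, smp), start=1)
        if ref_base != sample_base
    ]


differences = sequence_differences("ACGTAC", "ACCTAC")
print(differences)
```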
3.9
document type definition
DTD
document that contains formal definitions of all of the data elements in a particular type of hypertext
markup language (3.13), standard generalized markup language (3.28), or extensible markup language
(3.36) document
[SOURCE: ISO 25720:2009, 4.9]
3.10
entry point
reference point that designates the class(es) from which the messages begin for the domain
[SOURCE: ISO 25720:2009, 4.10, modified]
3.11
exon
part of a gene that will encode a part of the final mature RNA produced by that gene after introns (3.16)
have been removed by RNA splicing
3.12
genomic sequence variation markup language
GSVML
standard for data exchange of genomic sequence variation data
3.13
hypertext markup language
HTML
set of markup symbols or codes inserted in a file intended for display in a browser
[SOURCE: ISO 25720:2009, 4.12, modified]
3.14
international classification of diseases
ICD
diagnosis coding system for epidemiology, health management and clinical purposes
Note 1 to entry: ICD-10 is the 10th revision and ICD-11 is the 11th revision.
Note 2 to entry: Available at https://icd.who.int/.
3.15
clinical omics sub-information model for international classification of diseases
clinical omics sub-information model for ICD
iCOS
sub-information model aiming to enhance the representational ability of the ICD-11 content model by
covering omics information as an add-on part
Note 1 to entry: As an add-on, this sub-information model extends the ICD-11 content model to cover
omics information.
3.16
intron
nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final RNA
product
3.17
joint photographic experts group
JPEG
compression technique for images
[SOURCE: ISO 25720:2009, 4.13]
3.18
markup language
ML
set of symbols and rules for their uses when doing a markup of a document
[SOURCE: ISO 25720:2009, 4.15]
3.19
microarray gene expression markup language
MAGE-ML
data format for describing information about DNA-array based experiments and gene expression data
3.20
neuro markup language
neuro-ML
markup language (3.18) for describing models of neurons and networks of neurons
[SOURCE: ISO 25720:2009, 4.16]
3.21
nroff
Unix text-formatting program that is a predecessor of the Unix troff (3.33) document processing system
[SOURCE: ISO 25720:2009, 4.17]
3.22
omics
field of study in biology ending in -omics
Note 1 to entry: It includes, but is not limited to, genomics, proteomics, and metabolomics.
3.23
pharmacogenomics
branch of pharmaceutics aiming to develop rational means to optimize drug therapy, with respect to
the patient's genotype
3.24
polymorphism mining and annotation programs
PolyMAPr
programs for polymorphism (3.25) database mining, annotation, and functional analysis
[SOURCE: ISO 25720:2009, 4.19]
3.25
polymorphism
variation in the sequence of DNA among individuals
Note 1 to entry: Polymorphism implies single nucleotide polymorphism (3.29) and short tandem repeat
polymorphism (3.32).
[SOURCE: ISO 25720:2009, 4.20]
3.26
RNA markup language
RNAML
data format for exchanging RNA information
3.27
systems biology markup language
SBML
markup language (3.18) for simulations in systems biology
[SOURCE: ISO 25720:2009, 4.21]
3.28
standard generalized markup language
SGML
markup language (3.18) for document representation that formalizes markup and frees it of system and
processing dependencies
[SOURCE: ISO 8879:1986, 4.305, modified]
3.29
single nucleotide polymorphism
SNP
single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population
[SOURCE: ISO 25720:2009, 4.23]
3.30
systematized nomenclature of medicine-clinical terms® 2)
SNOMED-CT®
dynamic, scientifically validated clinical health care terminology and infrastructure
[SOURCE: ISO 25720:2009, 4.24]
3.31
simple object access protocol
SOAP
lightweight protocol for exchange of information in a decentralized, distributed environment
[SOURCE: ISO 25720:2009, 4.25]
3.32
short tandem repeat polymorphism
STRP
variable segments of DNA that are two to five bases long with numerous repeats
[SOURCE: ISO 25720:2009, 4.26]
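As an informal illustration of this definition, the sketch below scans a DNA string for units of two to five bases that repeat consecutively. The function name, the default threshold of three repeats, and the example sequence are assumptions made for this example; real short tandem repeat calling uses dedicated tools and stricter criteria.

```python
import re


def find_short_tandem_repeats(sequence, min_repeats=3):
    """Return (0-based start, repeat unit, repeat count) for each tandem run."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # A unit of 2 to 5 bases (shortest unit preferred) followed by further copies.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


repeats = find_short_tandem_repeats("GGCACACACATT")
print(repeats)
```

Note that this simple pattern also reports runs of a single base (for example AAAAAA as three copies of AA), which a stricter definition would exclude.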
3.33
troff
major component of a document processing system developed by AT&T for the Unix operating system
3.34
wireless markup language
WML
extensible markup language used to specify content and user interface for WAP (Wireless Application
Protocol) devices
[SOURCE: ISO 25720:2009, 4.29]
3.35
extensible HTML
XHTML
hybrid between hypertext markup language (3.13) and extensible markup language (3.36) specifically
designed for net device displays
[SOURCE: ISO 25720:2009, 4.30]
3.36
extensible markup language
XML
pared-down version of standard generalized markup language (3.28), designed especially for web
documents
[SOURCE: ISO 25720:2009, 4.31]
2) SNOMED CT is the registered trademark of International Health Terminology Standards Development
Organisation. This information is given for the convenience of users of this document and does not constitute an
endorsement by ISO of the product named.
3.37
XML schema
language for describing the structure and constraining the contents of extensible markup language
documents
[SOURCE: ISO 25720:2009, 4.32]
4 OML specification
4.1 Specification requirements and OML positioning
In the current context, annotative information about omics is increasing, and that information is
filling information gaps. The omics data itself is also increasing but is stored in various
databases. The pitfall of omics data handling is the lack of standardization of the data formats for
the organized omics. Historically, markup languages have been used, and programs have been developed
to handle omics information. However, there have been no omics-centric markup languages so far.
OML is the first omics-centric markup language and is human health centric. Considering that omics
has a great impact, especially on human health and response, it can be said that OML has the greatest
potential to be the designated markup language for human healthcare. On the other hand, applying OML
to practical human health means it shall handle direct or indirect annotations. Here, the direct
annotation shall indicate general annotative information, such as other omics information associated
with the omics data and experimental preparations. The indirect annotation shall indicate all omics
data and clinical data that result from omics data. To understand the omics-based clinical situation
of each patient, these kinds of additional information are required. Considering the requirement to
add many kinds of additional information, the development and standardization of OML cannot stand
alone and shall need harmonization with the documents of other international standardization
organizations.
OML is intended to be used in data exchange messages related to human health. In the development and
standardization of OML in this application domain, attention to patient safety, clinical efficiency,
and medical costs shall always be required. For patient safety from an informational side, the
conservation and protection of patient information shall be deemed important. For the enhancement of
clinical efficiency, simplicity and ease of understanding shall be deemed important. For medical cost
reduction, adaptability and ease of installation shall be deemed important.
OML tries to respond to these basic requirements by providing a sharable XML-based data exchange
format. OML can be used for clinical omics data exchange among various types of data formats.
In the greater framework of clinical data standardization, OML shall play the part of describing the
omics data and its necessary information.
4.2 OML Structure
A valid OML expression shall be structured in accordance with the following. The outline structure of
OML is shown in Figure 2.
OML shall consist of three data criteria:
— omics data;
— direct annotation;
— indirect annotation.
The omics data criterion shall describe, for each omics, the straightforward omics data as:
— type;
— position;
— length;
— region;
— etc.
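The structure described so far (three data criteria, with the omics data criterion carrying items such as type, position, length and region) can be sketched as a small XML tree. This is only an illustration: the element names below (oml_sketch, omics_data, direct_annotation, indirect_annotation) and the example values are hypothetical and are not the names defined by the OML DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def build_oml_like_message(omics_type, position, length, region):
    """Build a toy message with one container per data criterion."""
    root = ET.Element("oml_sketch")  # hypothetical root element name

    # Omics data criterion: the straightforward omics data items.
    omics = ET.SubElement(root, "omics_data")
    for tag, value in (
        ("type", omics_type),
        ("position", position),
        ("length", length),
        ("region", region),
    ):
        ET.SubElement(omics, tag).text = str(value)

    # Empty placeholders for the two annotation criteria.
    ET.SubElement(root, "direct_annotation")
    ET.SubElement(root, "indirect_annotation")
    return root


message = build_oml_like_message("transcriptomics", 1000, 250, "exon")
print(ET.tostring(message, encoding="unicode"))
```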
The direct annotation criterion shall describe, for each omics, the attached data of omics data as:
— experiment analys
...
SLOVENIAN STANDARD
oSIST prEN ISO 21393:2019
1 September 2019
Zdravstvena informatika - Označevalski jezik OMICS (OML) (ISO/DIS 21393:2019)
Health informatics - Omics Markup Language (OML) (ISO/DIS 21393:2019)
Medizinische Informatik - OMICS Auszeichnungssprache (OML) (ISO/DIS 21393:2019)
Informatique de santé — Langage de balisage Omics (OML) (ISO/DIS 21393:2019)
This Slovenian standard is identical to: prEN ISO 21393
ICS:
35.240.80 IT applications in health care technology
oSIST prEN ISO 21393:2019 en,fr,de
DRAFT INTERNATIONAL STANDARD
ISO/DIS 21393
ISO/TC 215 Secretariat: ANSI
Voting begins on: 2019-07-16
Voting terminates on: 2019-10-08
Health informatics — Omics Markup Language (OML)
Informatique de santé — Langage de balisage Omics (OML)
ICS: 35.240.80
This document is circulated as received from the committee secretariat.
ISO/CEN PARALLEL PROCESSING
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
Reference number ISO/DIS 21393:2019(E)
© ISO 2019
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 OML specification
4.1 Specification requirements and OML positioning (informative)
4.2 OML Structure (normative)
4.3 OML DTD (informative) and XML Schema (normative)
5 OML development process (informative)
6 Figures
7 Tables
Annex A (informative) Reference Works
A.1 Introduction
A.2 Use case analysis
A.2.1 Overview
A.2.2 Use case of SNP analysis as an example of Omics analysis
A.2.3 UML example of SNP analysis as an example of Omics analysis
A.2.4 Use case of database integration
A.2.5 Use case and required elements
A.3 Diversity of SNP databases
A.3.1 Diversity of databases
A.3.2 Diversity of data representation
A.3.3 Diversity of sequence variation data representation
A.4 Markup language comparison
A.4.1 Mapping of each markup language to the data categories
A.4.2 OML originated needs and its specifications
A.5 Interface analysis to Health Level Seven
A.5.1 Comparison with HL7 genomics model
A.5.2 Information Model of Genotype in HL7
A.6 Interface analysis to CEN EN ISO 13606
A.7 Interface analysis to SNOMED-CT
A.8 Interface analysis to WHO-ICD iCOS
Bibliography
COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO
collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on
the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the World
Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC
1, Clinical Genomics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
In this next-generation, post-genomic era, the management of health-related data is becoming increasingly
important to both omics research and omics-based medicine [1]. Informational approaches to the
management of clinical, image and omics data are beginning to have as much worth as basic benchtop
research. There are now many kinds of omics data around the world awaiting effective utilization
for human health. The hurdle that must be overcome to achieve this goal is the development of data
format and message standards to support the interchange of clinical omics data. Omics data include
omics sequences, sequence variation and other expression data, proteomics data, molecular networks, etc.
As an entry point, this standard focuses on data exchange.
At present, omics is expected to be a key to understanding the human response to external
stimuli such as alien invasions, therapies, and environmental interactions [2]. Bacterial
infection is an example of alien invasion, and responses to infection differ among
individuals. With respect to therapy, the side effects of a drug differ among patients. These
responses also differ across environments. As a result of the recent explosive growth in
omics research, huge amounts of experimental data have been accumulating in many databases in
various data formats. These data are waiting to be used in drug discovery, clinical diagnosis, and
clinical research.
A markup language is a set of symbols and rules for their use when doing a markup of a document [3].
The first standardized markup language was Standard Generalized Markup Language (SGML) [4], which
has strong similarities with the troff and nroff text layout languages supplied with Unix systems. Hypertext
Markup Language (HTML) is based on SGML [5]. Extensible Markup Language (XML) is a pared-down
version of SGML, designed especially for Web documents [6]. XML acts as the basis for Extensible HTML
(XHTML) [7] and Wireless Markup Language (WML) [8] and for standardized definitions of system
interaction such as Simple Object Access Protocol (SOAP) [9]. By contrast, text layout or semantics are
often defined in a purely machine-interpretable form, as in most word processor file formats [10].
Markup languages for the biomedical field, based on XML, have been in development for several decades
to enhance the exchange of data among researchers. Bioinformatic Sequence Markup Language (BSML)
[11], Systems Biology Markup Language (SBML) [12], Cell Markup Language (Cell ML) [13], and Neuro
Markup Language (Neuro-ML) [14] are examples of such markup languages. Polymorphism Mining and
Annotation Programs (PolyMAPr) [15] centres on SNPs and aims to achieve mining, annotation, and
functional analysis of public databases such as dbSNP [16], CGAP [17], and JSNP [18] through programming.
ISO 25720 Genomic Sequence Variation Markup Language (GSVML) is the first standardized ML for
clinical genomic sequence variation data exchange.
The purpose of Omics Markup Language (OML) is to provide a standardized data exchange format for
omics in human health.
The recent expansion in omics research has produced large quantities of data held in many databases
with different formats. Standardization of data exchange is necessary for managing, analysing and
utilizing these data. Considering that omics, especially transcriptomics, proteomics, signalomics and
metabolomics, has significant meaning in molecular-based medicine and pharmacogenomics, the data
exchange format is key to enhancing omics-based clinical research and omics-based medicine.
Recently, informational approaches have become more important to both omics research and omics-
based medicine. The management of omics data is as critical as basic research data in this new era. There
are many kinds of omics data around the world, and the time has come to effectively use this omics data
for human health. To use this data effectively and efficiently, standards must be developed to permit the
interoperable interchange of omics data globally. These standards must define the data format as well as
the messages to be used to interchange and share this data globally. This standard addresses those
requirements, using a markup language.
OML is a base framework for all kinds of clinical omics data. Each omics category will be introduced as a
specific add-on component. For instance, Whole Genome Sequence Markup Language will be the add-on
component for whole genome sequence data, and Genomic Sequence Variation Markup Language
will be the add-on component for genomic sequence variation data.
To utilize the accumulated omics data among many facilities around the world, standards for the
interchange of omics data must be defined. The required standards include defining a data format and
exchange messages. Markup Language is the reasonable choice to address this need. As for omics data
message handling, Health Level Seven Clinical Genomics Work Group [19] has summarized clinical use
cases for general omics data. The OML project has contributed to these efforts. Additionally, this work
incorporated use cases based on the Japanese millennium project [20]. Based on these contexts and
investigations, this document elucidates the needs and the requirements for OML and then proposes the
specification of OML for international standardization.
A list of references related to this document is given in the bibliography.
Health informatics — Omics Markup Language
1 Scope
OML is a data exchange format designed to facilitate exchanging omics data around the world without
forcing changes to existing databases.
From an informatics perspective, OML is an XML-based data exchange format. The data exchange format
(e.g., XML schema and DTD) is in scope. The structure of the systems and databases that send or receive
the information, including their schemas, is out of scope.
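The separation between the exchange format and local database structure can be sketched as follows. This is a hedged illustration in Python, not part of the specification: the two local record layouts, the field maps, and the exchange element names (exchange_record, identifier, type, position, length) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical local records whose column names differ between sites.
site_a_row = {"sample": "A-001", "omics_kind": "proteomics", "start": 10, "len": 25}
site_b_row = {"id": "B-17", "category": "metabolomics", "pos": 3, "size": 8}

# Per-site field maps: local column name -> shared exchange field name.
# These names are assumptions for illustration, not OML schema names.
SITE_A_MAP = {"sample": "identifier", "omics_kind": "type", "start": "position", "len": "length"}
SITE_B_MAP = {"id": "identifier", "category": "type", "pos": "position", "size": "length"}


def to_exchange_xml(row, field_map):
    """Serialize a local record into the shared (hypothetical) exchange layout."""
    root = ET.Element("exchange_record")
    for local_name, shared_name in field_map.items():
        ET.SubElement(root, shared_name).text = str(row[local_name])
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(xml_text):
    """Parse an exchange message back into a dict keyed by shared field names."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


msg_a = to_exchange_xml(site_a_row, SITE_A_MAP)
msg_b = to_exchange_xml(site_b_row, SITE_B_MAP)
print(from_exchange_xml(msg_a))
print(from_exchange_xml(msg_b))
```

Each site keeps its own storage layout and only maintains a mapping to the shared message, which is the sense in which an exchange format avoids forcing changes to existing databases.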
From a biological perspective, all kinds of omics are in scope, but the details (e.g., details of genomic
sequence variations or whole genomic sequence) are out of scope. Annotations including clinical
concerns and relations with other omics concerns are in scope.
The application focus is human health including clinical practice, preventive medicine, translational
research, and clinical research including drug discovery. The scope includes health-associated species,
including human and preclinical animals, and associated cell lines. Other species, basic research, and
other scientific fields are out of scope.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 25720:2009, Health informatics -- Genomic Sequence Variation Markup Language (GSVML)
ISO/HL7 21731:2006, Health informatics – HL7 version 3 – Reference information model – Release 1
CEN EN 13606, Health informatics -- Electronic Healthcare Record Communication
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
actor
something or someone who supplies a stimulus to the system
Note to entry: Actors include both humans and other quasi-autonomous things, such as machines, computer tasks
and systems.
[SOURCE: ISO 25720:2009(en), 4.1]
3.2
allele
a gene that is found in one of two or more different forms in the same position in a chromosome
3.3
BSML
bioinformatic sequence markup language
extensible language specification and container for bioinformatic data
[SOURCE: ISO 25720:2009(en), 4.2]
3.4
Cell ML
cell markup language
a standard for representing and exchanging computer-based biological models
[SOURCE: ISO 25720:2009(en), 4.3]
3.5
CGAP
Cancer Gene Anatomy Project
genomic expression data collected for various tumorigenic tissues in both humans and mice.
Note to entry: CGAP also provides information on methods and reagents used in deriving the genomic data
[SOURCE: ISO 25720:2009(en), 4.4]
3.6
codon
a sequence of three nucleotides which together form a unit of genetic code in a DNA or RNA molecule.
3.7
dbSNP
database of SNPs (3.34) provided by the US National Center for Biotechnology Information (NCBI)
Note to entry: available at https://www.ncbi.nlm.nih.gov/SNP/
[SOURCE: ISO/TS 20428:2017(en), 3.9]
3.8
DICOM
digital imaging and communications in medicine
a standard in the field of medical informatics for exchanging digital information between medical imaging
equipment (such as radiological imaging) and other systems, ensuring interoperability
[SOURCE: ISO 25720:2009(en), 4.6]
3.9
DNA
deoxyribonucleic acid
a molecule that encodes genetic information in the nucleus of cells
[SOURCE: ISO 25720:2009(en), 4.7]
3.10
DNA sequence variation
differences of DNA (3.9) sequence among individuals in a population
Note to entry: DNA sequence variation implies polymorphism (3.29)
[SOURCE: ISO 25720:2009(en), 4.8]
3.11
DTD
document type definition
a document that contains formal definitions of all of the data elements in a particular type of HTML (3.16),
SGML (3.33), or XML (4.38) document
[SOURCE: ISO 25720:2009(en), 4.9]
3.12
entry point
reference point that designates the class(es) from which the messages begin for the domain
[SOURCE: ISO 25720:2009(en), 4.10]
3.13
exon
any part of a gene that will encode a part of the final mature RNA produced by that gene after introns
have been removed by RNA splicing.
3.14
gene-based medicine
medicine based on genes or genetic science
[SOURCE: ISO 25720:2009(en), 4.11]
3.15
GSVML
genomic sequence variation markup language
a standard for data exchange of genomic sequence variation data
[SOURCE: ISO 25720:2009(en)]
3.16
HTML
Hypertext Markup Language
a set of markup symbols or codes inserted in a file intended for display in a browser
[SOURCE: ISO 25720:2009(en), 4.12]
3.17
ICD-11
international classification of diseases 11th revision
a standard diagnostic tool for epidemiology, health management and clinical purposes
Note to entry: available at https://icd.who.int/
3.18
iCOS
clinical omics sub-information model for ICD
Note to entry: Add-on sub-information model that extends the representation ability of the ICD-11 content
model to cover omics information.
3.19
intron
any nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final
RNA product
3.20
JPEG
joint photographic experts group
compression technique for images
[SOURCE: ISO 25720:2009(en), 4.13]
3.21
JSNP
Japanese single nucleotide polymorphisms
database of Japanese Single Nucleotide Polymorphisms
[SOURCE: ISO 25720:2009(en), 4.14]
3.22
markup language
ML
a set of symbols and rules for their uses when doing a markup of a document
[SOURCE: ISO 25720:2009(en), 4.15]
3.23
microarray gene expression markup language
MAGE-ML
a data format for describing information about DNA-array based experiments and gene expression data
3.24
neuro-ML
Neuro Markup Language
markup language (3.22) for describing models of neurons and networks of neurons
[SOURCE: ISO 25720:2009(en), 4.16]
3.25
nroff
text-formatting program on Unix and Unix-like systems
[SOURCE: https://en.wikipedia.org/wiki/Nroff]
3.26
omics
a field of study in biology ending in -omics
Note to entry: includes, but is not limited to, genomics, proteomics, and metabolomics.
3.27
pharmacogenomics
a branch of pharmaceutics aiming to develop rational means to optimize drug therapy, with respect to
the patient's genotype
3.28
PolyMAPr
polymorphism mining and annotation programs
programs for polymorphism database mining, annotation, and functional analysis
[SOURCE: ISO 25720:2009(en), 4.19]
3.29
polymorphism
variation in the sequence of DNA (3.9) among individuals
Note to entry: polymorphism implies SNP (3.34) and STRP (4.32)
[SOURCE: ISO 25720:2009(en), 4.20]
3.30
RNA
ribonucleic acid
polymer of ribonucleotides occurring in a double-stranded or single-stranded form
[SOURCE: ISO 22174:2005, 3.1.3]
3.31
RNAML
a data format for exchanging RNA information
3.32
SBML
systems biology markup language
markup language (3.22) for simulations in systems biology
[SOURCE: ISO 25720:2009(en), 4.21]
3.33
SGML
standard generalized markup language
markup language (3.22) for document representation that formalizes markup and frees it of system and
processing dependencies
[SOURCE: ISO 8879:1986, 4.305]
3.34
SNP
single nucleotide polymorphism
single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population
[SOURCE: ISO 25720:2009(en), 4.23]
3.35
SNOMED-CT
systematized nomenclature of medicine - Clinical Terms
dynamic, scientifically validated clinical health care terminology and infrastructure
[SOURCE: ISO 25720:2009(en), 4.24]
3.36
SOAP
simple object access protocol
lightweight protocol for exchange of information in a decen
...