ISO/TS 24420:2023
Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences
This document illustrates the workflow of shotgun metagenomic sequence data processing of host-derived microbiome and environmental metagenomes. This document specifies the requirements for quality control of shotgun metagenomic sequence data processing for massively parallel DNA sequencing. This document provides guidelines for data directory, data archive and metadata for shotgun metagenomic sequence data. This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence data. This document applies to shotgun metagenomic sequence data processing and analyses, but excludes functional analysis.
Biotechnologie — Séquençage d'ADN massivement parallèle — Exigences générales pour le traitement des données des séquences métagénomiques "Shotgun"
TECHNICAL SPECIFICATION
ISO/TS 24420
First edition
2023-05
Biotechnology — Massively parallel
DNA sequencing — General
requirements for data processing of
shotgun metagenomic sequences
Biotechnologie — Séquençage d'ADN massivement parallèle —
Exigences générales pour le traitement des données des séquences
métagénomiques "Shotgun"
Reference number
ISO/TS 24420:2023(E)
© ISO 2023
COPYRIGHT PROTECTED DOCUMENT
© ISO 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Processing workflow
5 Data processing
5.1 Facilities and software requirements
5.2 Sequence quality control and error determination
5.3 Sequence assembly
6 Data analysis
6.1 Annotation
6.2 Calculation of species relative abundance
6.2.1 Species analysis
6.2.2 Gene analysis
7 Data archive and metadata
7.1 Original data
7.2 Sequencing analytical data
7.3 Data directory and archive
7.3.1 General
7.3.2 Directory of data elements
7.3.3 Data archiving
7.4 Metadata
Annex A (informative) Examples of data format
Annex B (informative) Directory of data elements
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use
of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed
patent rights in respect thereof. As of the date of publication of this document, ISO had not received
notice of (a) patent(s) which may be required to implement this document. However, implementers are
cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all
such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 276, Biotechnology.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Shotgun metagenomic sequencing of organisms’ genomes from a complex sample is widely used in
life science and clinical applications (e.g. human complex disease associated analysis, environmental
microecology and other fields) in order to gain knowledge of their composition and function. It has
potential to provide significant scientific data for life science research.
The utility of this technique is its ability to reveal the microbial diversity and abundance found in
microbial populations from multiple environments and to determine sequence information (taxonomic
characterization, functional annotation, and comparative analysis/metagenomics) for individual
organisms in these populations. The resulting data can be subjected to comparative analytics.
Massively parallel shotgun metagenomic sequencing generates a large amount of data containing a
high complexity of microbial genomes and a large number of unknown species. It is important to use
effective processing procedures and address quality control for shotgun metagenomic sequencing data.
A standardised data format is essential to promote data sharing.
As with any advanced technology, massively parallel sequencing technologies are error prone.
Overcoming these shortcomings to ensure a reliable sequencing and analytical outcome is important.
This document provides a uniform standard for the collation, storage and subsequent analysis of
metagenomic data, and guidelines. It provides requirements and recommendations for the workflow
and process of shotgun metagenomic analyses including quality control of sequencing data and
metadata, and the compositional and functional analysis of microbial community. These requirements
and recommendations can ensure accuracy of data generated from metagenomic analysis, address
potential errors and facilitate downstream applications.
TECHNICAL SPECIFICATION ISO/TS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing —
General requirements for data processing of shotgun
metagenomic sequences
1 Scope
This document illustrates the workflow of shotgun metagenomic sequence data processing of host-derived
microbiome and environmental metagenomes.
This document specifies the requirements for quality control of shotgun metagenomic sequence data
processing for massively parallel DNA sequencing.
This document provides guidelines for data directory, data archive and metadata for shotgun
metagenomic sequence data.
This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence
data.
This document applies to shotgun metagenomic sequence data processing and analyses, but excludes
functional analysis.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library
preparation
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification
exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of
specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data
sequencing data obtained after a pre-processing procedure which usually includes multiple trimming
and filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences
removed, linkers/adaptors removed)
3.5
code
system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic
signals into another form or representation to facilitate analysis, communication or storage in a storage
medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
encoding
process of assigning code to things or concepts
3.7
contig
contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome
or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory
list of data items, which gives itemized information enabling traceability, identification and findability
of related data
Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.
3.11
directory identifier
unique language-independent sign assigned to the archive directory in the structure
3.12
gene
sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product
Note 1 to entry: Genes are recognized as the basic unit of heredity.
Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a
nuclear processing step.
Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.
[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier
sequence of characters, capable of uniquely identifying that with which it is associated, within a
specified context
[SOURCE: ISO/IEC 11179-1:2015, 3.1.3]
3.14
analytical data
set of elements to describe qualitative or quantitative analytical attributes of processed metagenomic
raw data
3.15
name
semantic, natural language labels given to data elements, and variations of these labels serve different
functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.16
public attribute
attribute that can have the same attribute value for different data in the directory
3.17
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q is defined by the following equation:
$Q = -10 \log_{10}(p)$
where p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of
99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of
99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,
resulting in inaccurate conclusions.
3.18
raw data
primary sequencing data produced by a sequencer without involving any software-based pre-filtering
for analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
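As an illustration of the quality score defined in 3.17, the relation between Q and the error probability p can be sketched in Python. This is a minimal sketch assuming a base-10 logarithm, which is consistent with Notes 2 and 3 to entry 3.17; the function names are hypothetical and not part of this document.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return the estimated probability p that a base call with score Q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return the quality score Q = -10 * log10(p) for an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

Under this sketch, a score of 20 maps to an error probability of 0,01 and a score of 30 to 0,001, matching the notes to entry 3.17.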
3.19
relative abundance
fraction of a single microorganism operational taxonomic unit in the total microbial community of a
defined environment
Note 1 to entry: It is usually represented as a percentage.
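The definition of relative abundance can be illustrated with a short Python sketch. It assumes that a count (for example, the number of reads assigned to each taxonomic unit) is already available; the function name and the input structure are hypothetical, and any normalization of counts, such as by genome length, is outside this sketch.

```python
def relative_abundance(counts: dict[str, float]) -> dict[str, float]:
    """Express each unit's count as a percentage of the total community count."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {unit: 100.0 * n / total for unit, n in counts.items()}
```

For example, counts of 30 and 10 for two units give relative abundances of 75 % and 25 %.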
3.20
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.21
scaffold
reconstructed genomic sequence created by chaining contigs together using additional information
about the relative position and orientation of the contigs in the genome
3.22
sequence assembly
processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA
sequences, entire genes or genomes
Note 1 to entry: When sequencing a novel genome where there is no reference sequence available for alignment,
sequence reads are assembled as contigs, that is the de novo assembly.
3.23
shotgun metagenomic sequencing
shotgun metagenomics
nucleotide sequence determination of the genomes of untargeted cells in communities in order to
determine community composition and function
Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific
environments.
Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly
without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,
phylogeny, gene function, or metabolic network or combinations thereof.
3.24
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow
The basic workflow of metagenomics should include sequencing, data processing and data analysis.
Data processing includes pre-processing, quality control, data assembly, data profiling and annotation,
as shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements
5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. Before the pipeline
is used for analytical purposes, its applications should be locked down, including the complete set of
tools, code (e.g. shell scripts such as BASH, GNU R and Python), operational environment and
network connections that compose the pipeline. Changes to any component of the pipeline require
revalidation to ensure that there is no impact on the performance characteristics of the pipeline.
5.1.2 High-performance computing technologies may be used at any step in the process to ensure
proper management and curation of large collections of complex procaryotic and eucaryotic genomes
as processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination
5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC)
process to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3, and
ISO 20397-2:2021, 4.3.
5.2.2 The available data quality values for each DNA sample after sequencing should meet the
following requirements:
a) Q20 ≥ 90 %: above 90 % of the base quality values of the sample shall be more than 20;
b) Q30 ≥ 80 %: above 80 % of the base quality values of the sample shall be more than 30.
The above requirements only apply to short sequence reads ≤ 350 bp.
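The Q20 and Q30 criteria above can be checked with a short Python sketch. It assumes quality strings in the common FASTQ Phred+33 encoding and counts bases with a score of at least 20 (or 30), which is the usual reading of "Q20" and "Q30"; if a strict "more than" reading is intended, the comparisons would need to be adjusted. The function names and default thresholds as fractions are hypothetical.

```python
def quality_fractions(quality_strings, offset: int = 33) -> tuple[float, float]:
    """Return the fractions of bases with Q >= 20 and Q >= 30 (Phred+33 assumed)."""
    total = q20 = q30 = 0
    for quality in quality_strings:
        for char in quality:
            score = ord(char) - offset  # decode one ASCII character to a Phred score
            total += 1
            if score >= 20:
                q20 += 1
            if score >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return q20 / total, q30 / total


def meets_thresholds(quality_strings, min_q20: float = 0.90, min_q30: float = 0.80) -> bool:
    """Check the Q20 >= 90 % and Q30 >= 80 % criteria for one sample."""
    frac_q20, frac_q30 = quality_fractions(quality_strings)
    return frac_q20 >= min_q20 and frac_q30 >= min_q30
```

In this encoding the character `I` stands for a score of 40, `5` for 20 and `+` for 10, so a read with quality string `II5+` has 75 % of bases at Q20 and 50 % at Q30.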
5.2.3 For human-, animal- or plant-sourced samples, host reads shall be removed by mapping
to a human, animal or plant genome reference, such as UniRef or the Unified Human Gastrointestinal
Protein catalog. Only clean data should be used in further bioinformatic analysis.
5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first
step in data processing. The following factors and situations shall be considered in the process of
elimination.
a) Mismatches, insertions or deletions (indels) (only when a reference genome is available) and uncertain
bases (N characters).
b) Unrecognizable sequence, which can be caused if the reads extend to the 3' end of the adaptor when
the target sequence is shorter.
c) PCR biases in the library preparation in accordance with ISO 20397-1:2022, 5.8.
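As an illustration of item a), reads with too many uncertain bases can be screened out with a simple filter. This Python sketch assumes reads are available as (identifier, sequence) pairs; the function names and the cut-off `max_n_fraction` are hypothetical, and no particular cut-off is given in this document.

```python
def ambiguous_base_fraction(sequence: str) -> float:
    """Return the fraction of N characters in a read."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("N") / len(sequence)


def filter_reads_by_n_content(reads, max_n_fraction: float = 0.1):
    """Keep (read_id, sequence) pairs whose N fraction does not exceed the cut-off."""
    return [
        (read_id, seq)
        for read_id, seq in reads
        if ambiguous_base_fraction(seq) <= max_n_fraction
    ]
```

Mismatches, indels, adaptor read-through and PCR biases need alignment or library information and are not covered by this sketch.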
5.3 Sequence assembly
5.3.1 The depth of sequencing shall be evaluated before the sequence assembly, which should take
the complexity of the sample into account.
5.3.2 Samples lacking a reference
...
ISO/DTS.2 24420:2023(E)
Date: 2023-01-09
ISO/TC 276
Secretariat: DIN
Biotechnology — Massively parallel DNA sequencing — General requirements for data
processing of shotgun metagenomic sequences
Biotechnologie — Séquençage d'ADN massivement parallèle — Exigences générales pour le
traitement des données des séquences métagénomiques "Shotgun"
DTS.2 stage
Warning for WDs and CDs
This document is not an ISO International Standard. It is distributed for review and comment.
It is subject to change without notice and may not be referred to as an International Standard.
Recipients of this draft are invited to submit, with their comments, notification of any relevant
patent rights of which they are aware and to provide supporting documentation.
To help you, this guide on writing standards was produced by the ISO/TMB and is available at
http://www.iso.org/iso/how-to-write-standards.pdf
A model manuscript of a draft International Standard (known as “The Rice Model”) is available at
http://www.iso.org/iso/model_document-rice_model.pdf
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Processing workflow
5 Data processing
5.1 Facilities and software requirements
5.2 Sequence quality control and error determination
5.3 Sequence assembly
6 Data analysis
6.1 Annotation
6.2 Calculation of species relative abundance
7 Data archive and metadata
7.1 Original data
7.2 Sequencing analytical data
7.3 Data directory and archive
Annex A
Annex B
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO
collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives 2 (see
www.iso.org/directives).Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on
the ISO list of patent declarations received (see www.iso.org/patentswww.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the World
Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.htmlwww.iso.org/iso/foreword.html.This document was prepared by Technical Committee ISO/TC 276, Biotechnology.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found atwww.iso.org/members.htmlwww.iso.org/members.html.
Introduction
Shotgun metagenomic sequencing, which sequences the genomes of the organisms in a complex community
sample to gain knowledge of the community's composition and function, is widely used in life science and
clinical applications, such as association analysis of complex human diseases, environmental microecology
and other fields. It has the potential to provide significant scientific data for life science research.
The utility of this technique is its ability to reveal the microbial diversity and abundance found in
microbial populations from multiple environments and to determine sequence information
(taxonomic characterization, functional annotation, and comparative analysis/metagenomics)
for individual organisms in these populations. The resulting data can be subjected to
comparative analytics. Massively parallel shotgun metagenomic sequencing generates a large amount of
data containing a high complexity of microbial genomes and a large number of unknown species. It is
important to use effective processing procedures and address quality control for shotgun metagenomic
sequencing data. A standardised data format is essential to promote data sharing.
As with any advanced technology, massively parallel sequencing technologies are error prone. Overcoming
these shortcomings to ensure a reliable sequencing and analytical outcome is important. This document
provides a uniform standard and guidelines for the collation, storage and subsequent analysis of
metagenomic data. It provides requirements and recommendations for the workflow and process of shotgun
metagenomic analyses including quality control of sequencing data and metadata, and the compositional
and functional analysis of the microbial community. These requirements and recommendations can ensure
accuracy of data generated from metagenomic analysis, address potential errors and facilitate
downstream applications.
TECHNICAL SPECIFICATION ISO/DTS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing — General
requirements for data processing of shotgun metagenomic
sequences
1 Scope
This document illustrates the workflow of shotgun metagenomic sequence data processing of host-derived
microbiome and environmental metagenomes.
This document specifies the requirements for quality control of shotgun metagenomic sequence data
processing for massively parallel DNA sequencing.
This document provides guidelines for data directory, data archive and metadata for shotgun
metagenomic sequence data.
This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence
data.
This document applies to shotgun metagenomic sequence data processing and analyses, but excludes
functional analysis.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library
preparation
ISO 20397-2:2021, Biotechnology — Massively parallel sequencing — Part 2: Methods to evaluate
the quality of sequencing data
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification
exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of
specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data
sequencing data obtained after a pre-processing procedure which usually includes multiple trimming and
filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences
removed, linkers/adaptors removed)
3.5
code
system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic
signals into another form or representation to facilitate analysis, communication or storage in a storage
medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
coding
encoding
process of assigning code to things or concepts
3.7
contig
contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome
or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory
list of data items, which gives itemized information enabling traceability, identification and findability of
related data
Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.
3.11
directory identifier
unique language-independent sign assigned to the archive directory in the structure
3.12
gene
sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product
Note 1 to entry: Genes are recognized as the basic unit of heredity.
Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a nuclear
processing step.
Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.
[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier
sequence of characters, capable of uniquely identifying that with which it is associated, within a specified
context
[SOURCE: ISO/IEC 11179-1:2015, 3.33]
3.14
layer code
hierarchical code consisting of membership order of coded objects
3.15
name
semantic, natural language labels given to data elements, and variations of these labels serve different
functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.16
public attribute
attribute that can have the same attribute value for different data in the directory
3.17
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q is defined by the following equation:
$Q = -10\log_{10}(p)$
where
p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of
99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of
99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,
resulting in inaccurate conclusions.
[SOURCE: ISO 20397-2:2021, 3.32, modified ― Note 3 was added.]
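The relationship between Q and p in the entry above can be checked numerically. The following Python sketch is illustrative only; the function names are hypothetical and are not taken from this document or from ISO 20397-2.

```python
import math


def phred_to_error_probability(q):
    """Return p, the estimated probability that a base call is wrong, for quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Return Q = -10 * log10(p) for an error probability p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

Under these definitions a quality score of 20 corresponds to p = 0,01 and a quality score of 30 to p = 0,001, in line with Notes 2 and 3 to entry.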
3.18
raw data
primary sequencing data produced by a sequencer without involving any software-based pre-filtering for
analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
3.19
relative abundance
fraction of a single microorganism operational taxonomic unit in the total microbial community of a
defined environment
Note 1 to entry: It is usually represented as a percentage.
3.20
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.21
scaffold
reconstructed genomic sequence created by chaining contigs together using additional information about
the relative position and orientation of the contigs in the genome
3.22
sequence assembly
processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA
sequences, entire genes or genomes
Note 1 to entry: When sequencing a novel genome where there is no reference sequence available for
alignment, sequence reads are assembled as contigs; this is de novo assembly.
3.23
shotgun metagenomic sequencing
shotgun metagenomics
nucleotide sequence determination of the genomes of untargeted cells in communities in order to
determine community composition and function
Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific
environments.
Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly
without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,
phylogeny, gene function, or metabolic network or combinations thereof.
3.24
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow
The basic workflow of metagenomics should include sequencing, data processing and data analysis. Data
processing includes pre-processing, quality control, data assembly, data profiling and annotation, as
shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements
5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. Applications for the
pipeline, such as shell (e.g., BASH), GNU R, and Python, should be locked down, including the complete
set of tools, code, operational environment, and network connections that compose the pipeline, before
it is used for analytical purposes. Changes to any components of the pipeline require revalidation to
ensure that there is no impact on the performance characteristics of the pipeline.
5.1.2 High-performance computing technologies may be used at any step in the process to ensure
proper management and curation of large collections of complex procaryotic and eucaryotic genomes, as
processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination
5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC) process
to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3, and
ISO 20397-2:2021, 4.3.
5.2.2 The available data quality values for each DNA sample after sequencing should meet the following
requirements:
a) Q20 ≥ 90 %: at least 90 % of the bases in the sample shall have a quality score of at least 20;
b) Q30 ≥ 80 %: at least 80 % of the bases in the sample shall have a quality score of at least 30.
The above requirements only apply to short sequence reads ≤ 350 bp.
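The Q20 and Q30 criteria above can be evaluated directly from the per-base quality strings of a FASTQ file. The following Python sketch is a minimal illustration under stated assumptions: quality characters are assumed to use the Phred+33 encoding, a base counts towards Q20 or Q30 when its score is 20 or 30 or higher, and the function names are hypothetical.

```python
def base_quality_fractions(quality_strings, offset=33):
    """Return (Q20 fraction, Q30 fraction) over all bases in the given quality strings.

    Each string holds one ASCII character per base; `offset` is the assumed
    encoding offset (33 for Phred+33).
    """
    total = q20 = q30 = 0
    for quality in quality_strings:
        for character in quality:
            score = ord(character) - offset
            total += 1
            if score >= 20:
                q20 += 1
            if score >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return q20 / total, q30 / total


def meets_quality_requirements(quality_strings, min_q20=0.90, min_q30=0.80, offset=33):
    """Return True when Q20 >= min_q20 and Q30 >= min_q30 for the supplied bases."""
    q20, q30 = base_quality_fractions(quality_strings, offset)
    return q20 >= min_q20 and q30 >= min_q30
```

The default thresholds mirror items a) and b); the check is only meaningful for short reads, as stated above.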
5.2.3 For samples sourced from humans, animals, plants or any combination of these, the host reads
shall be removed by mapping to a human, animal or plant genome reference, such as UniRef or the
Unified Human Gastrointestinal Protein catalog. Only clean sequencing data should be used in further
bioinformatic analysis.
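Once a mapping step has identified which reads align to the host reference, removing them is a simple filter. The following Python sketch assumes that the mapping has already been performed by a separate alignment tool and that its result is available as a collection of read identifiers; the function name and the record layout are hypothetical, and no aligner is invoked.

```python
def remove_host_reads(reads, host_read_ids):
    """Return the records whose identifier is not among the host-mapped identifiers.

    `reads` is an iterable of (read_id, sequence, quality) tuples and
    `host_read_ids` is an iterable of identifiers reported as host-derived.
    """
    host = set(host_read_ids)
    return [record for record in reads if record[0] not in host]
```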
5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first step in
data processing. The following factors and situations shall be considered in the process
of elimination:
a) Mismatches, insertions or deletions (indels) (only when a reference genome is available) and uncertain
bases (N characters).
b) Unrecognizable sequence, which can be caused if the reads extend to the 3' end of the adaptor
when the target sequence is shorter.
c) PCR biases in the library preparation in accordance with ISO 20397-1:2022, 5.8.
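Two of the checks listed above, uncertain bases and repeated reads, can be sketched in a few lines of Python. This is an illustration under stated assumptions: "repeats" is interpreted here as exact duplicate read sequences, the 5 % limit on N characters is an arbitrary example value that does not come from this document, and the function name is hypothetical. Adaptor read-through and indel detection need dedicated tools and are not covered.

```python
def filter_reads(reads, max_n_fraction=0.05):
    """Drop reads with too many uncertain bases (N) and exact duplicate sequences.

    `reads` is an iterable of (read_id, sequence) pairs. The first occurrence
    of each distinct sequence is kept; later identical sequences are dropped.
    """
    seen = set()
    kept = []
    for read_id, sequence in reads:
        normalised = sequence.upper()
        if not normalised:
            continue  # skip empty records
        if normalised.count("N") / len(normalised) > max_n_fraction:
            continue  # too many uncertain bases
        if normalised in seen:
            continue  # exact duplicate of a read already kept
        seen.add(normalised)
        kept.append((read_id, sequence))
    return kept
```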
5.3 Sequence assembly
5.3.1 The depth of sequencing shall be evaluated before the sequence assembly, which should take the
complexity of the sample into account.
5.3.2 Samples lacking a reference genome dataset, such as soil or ocean samples, should use sequence
assembly.
5.3.3 Contigs or scaffolds or both that are directly obtained from sequence fragments without any
reference should be regarded as de novo assemblies.
5.3.4 The selection of the sequence assembly software should depend on the relative importance of the
accuracy, contigs’ size, input data type, and available computational resources.
5.3.5 A non-redundant gene catalogue can be obtained by predicting genes from assembled contigs.
For well characterized microbiomes, e.g., human gut-borne, a credible gene catalogue (e.g., Integrated
Gene Catalog (IGC)) can be used for quick identification and quantification of data from metagenomic
sequencing.
5.3.6 Created assemblies should be evaluated to assess their quality, e.g., using QUAST.
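One contiguity metric commonly reported when assemblies are evaluated is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following Python sketch computes it from a list of contig lengths. It is an illustration only, does not call or reproduce QUAST, and the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
```

N50 describes contiguity only; it says nothing about misassemblies or completeness, so it is normally read together with other quality measures.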
6 Data analysis
6.1 Annotation
6.1.1 The annotation methods should be described. The number of reference genomes selected and the
reference genomes or reference database used for the annotation should be documented.
6.1.2 Taxonomy profile methods should be chosen according to data and application needs to obtain a
higher-level taxonomy profile (e.g., species, genus, order, phylum) including metagenomic linkage groups
(MLG), metagenomic clusters (MGC) or metagenomic species (MGS).
6.1.3 Taxonomy profiling should be based on reference databases, such as RefSeq complete genomes
(RefSeq CG) for microbial species and the BLAST databases for high-quality nucleotide and protein
sequences. Classification accuracy, speed, and computational requirements should be taken into account
when selecting taxonomic classification tools.
6.1.4 If the profile is obtained by a de novo sequence assembly method, the species should be
identified when the alignment to the most closely related reference in the reference database shows a
sequence similarity of more than 97 % and a coverage of more than 90 %.
6.1.5 For read-based approaches, the reads should, after read merging, be mapped to NR for taxonomy
data (e.g., using BLAST, Diamond or Last) or to marker genes.
6.1.6 Metagenomic profiles should be annotated to various levels according to the reference
annotation, i.e., species, genus, or higher.
6.2 Calculation of species relative abundance
6.2.1 Species analysis
6.2.1.1 The relative abundance calculation method should be defined and implemented to meet the
repeatability requirement. The relative abundance calculation method shall be documented.
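A relative abundance calculation in its simplest form divides the count assigned to each operational taxonomic unit by the total count for the community. The following Python sketch shows that arithmetic only. It assumes the counts have already been produced by a profiling step, applies no genome-length or marker-length normalisation, and uses a hypothetical function name. Because the calculation is deterministic, repeated runs on the same counts give the same result.

```python
def relative_abundance(counts):
    """Return {taxon: fraction of the total} for a mapping of taxon to count."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {taxon: count / total for taxon, count in counts.items()}
```

Multiplying each fraction by 100 gives the percentage representation.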
6.2.1.2 Calculation tools should be selected with consideration of how well they reflect the actual
relative abundance of the target operational taxonomic unit in the sample.
6.2.2 Gene analysis
6.2.2.1 A gene abundance table can be generated using alignment-based tools or alignment-free
methods.
6.2.2.2 The relative abundance distribution at the gene level can be obtained by comparing the clean
data to the assembled gene set or the appropriate reference database.
6.2.2.3 Superposition of the relative abundance of the gene sequence of the same species shall be done
to get the operational taxonomic unit.
7 Data archive and metadata
7.1 Original data
7.1.1 Sequencing data volume should be evaluated to obtain saturated gene information in
metagenome-wide association studies (MWAS).
7.1.2 Regardless of the sample source, the sequencing data format should be the same for each
sequence. The sequence data should be stored in a standard format that can preserve the
information of biological sequences (usually nucleic acid sequences) or their sequencing quality; ISO
1 BLAST® is the trademark of a product supplied by the National Center for Biotechnology Informa
...
FINAL
DRAFT
TECHNICAL SPECIFICATION ISO/DTS 24420
ISO/TC 276
Secretariat: DIN
Voting begins on: 2023-02-20
Voting terminates on: 2023-04-17
Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of
shotgun metagenomic sequences
Biotechnologie — Séquençage d'ADN massivement parallèle —
Exigences générales pour le traitement des données des séquences
métagénomiques "Shotgun"
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION
OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING
DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL,
COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE
CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE
MADE IN NATIONAL REGULATIONS.
Reference number ISO/DTS 24420:2023(E)
© ISO 2023
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Processing workflow
5 Data processing
5.1 Facilities and software requirements
5.2 Sequence quality control and error determination
5.3 Sequence assembly
6 Data analysis
6.1 Annotation
6.2 Calculation of species relative abundance
6.2.1 Species analysis
6.2.2 Gene analysis
7 Data archive and metadata
7.1 Original data
7.2 Sequencing analytical data
7.3 Data directory and archive
7.3.1 General
7.3.2 Directory of data elements
7.3.3 Data archiving
7.4 Metadata
Annex A (informative) Examples of data format
Annex B (informative) Directory of data elements
Bibliography
TECHNICAL SPECIFICATION ISO/DTS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing —
General requirements for data processing of shotgun
metagenomic sequences
1 Scope
This document illustrates the workflow of shotgun metagenomic sequence data processing of
host-derived microbiome and environmental metagenomes.
This document specifies the requirements for quality control of shotgun metagenomic sequence data
processing for massively parallel DNA sequencing.
This document provides guidelines for data directory, data archive and metadata for shotgun
metagenomic sequence data.
This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence
data.
This document applies to shotgun metagenomic sequence data processing and analyses, but excludes
functional analysis.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library
preparation
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification
exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of
specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data
sequencing data obtained after a pre-processing procedure which usually includes multiple trimming
and filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences
removed, linkers/adaptors removed)
3.5
code
system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic
signals into another form or representation to facilitate analysis, communication or storage in a storage
medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
encoding
process of assigning code to things or concepts
3.7
contig
contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome
or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory
list of data items, which gives itemized information enabling traceability, identification and findability
of related data
Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.
3.11
directory identifier
unique language-independent sign assigned to the archive directory in the structure
3.12
gene
sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product
Note 1 to entry: Genes are recognized as the basic unit of heredity.
Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a
nuclear processing step.
Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.
[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier
sequence of characters, capable of uniquely identifying that with which it is associated, within a
specified context
[SOURCE: ISO/IEC 11179-1:2015, 3.33]
3.14
analytical data
set of elements to describe qualitative or quantitative analytical attributes of processed metagenomic
raw data
3.15
name
semantic, natural language labels given to data elements, and variations of these labels serve different
functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.16
public attribute
attribute that can have the same attribute value for different data in the directory
3.17
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q is defined by the following equation:
$Q = -10 \log_{10}(p)$
where p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of
99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of
99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,
resulting in inaccurate conclusions.
[SOURCE: ISO 20397-2:2021, 3.32, modified ― Note 3 was added.]
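The relationship between a quality score and the error probability in Note 1 can be checked numerically. The following is a minimal Python sketch, not part of this document's requirements; the function names are illustrative assumptions.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return p, the estimated probability that a base call is wrong, for quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Reproduce the figures given in Notes 2 and 3 (Q20 and Q30).
for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}, call accuracy {100 * (1 - p):.1f} %")
```

For Q20 this gives an error probability of 0.0100 (99.0 % accuracy) and for Q30 an error probability of 0.0010 (99.9 % accuracy), matching the notes above.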
3.18
raw data
primary sequencing data produced by a sequencer without involving any software-based pre-filtering
for analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
3.19
relative abundance
fraction of a single microorganism operational taxonomic unit in the total microbial community of a
defined environment
Note 1 to entry: It is usually represented as a percentage.
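As an illustration of this definition, relative abundance can be computed as each taxon's share of the community total, expressed as a percentage. The sketch below assumes that abundance is measured by per-taxon read counts, which this document does not prescribe; the taxon names and counts are hypothetical.

```python
def relative_abundance(counts: dict[str, float]) -> dict[str, float]:
    """Return each taxon's share of the community total as a percentage."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {taxon: 100 * n / total for taxon, n in counts.items()}


# Hypothetical read counts per operational taxonomic unit (assumption for illustration).
example_counts = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
abundance = relative_abundance(example_counts)
for taxon, pct in abundance.items():
    print(f"{taxon}: {pct:.1f} %")
```

The percentages sum to 100 across the community, so values are comparable between samples of different sequencing depth.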
3.20
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.21
scaffold
reconstructed genomic sequence created by chaining contigs together using additional information
about the relative position and orientation of the contigs in the genome
3.22
sequence assembly
processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA
sequences, entire genes or genomes
Note 1 to entry: When sequencing a novel genome where there is no reference sequence available for alignment,
sequence reads are assembled as contigs; this is de novo assembly.
3.23
shotgun metagenomic sequencing
shotgun metagenomics
nucleotide sequence determination of the genomes of untargeted cells in communities in order to
determine community composition and function
Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific
environments.
Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly
without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,
phylogeny, gene function, or metabolic network or combinations thereof.
3.24
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow
The basic workflow of metagenomics should include sequencing, data processing and data analysis.
Data processing includes pre-processing, quality control, data assembly, data profiling and annotation,
as shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements
5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. Before the pipeline is
used for analytical purposes, its applications should be locked down, including the complete set of
tools, code (such as shell, e.g. BASH, GNU R and Python), operational environment and network
connections that compose the pipeline. Changes to any component of the pipeline require revalidation to
ensure that there is no impact on the performance characteristics of the pipeline.
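One way to detect a change to a locked-down pipeline is to record a fingerprint of its components at validation time and compare it before each run. The following Python sketch is an assumption for illustration only: this document does not prescribe a mechanism, and the manifest layout, tool names and versions are hypothetical.

```python
import hashlib
import json


def pipeline_fingerprint(manifest: dict) -> str:
    """Hash a manifest describing the locked-down pipeline components."""
    # Canonical JSON (sorted keys, fixed separators) so equal manifests hash equally.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_revalidation(validated_fingerprint: str, current_manifest: dict) -> bool:
    """Return True if the current manifest differs from the validated one."""
    return pipeline_fingerprint(current_manifest) != validated_fingerprint


# Hypothetical manifest covering tools, code, environment and network connections.
validated = {
    "tools": {"example_trimmer": "1.0.0", "example_assembler": "2.3.1"},
    "code": {"run_pipeline.sh": "checksum-of-script"},
    "environment": {"python": "3.11", "os_image": "example-image:1"},
    "network": ["reference-db.example.org"],
}
locked = pipeline_fingerprint(validated)

# Upgrading one tool changes the fingerprint and so flags the pipeline for revalidation.
changed = {**validated, "tools": {**validated["tools"], "example_assembler": "2.4.0"}}
print(needs_revalidation(locked, validated))
print(needs_revalidation(locked, changed))
```

The first call reports no change and the second reports that revalidation is needed. A fingerprint only signals that something changed; the revalidation itself still has to be carried out.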
5.1.2 High-performance computing technologies may be used at any step in the process to ensure
proper management and curation of large collections of complex procaryotic and eucaryotic genomes
as processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination
5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC)
process to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3, and
ISO 20397-2:2021, 4.3.
5.2.2 The available data quality values for each DNA sample after sequencing should meet the
following requirements:
a) Q20 ≥ 90 %: at least 90 % of the bases in the sample shall have a quality score of 20 or more;
b) Q30 ≥ 80 %: at least 80 % of the bases in the sample shall have a quality score of 30 or more.
The above requirements apply only to short sequence reads (≤ 350 bp).
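The Q20 and Q30 thresholds above can be checked from the per-base quality strings of a sample. The Python sketch below makes two assumptions that this document does not state: quality strings use the Phred+33 ASCII encoding common in FASTQ files, and a base counts towards Q20 or Q30 when its score is at least 20 or 30. The function names and toy data are hypothetical.

```python
def quality_fractions(quality_strings, offset=33):
    """Return (Q20, Q30): percentages of bases with score >= 20 and >= 30."""
    total = q20 = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            q = ord(ch) - offset  # decode one ASCII character to a quality score
            total += 1
            q20 += q >= 20
            q30 += q >= 30
    if total == 0:
        raise ValueError("no bases to evaluate")
    return 100 * q20 / total, 100 * q30 / total


def meets_requirements(q20_pct, q30_pct):
    """Apply the thresholds Q20 >= 90 % and Q30 >= 80 %."""
    return q20_pct >= 90 and q30_pct >= 80


# Toy quality strings: 'I' decodes to 40, '5' to 20 and '#' to 2 under Phred+33.
reads = ["IIIIIIIIII", "IIIII55555", "IIIIIIII##"]
q20_pct, q30_pct = quality_fractions(reads)
print(f"Q20 = {q20_pct:.1f} %, Q30 = {q30_pct:.1f} %")
print("meets requirements:", meets_requirements(q20_pct, q30_pct))
```

In this toy sample 28 of 30 bases reach a score of 20 (93.3 %) but only 23 of 30 reach 30 (76.7 %), so the sample passes the Q20 threshold and fails the Q30 threshold. The sketch does not filter by read length; restricting it to reads of 350 bp or shorter would be a further step.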
5.2.3 For samples sourced from humans, animals or plants (or all of these), host reads shall be removed
by mapping to a human, animal or plant genome reference, such as UniRef or the Unified Human
Gastrointestinal Protein catalog. Only clean data should be used in further bioinformatic analysis.
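Once reads have been mapped against a host reference, host removal reduces to discarding the reads that mapped. The Python sketch below covers only that filtering step and assumes the mapping has already been done by an external aligner that reports the identifiers of host-mapped reads; the `Read` type, function name and identifiers are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def remove_host_reads(reads: Iterable[Read], host_read_ids: set[str]) -> Iterator[Read]:
    """Yield only the reads whose identifier was not mapped to the host reference."""
    for read in reads:
        if read.read_id not in host_read_ids:
            yield read


# Hypothetical reads; "r2" stands for a read that mapped to the host genome.
reads = [
    Read("r1", "ACGT", "IIII"),
    Read("r2", "GGCC", "IIII"),
    Read("r3", "TTAA", "IIII"),
]
host_ids = {"r2"}
clean = list(remove_host_reads(reads, host_ids))
print([r.read_id for r in clean])
```

Here only reads r1 and r3 are retained. For paired-end data, both mates of a pair would normally be dropped together, which this sketch does not handle.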
5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first
step in data processing. The following factors a...