Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences

This document illustrates the workflow of shotgun metagenomic sequence data processing of host-derived microbiome and environmental metagenomes. This document specifies the requirements for quality control of shotgun metagenomic sequence data processing for massively parallel DNA sequencing. This document provides guidelines for data directory, data archive and metadata for shotgun metagenomic sequence data. This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence data. This document applies to shotgun metagenomic sequence data processing and analyses, but excludes functional analysis.

Biotechnologie — Séquençage d'ADN massivement parallèle — Exigences générales pour le traitement des données des séquences métagénomiques "Shotgun"

General Information

Status: Published
Publication Date: 18-May-2023
Current Stage: 6060 - International Standard published
Start Date: 19-May-2023
Due Date: 22-Jun-2023
Completion Date: 19-May-2023
Ref Project

Technical specification
ISO/TS 24420:2023 - Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences Released:19. 05. 2023
English language
18 pages
Draft
REDLINE ISO/DTS 24420 - Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences Released:2/6/2023
English language
18 pages
Draft
ISO/DTS 24420 - Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences Released:2/6/2023
English language
18 pages

Standards Content (Sample)

TECHNICAL SPECIFICATION
ISO/TS 24420
First edition
2023-05
Biotechnology — Massively parallel
DNA sequencing — General
requirements for data processing of
shotgun metagenomic sequences
Biotechnologie — Séquençage d'ADN massivement parallèle —
Exigences générales pour le traitement des données des séquences
métagénomiques "Shotgun"
Reference number
ISO/TS 24420:2023(E)
© ISO 2023
COPYRIGHT PROTECTED DOCUMENT
© ISO 2023

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO 2023 – All rights reserved
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Processing workflow
5 Data processing
5.1 Facilities and software requirements
5.2 Sequence quality control and error determination
5.3 Sequence assembly
6 Data analysis
6.1 Annotation
6.2 Calculation of species relative abundance
6.2.1 Species analysis
6.2.2 Gene analysis
7 Data archive and metadata
7.1 Original data
7.2 Sequencing analytical data
7.3 Data directory and archive
7.3.1 General
7.3.2 Directory of data elements
7.3.3 Data archiving
7.4 Metadata
Annex A (informative) Examples of data format
Annex B (informative) Directory of data elements
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

ISO draws attention to the possibility that the implementation of this document may involve the use of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a) patent(s) which may be required to implement this document. However, implementers are cautioned that this may not represent the latest information, which may be obtained from the patent database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 276, Biotechnology.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Shotgun metagenomic sequencing of organisms’ genomes from a complex sample is widely used in life science and clinical applications (e.g. human complex disease associated analysis, environmental microecology and other fields) in order to gain knowledge of their composition and function. It has the potential to provide significant scientific data for life science research.

The utility of this technique lies in its ability to reveal the microbial diversity and abundance found in microbial populations from multiple environments and to determine sequence information (taxonomic characterization, functional annotation, and comparative analysis/metagenomics) for individual organisms in these populations. The resulting data can be subjected to comparative analytics.

Massively parallel shotgun metagenomic sequencing generates a large amount of data containing a high complexity of microbial genomes and a large number of unknown species. It is important to use effective processing procedures and address quality control for shotgun metagenomic sequencing data. A standardised data format is essential to promote data sharing.

As with any advanced technology, massively parallel sequencing technologies are error prone. Overcoming these shortcomings to ensure a reliable sequencing and analytical outcome is important.

This document provides a uniform standard and guidelines for the collation, storage and subsequent analysis of metagenomic data. It provides requirements and recommendations for the workflow and process of shotgun metagenomic analyses, including quality control of sequencing data and metadata, and the compositional and functional analysis of microbial communities. These requirements and recommendations can ensure accuracy of data generated from metagenomic analysis, address potential errors and facilitate downstream applications.
TECHNICAL SPECIFICATION ISO/TS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing —
General requirements for data processing of shotgun
metagenomic sequences
1 Scope

This document illustrates the workflow of shotgun metagenomic sequence data processing of host-derived microbiome and environmental metagenomes.

This document specifies the requirements for quality control of shotgun metagenomic sequence data processing for massively parallel DNA sequencing.

This document provides guidelines for data directory, data archive and metadata for shotgun metagenomic sequence data.

This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence data.

This document applies to shotgun metagenomic sequence data processing and analyses, but excludes functional analysis.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library

preparation
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification

exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of

specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data

sequencing data obtained after a pre-processing procedure which usually includes multiple trimming

and filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences

removed, linkers/adaptors removed)
3.5
code

system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic

signals into another form or representation to facilitate analysis, communication or storage in a storage

medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
encoding
process of assigning code to things or concepts
3.7
contig

contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome

or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory

list of data items, which gives itemized information enabling traceability, identification and findability

of related data

Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.

3.11
directory identifier

unique language-independent sign assigned to the archive directory in the structure

3.12
gene

sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product

Note 1 to entry: Genes are recognized as the basic unit of heredity.

Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a

nuclear processing step.

Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.

[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier

sequence of characters, capable of uniquely identifying that with which it is associated, within a

specified context
[SOURCE: ISO/IEC 11179-1:2015, 3.1.3]
3.14
analytical data

set of elements to describe qualitative or quantitative analytical attributes of processed metagenomic

raw data
3.15
name

semantic, natural language labels given to data elements, and variations of these labels serve different

functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.16
public attribute
attribute that can have same attribute value for different data in the directory
3.17
quality score
Q score
Phred score
quality of base calling

measure of the probability of correct base recognition, usually expressed directly by a numerical value

Note 1 to entry: Q is defined by the following equation:
$Q = -10 \log_{10}(p)$
where p is the estimated probability of the base call being wrong.

Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of

99 %.

Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of

99,9 %.

Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a

significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,

resulting in inaccurate conclusions.
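
The relationship between Q and p in Note 1 can be checked numerically. The following is a minimal sketch, not part of the specification; the function names are illustrative assumptions.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return p, the estimated probability that a base call is wrong, for quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return the quality score Q = -10 * log10(p) for an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

With these helpers, a quality score of 20 corresponds to p = 0,01 and an error probability of 0,001 corresponds to a quality score of 30, in line with Notes 2 and 3.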
3.18
raw data

primary sequencing data produced by a sequencer without involving any software-based pre-filtering

for analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
3.19
relative abundance

fraction of a single microorganism operational taxonomic unit in the total microbial community of a

defined environment
Note 1 to entry: It is usually represented as a percentage.
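
As an illustration of this definition, the sketch below converts per-unit counts into percentages of the community total. It assumes that the counts have already been normalized as appropriate for the analysis; the function name and the input layout are hypothetical and are not taken from this document.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Express each taxonomic unit's count as a percentage of the community total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {unit: 100.0 * count / total for unit, count in counts.items()}
```

For example, counts of 50, 30 and 20 for three units give relative abundances of 50 %, 30 % and 20 %.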
3.20
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.21
scaffold

reconstructed genomic sequence created by chaining contigs together using additional information

about the relative position and orientation of the contigs in the genome
3.22
sequence assembly

processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA

sequences, entire genes or genomes

Note 1 to entry: When sequencing a novel genome for which no reference sequence is available for alignment, sequence reads are assembled into contigs; this is known as de novo assembly.
3.23
shotgun metagenomic sequencing
shotgun metagenomics

nucleotide sequence determination of the genomes of untargeted cells in communities in order to

determine community composition and function

Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific

environments.

Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly

without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,

phylogeny, gene function, or metabolic network or combinations thereof.
3.24
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow

The basic workflow of metagenomics should include sequencing, data processing and data analysis.

Data processing includes pre-processing, quality control, data assembly, data profiling and annotation,

as shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements

5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. Before the pipeline is used for analytical purposes, its applications, such as shell (e.g. Bash), GNU R and Python, should be locked down, including the complete set of tools, code, operational environment and network connections that compose the pipeline. Changes to any component of the pipeline require revalidation to ensure that there is no impact on the performance characteristics of the pipeline.
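
One possible way to detect that a locked-down pipeline has changed is to record a fingerprint of its components at validation time and compare it before each run. The sketch below is only an illustration under stated assumptions: the component names, the version strings and the use of SHA-256 over a JSON manifest are hypothetical choices, not requirements of this document.

```python
import hashlib
import json
from typing import Dict


def fingerprint(components: Dict[str, str]) -> str:
    """Hash a manifest of component names and versions in a key-order-independent way."""
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_revalidation(validated_fingerprint: str, current: Dict[str, str]) -> bool:
    """Return True when the current components differ from the validated set."""
    return fingerprint(current) != validated_fingerprint


# Hypothetical manifest recorded when the pipeline was validated.
validated = {"python": "3.11.4", "trimmer": "1.2.0", "assembler": "4.0.1"}
validated_fp = fingerprint(validated)
```

Any change to a recorded component, such as a new tool version, changes the fingerprint and signals that revalidation is required.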

5.1.2 High-performance computing technologies may be used at any step in the process to ensure

proper management and curation of large collections of complex procaryotic and eucaryotic genomes

as processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination

5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC)

process to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3, and

ISO 20397-2:2021, 4.3.

5.2.2 The available data quality values for each DNA sample after sequencing should meet the following requirements:
a) Q20 ≥ 90 %: at least 90 % of the bases in the sample shall have a quality score of 20 or more;
b) Q30 ≥ 80 %: at least 80 % of the bases in the sample shall have a quality score of 30 or more.
The above requirements apply only to short sequence reads (≤ 350 bp).
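
The Q20 and Q30 criteria can be evaluated directly from the quality strings of a FASTQ file. The following sketch assumes Phred+33 encoding (ASCII offset 33) and counts a base towards Q20 or Q30 when its score is 20 or 30 or higher, which is the common convention; both are assumptions, and the function names are illustrative.

```python
from typing import Iterable, Tuple


def quality_fractions(quality_strings: Iterable[str], offset: int = 33) -> Tuple[float, float]:
    """Return the fractions of bases with quality score >= 20 and >= 30."""
    total = q20 = q30 = 0
    for quality in quality_strings:
        for symbol in quality:
            score = ord(symbol) - offset  # decode the Phred score from its ASCII symbol
            total += 1
            if score >= 20:
                q20 += 1
            if score >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return q20 / total, q30 / total


def passes_qc(quality_strings: Iterable[str], min_q20: float = 0.90, min_q30: float = 0.80) -> bool:
    """Check the thresholds Q20 >= 90 % and Q30 >= 80 %."""
    q20, q30 = quality_fractions(quality_strings)
    return q20 >= min_q20 and q30 >= min_q30
```

In Phred+33, the symbol "I" encodes a score of 40, "?" a score of 30, "5" a score of 20 and "+" a score of 10, so a sample consisting only of "I" and "?" symbols passes both thresholds.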

5.2.3 For samples sourced from humans, animals, plants or any combination of these, host reads shall be removed by mapping to the corresponding human, animal or plant genome reference, such as UniRef or the Unified Human Gastrointestinal Protein catalog. Only clean data should be used in further bioinformatic analysis.
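
Once reads have been mapped to the host reference by an external aligner, the removal step itself is a simple filter. The sketch below assumes that the aligner's output has already been reduced to a collection of identifiers of host-mapped reads; the data layout and function name are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (read identifier, nucleotide sequence)


def remove_host_reads(reads: Iterable[Read], host_read_ids: Iterable[str]) -> Iterator[Read]:
    """Yield only the reads whose identifiers were not mapped to the host reference."""
    host = set(host_read_ids)  # set membership keeps the filter linear in the number of reads
    for read_id, sequence in reads:
        if read_id not in host:
            yield read_id, sequence
```

For paired-end data, a stricter variant would discard a pair when either mate maps to the host; that choice is left open here.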

5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first

step in data processing. The following factors and situations shall be considered in the process of

elimination.

a) Mismatches, insertions or deletions (indels) (only when a reference genome is available) and uncertain bases (N characters).

b) Unrecognizable sequence, which can be caused if the reads extend into the adaptor at the 3' end when the target sequence is shorter than the read length.

c) PCR biases in the library preparation in accordance with ISO 20397-1:2022, 5.8.
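
Factors a) and b) can be illustrated with a simple read filter. The sketch below is a deliberately simplified assumption: it trims the adaptor by exact string matching only, whereas dedicated trimming tools also tolerate mismatches, and the threshold values and function names are hypothetical.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove an adaptor (or a prefix of it at the read's 3' end) by exact matching."""
    index = read.find(adapter)
    if index != -1:
        return read[:index]  # full adaptor found: keep only the sequence before it
    # Otherwise look for the longest adaptor prefix that ends the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def n_fraction(read: str) -> float:
    """Return the fraction of uncertain bases (N characters) in a read."""
    return read.upper().count("N") / len(read) if read else 0.0


def keep_read(read: str, max_n_fraction: float = 0.05, min_length: int = 30) -> bool:
    """Keep a read only if it is long enough and has few uncertain bases."""
    return len(read) >= min_length and n_fraction(read) <= max_n_fraction
```

A typical use is to trim each read first and then apply `keep_read` to the trimmed sequence, so that reads shortened by adaptor removal are also checked against the minimum length.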

5.3 Sequence assembly

5.3.1 The depth of sequencing shall be evaluated before sequence assembly, and this evaluation should take the complexity of the sample into account.
5.3.2 Samples lacking a reference
...

ISO/DTS.2 24420:2023(E)
Date: 2023-01-09
ISO/TC 276
Secretariat: DIN

Biotechnology — Massively parallel DNA sequencing — General requirements for data

processing of shotgun metagenomic sequences

Biotechnologie — Séquençage d'ADN massivement parallèle — Exigences générales pour le

traitement des données des séquences métagénomiques "Shotgun"
DTS.2 stage
Warning for WDs and CDs

This document is not an ISO International Standard. It is distributed for review and comment.

It is subject to change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant

patent rights of which they are aware and to provide supporting documentation.

© ISO 2023

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part

of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or

mechanical, including photocopying, or posting on the internet or an intranet, without prior written

permission. Permission can be requested from either ISO at the address below or ISO’s member body

in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Processing workflow
5 Data processing
5.1 Facilities and software requirements
5.2 Sequence quality control and error determination
5.3 Sequence assembly
6 Data analysis
6.1 Annotation
6.2 Calculation of species relative abundance
7 Data archive and metadata
7.1 Original data
7.2 Sequencing analytical data
7.3 Data directory and archive
Annex A
Annex B
Bibliography

Foreword ......................................................................................................................................................................... iv

Introduction..................................................................................................................................................................... v

1 Scope .................................................................................................................................................................... 1

2 Normative references .................................................................................................................................... 1

3 Terms and definitions .................................................................................................................................... 1

4 Processing workflow ...................................................................................................................................... 5

5 Data processing ................................................................................................................................................ 5

5.1 Facilities and software requirements ...................................................................................................... 5

5.2 Sequence quality control and error determination ............................................................................ 5

5.3 Sequence assembly ......................................................................................................................................... 6

6 Data analysis ..................................................................................................................................................... 6

6.1 Annotation ......................................................................................................................................................... 6

6.2 Calculation of species relative abundance ............................................................................................. 7

7 Data archive and metadata .......................................................................................................................... 7

7.1 Original data ...................................................................................................................................................... 7

7.2 Sequencing analytical data .......................................................................................................................... 7

7.3 Data directory and archive .......................................................................................................................... 8

Formatted: Font: 11 pt

Annex A ........................................................................................................................................................................... 10

Formatted: Space Before: 0 pt, Line spacing: Exactly 11
pt, Tab stops: Not at 487.6 pt
© ISO 2022 – All rights reserved 3
© ISO 2023 – All rights reserved iii
---------------------- Page: 4 ----------------------
ISO/DTS.2 24420:20222023(E)

Annex B ........................................................................................................................................................................... 16

Bibliography ................................................................................................................................................................. 18

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO

collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any

patent rights identified during the development of the document will be in the Introduction and/or on

the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the World

Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see

www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 276, Biotechnology.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at
www.iso.org/members.html.
Introduction

Shotgun metagenomic sequencing, in which the genomes of organisms in a complex community sample are sequenced to gain knowledge of the community's composition and function, is widely used in life science and clinical applications, such as association analysis of complex human diseases, environmental microecology and other fields. It has the potential to provide significant scientific data for life science research.

The utility of this technique is its ability to reveal the microbial diversity and abundance found in

microbial populations from multiple environments and to determine sequence information

(taxonomic characterization, functional annotation, and comparative analysis/metagenomics) for individual

organisms in these populations. The resulting data can be subjected to

comparative analytics. Massively parallel shotgun metagenomic sequencing generates a large amount of

data containing a high complexity of microbial genomes and a large number of unknown species. It is

important to use effective processing procedures and address quality control for shotgun metagenomic

sequencing data. A standardised data format is essential to promote data sharing.

As with any advanced technology, massively parallel sequencing technologies are error prone. Overcoming these shortcomings to ensure a reliable sequencing and analytical outcome is important. This document provides a uniform standard and guidelines for the collation, storage and subsequent analysis of metagenomic data. It provides requirements and recommendations for the workflow and process of shotgun metagenomic analyses, including quality control of sequencing data and metadata, and the compositional and functional analysis of the microbial community. These requirements and recommendations can ensure the accuracy of data generated from metagenomic analysis, address potential errors and facilitate downstream applications.
TECHNICAL SPECIFICATION ISO/DTS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing — General
requirements for data processing of shotgun metagenomic
sequences
1 Scope

This document illustrates the workflow of shotgun metagenomic sequence data processing of host-

derived microbiome and environmental metagenomes.

This document specifies the requirements for quality control of shotgun metagenomic sequence data

processing for massively parallel DNA sequencing.

This document provides guidelines for data directory, data archive and metadata for shotgun

metagenomic sequence data.

This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence

data.

This document applies to shotgun metagenomic sequence data processing and analyses, but excludes

functional analysis.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library

preparation

ISO 20397-2:2021, Biotechnology — Massively parallel sequencing — Part 2: Methods to evaluate

the quality of sequencing data
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification

exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of

specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data

sequencing data obtained after a pre-processing procedure which usually includes multiple trimming and

filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences

removed, linkers/adaptors removed)
3.5
code

system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic

signals into another form or representation to facilitate analysis, communication or storage in a storage

medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
coding
encoding
process of assigning code to things or concepts
3.7
contig

contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome

or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory

list of data items, which gives itemized information enabling traceability, identification and findability of

related data

Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.

3.11
directory identifier
unique language-independent sign assigned to the archive directory in the structure

3.12
gene

sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product

Note 1 to entry: Genes are recognized as the basic unit of heredity.

Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a nuclear

processing step.

Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.

[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier

sequence of characters, capable of uniquely identifying that with which it is associated, within a specified

context
[SOURCE: ISO/IEC 11179-1:2015, 3.33]
3.14
layer code
hierarchical code consisting of membership order of coded objects
3.15
metagenomics analytical data

set of elements to describe qualitative or quantitative analytical attributes of processed metagenomic

raw data
3.16
name

semantic, natural language labels given to data elements, and variations of these labels serve different

functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.17
public attribute
attribute that can have same attribute value for different data in the directory
3.18
quality score
Q score
Phred score
quality of base calling

measure of the probability of correct base recognition, usually expressed directly by a numerical value

Note 1 to entry: Q is defined by the following equation:
$$Q = -10\log_{10}(p)$$
where
p is the estimated probability of the base call being wrong.

Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of

99 %.

Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of

99,9 %.

Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a

significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,

resulting in inaccurate conclusions.
[SOURCE: ISO 20397-2:2021, 3.32, modified ― Note 3 was added.]
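
The relationship in Note 1 to entry can be checked numerically. The following is a minimal Python sketch, not part of the source text; the function names are illustrative assumptions. It converts between a quality score Q and the estimated error probability p, and prints the error rate and call accuracy for the scores discussed in Notes 2 and 3.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to recover p, the estimated
    probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Compute Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Quality scores discussed in Notes 2 and 3 to entry.
    for q in (20, 30):
        p = phred_to_error_probability(q)
        print(f"Q{q}: error probability {p:g}, call accuracy {100 * (1 - p):g} %")
```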
3.19
raw data

primary sequencing data produced by a sequencer without involving any software-based pre-filtering for

analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
3.20
relative abundance

fraction of a single microorganism operational taxonomic unit in the total microbial community of a

defined environment
Note 1 to entry: It is usually represented as a percentage.
3.21
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.22
scaffold

reconstructed genomic sequence created by chaining contigs together using additional information about

the relative position and orientation of the contigs in the genome
3.23
sequence assembly

processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA

sequences, entire genes or genomes

Note 1 to entry: When sequencing a novel genome for which no reference sequence is available for alignment, sequence reads are assembled into contigs; this is de novo assembly.

3.24
shotgun metagenomic sequencing
shotgun metagenomics

nucleotide sequence determination of the genomes of untargeted cells in communities in order to

determine community composition and function

Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific

environments.

Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly

without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,

phylogeny, gene function, or metabolic network or combinations thereof.
3.25
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow

The basic workflow of metagenomics should include sequencing, data processing and data analysis. Data

processing includes pre-processing, quality control, data assembly, data profiling and annotation, as

shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements

5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. Before the pipeline is used for analytical purposes, it should be locked down, including the complete set of tools, code [e.g. shell (such as Bash), GNU R and Python], operational environment and network connections that compose it. Changes to any component of the pipeline require revalidation to ensure that there is no impact on the performance characteristics of the pipeline.
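
One possible way to detect that a locked-down pipeline has changed is to record a fingerprint of its components at validation time and compare it before each run. The following Python sketch is an illustration only and is not prescribed by this document; the function names, the manifest layout and the idea of passing tool versions as a dictionary are assumptions.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(pipeline_files, tool_versions):
    """Describe the pipeline: interpreter, platform, tool versions and
    a hash of every script or configuration file (hypothetical layout)."""
    paths = sorted(str(p) for p in pipeline_files)
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "tools": dict(sorted(tool_versions.items())),
        "files": {path: file_sha256(Path(path)) for path in paths},
    }


def manifest_fingerprint(manifest) -> str:
    """Hash a canonical JSON form of the manifest."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def needs_revalidation(validated_fingerprint: str, current_manifest) -> bool:
    """True if any recorded component differs from the validated state."""
    return manifest_fingerprint(current_manifest) != validated_fingerprint
```

In this sketch the fingerprint stored at validation time would be kept with the validation records; any difference flags the pipeline for revalidation. Network connections and other environment aspects are not captured here and would need separate controls.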

5.1.2 High-performance computing technologies may be used at any step in the process to ensure proper management and curation of large collections of complex prokaryotic and eukaryotic genomes, as processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination

5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC) process

to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3, and

ISO 20397-2:2021, 4.3.

5.2.2 The available data quality values for each DNA sample after sequencing should meet the following

requirements:

a) Q20 ≥ 90 %: at least 90 % of the bases in the sample shall have a base quality value of 20 or more;

b) Q30 ≥ 80 %: at least 80 % of the bases in the sample shall have a base quality value of 30 or more.

The above requirements only apply to short sequence reads ≤ 350 bp.
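
The Q20 and Q30 criteria can be evaluated directly from the per-base quality strings of a FASTQ file. The following Python sketch is illustrative only; it assumes Phred+33 encoded quality strings and counts a base towards Q20 (or Q30) when its quality value is 20 (or 30) or more. The function names and default thresholds expressed as fractions are assumptions made for this example.

```python
def quality_fractions(quality_strings, offset=33):
    """Return (fraction of bases with Q >= 20, fraction with Q >= 30).

    quality_strings: iterable of FASTQ quality lines.
    offset: ASCII offset of the encoding (33 assumed, i.e. Phred+33).
    """
    total = q20 = q30 = 0
    for quality in quality_strings:
        for symbol in quality:
            q = ord(symbol) - offset
            total += 1
            if q >= 20:
                q20 += 1
            if q >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return q20 / total, q30 / total


def meets_thresholds(quality_strings, min_q20=0.90, min_q30=0.80):
    """Check the Q20 >= 90 % and Q30 >= 80 % criteria for one sample."""
    q20, q30 = quality_fractions(quality_strings)
    return q20 >= min_q20 and q30 >= min_q30
```

In practice the quality strings would be read from every fourth line of each FASTQ record; reading and decompressing the files is left out of the sketch.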

5.2.3 For samples sourced from humans, animals or plants, the host reads shall be removed by mapping to a human, animal or plant genome reference, as applicable, such as UniRef or the Unified Human Gastrointestinal Protein catalog. Only clean sequencing data should be used in further bioinformatic analysis.

5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first step in data processing. The following factors and situations shall be considered in the process of elimination:

a) Mismatches, insertions or deletions (indels) (only when a reference genome is available) and uncertain bases (N characters).

b) Unrecognizable sequence, which can be caused if the reads extend to the 3' end of the adaptor when the target sequence is shorter.

c) PCR biases in the library preparation in accordance with ISO 20397-1:2022, 5.8.
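
Items a) and b) above can be illustrated with a small read-cleaning step. The following Python sketch is a simplified illustration, not a validated tool: it uses exact string matching only (no mismatch tolerance), and the function names, the minimum overlap, the maximum fraction of N characters and the minimum read length are assumptions chosen for the example.

```python
def trim_adaptor(read: str, adaptor: str, min_overlap: int = 5) -> str:
    """Remove adaptor sequence from the 3' side of a read.

    A full adaptor match anywhere in the read is cut from its start;
    otherwise a prefix of the adaptor (at least min_overlap bases)
    found at the very end of the read is removed.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    position = read.find(adaptor)
    if position != -1:
        return read[:position]
    longest = min(len(adaptor) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


def n_fraction(read: str) -> float:
    """Fraction of uncertain bases (N characters) in a read."""
    return read.upper().count("N") / len(read) if read else 1.0


def clean_reads(reads, adaptor, max_n_fraction=0.1, min_length=30):
    """Trim adaptors, then drop reads that are too short or too uncertain."""
    kept = []
    for read in reads:
        trimmed = trim_adaptor(read, adaptor)
        if len(trimmed) >= min_length and n_fraction(trimmed) <= max_n_fraction:
            kept.append(trimmed)
    return kept
```

Production pipelines typically use dedicated, validated trimming tools that also tolerate sequencing errors within the adaptor; the sketch only shows the logic of the checks.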

5.3 Sequence assembly

5.3.1 The depth of sequencing shall be evaluated before the sequence assembly, which should take the

complexity of the sample into account.

5.3.2 Samples lacking a reference genome dataset, such as soil or ocean samples, should use sequence

assembly.

5.3.3 Contigs or scaffolds or both that are directly obtained from sequence fragments without any

reference should be regarded as de novo assemblies.

5.3.4 The selection of the sequence assembly software should depend on the relative importance of the

accuracy, contigs’ size, input data type, and available computational resources.

5.3.5 A non-redundant gene catalogue can be obtained by predicting genes from assembled contigs.

For well characterized microbiomes, e.g., human gut-borne, a credible gene catalogue (e.g., Integrated

Gene Catalog (IGC)) can be used for quick identification and quantification of data from metagenomic

sequencing.

5.3.6 Created assemblies should be evaluated to assess their quality, e.g. using QUAST.
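
Assembly evaluation tools report contiguity statistics such as N50. As an illustration of what such a statistic measures, the following Python sketch computes N50 and a few basic summary values from a list of contig lengths. It is not the implementation of any particular tool, and the function names are assumptions.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def assembly_summary(contig_lengths):
    """Collect basic contiguity statistics for an assembly."""
    lengths = list(contig_lengths)
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest_contig": max(lengths),
        "n50": n50(lengths),
    }
```

Contiguity alone does not indicate correctness; a full evaluation would also consider misassemblies and, where available, comparison with reference genomes.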

6 Data analysis
6.1 Annotation

6.1.1 The annotation methods should be described. The number of reference genomes selected, and

reference genomes or reference database used for the annotation should be documented.

6.1.2 Taxonomy profile methods should be chosen according to data and application needs to obtain a

higher-level taxonomy profile (e.g., species, genus, order, phylum) including metagenomic linkage groups

(MLG), metagenomic clusters (MGC) or metagenomic species (MGS).

6.1.3 Taxonomy profiling should be based on reference databases, such as RefSeq complete genomes (RefSeq CG) for microbial species and the BLAST databases for high-quality nucleotide and protein sequences. Classification accuracy, speed and computational requirements should be taken into account when selecting taxonomic classification tools.

6.1.4 If the profile is obtained by a de novo sequence assembly method, the species should be identified only when the alignment to the most closely related entry in the reference database has a sequence similarity of more than 97 % and a coverage of more than 90 %.
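
The similarity and coverage thresholds in 6.1.4 can be expressed as a simple filter over alignment results. The following Python sketch is illustrative; the `Alignment` record, its field names and the tie-breaking rule (highest similarity, then highest coverage) are assumptions, and the alignments themselves would come from whichever aligner the pipeline uses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Alignment:
    """One alignment of an assembled sequence to a reference entry
    (hypothetical record used only for this example)."""
    reference_species: str
    identity_percent: float
    coverage_percent: float


def assign_species(
    alignments: Iterable[Alignment],
    min_identity: float = 97.0,
    min_coverage: float = 90.0,
) -> Optional[str]:
    """Return the species of the best alignment exceeding both
    thresholds, or None if no alignment qualifies."""
    passing = [
        a for a in alignments
        if a.identity_percent > min_identity and a.coverage_percent > min_coverage
    ]
    if not passing:
        return None
    best = max(passing, key=lambda a: (a.identity_percent, a.coverage_percent))
    return best.reference_species
```

Sequences for which the function returns None would remain unassigned at species level and could be annotated at a higher taxonomic level instead.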

6.1.5 For read-based approaches, after read merging the reads should be mapped to NR for taxonomy data (e.g. with BLAST, Diamond or Last) or to marker genes.

6.1.6 Metagenomic profiles should be annotated to various levels according to the reference

annotation, i.e., species, genus, or higher.
6.2 Calculation of species relative abundance
6.2.1 Species analysis

6.2.1.1 The relative abundance calculation method should be defined and implemented to meet the

repeatability requirement. The relative abundance calculation method shall be documented.

6.2.1.2 Calculation tools should be selected with consideration for how well they reflect the actual relative abundance of the target operational taxonomic unit in the sample.
6.2.2 Gene analysis

6.2.2.1 A gene abundance table can be generated using alignment-based tools or alignment-free

methods.

6.2.2.2 The relative abundance distribution at the gene level can be obtained by comparing the clean

data to the assembled gene set or the appropriate reference database.

6.2.2.3 The relative abundances of the gene sequences of the same species shall be summed to obtain the relative abundance of the operational taxonomic unit.
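
The aggregation in 6.2.2.3 can be sketched as a sum of gene-level abundances grouped by species. The following Python example is a minimal illustration under stated assumptions: the gene-to-species mapping is taken as given, genes without a species assignment are ignored, and the result is renormalized so that the species fractions sum to 1. None of these choices is prescribed by this document.

```python
from collections import defaultdict


def species_relative_abundance(gene_abundance, gene_to_species):
    """Sum gene-level abundances per species and renormalize.

    gene_abundance: dict mapping gene identifier -> relative abundance.
    gene_to_species: dict mapping gene identifier -> species name.
    Genes missing from gene_to_species are skipped (an assumption).
    """
    totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        species = gene_to_species.get(gene)
        if species is not None:
            totals[species] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no abundance could be assigned to any species")
    return {species: value / grand_total for species, value in totals.items()}
```

Whichever calculation method is actually used, 6.2.1.1 requires that it be documented, including choices such as how unassigned genes are handled.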
7 Data archive and metadata
7.1 Original data

7.1.1 Sequencing data volume should be evaluated to obtain saturated gene information in

metagenome-wide association studies (MWAS).

7.1.2 Regardless of the sample source, the sequencing data format should be the same for each

sequence. The sequence data format should be stored in a standard format that can preserve the

information of biological sequences (usually nucleic acid sequences) or their sequencing quality; ISO

1 BLAST® is the trademark of a product supplied by the National Center for Biotechnology Informa

...

FINAL DRAFT
TECHNICAL SPECIFICATION ISO/DTS 24420
ISO/TC 276
Secretariat: DIN
Voting begins on: 2023-02-20
Voting terminates on: 2023-04-17
Biotechnology — Massively parallel DNA sequencing — General requirements for data processing of shotgun metagenomic sequences
Biotechnologie — Séquençage d'ADN massivement parallèle — Exigences générales pour le traitement des données des séquences métagénomiques "Shotgun"
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number ISO/DTS 24420:2023(E)
© ISO 2023
COPYRIGHT PROTECTED DOCUMENT
© ISO 2023

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents Page

Foreword ........................................................................................................................................................................................................................................iv

Introduction .................................................................................................................................................................................................................................v

1 Scope ................................................................................................................................................................................................................................. 1

2 Normative references ..................................................................................................................................................................................... 1

3 Terms and definitions .................................................................................................................................................................................... 1

4 Processing workflow .......................................................................................................................................................................................4

5 Data processing ..................................................................................................................................................................................................... 5

5.1 Facilities and software requirements ................................................................................................................................ 5

5.2 Sequence quality control and error determination ............................................................................................... 5

5.3 Sequence assembly ............................................................................................................................................................................. 6

6 Data analysis ............................................................................................................................................................................................................ 6

6.1 Annotation ...................................................................... ............................................................................................................................ 6

6.2 Calculation of species relative abundance ..................................................................................................................... 7

6.2.1 Species analysis ................................................................................................................................................................... 7

6.2.2 Gene analysis ......................................................................................................................................................................... 7

7 Data archive and metadata ....................................................................................................................................................................... 7

7.1 Original data ............................................................................................................................................................................................. 7

7.2 Sequencing analytical data .......................................................................................................................................................... 7

7.3 Data directory and archive .......................................................................................................................................................... 8

7.3.1 General ........................................................................................................................................................................................ 8

7.3.2 Directory of data elements ........................................................................................................................................ 8

7.3.3 Data archiving ...................................................................................................................................................................... 8

7.4 Metadata ...................................................................................................................................................................................................... 8

Annex A (informative) Examples of data format .................................................................................................................................10

Annex B (informative) Directory of data elements ...........................................................................................................................16

Bibliography .............................................................................................................................................................................................................................18

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to

the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see

www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 276, Biotechnology.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Shotgun metagenomic sequencing, in which the genomes of organisms in a complex community sample are sequenced to gain knowledge of the community's composition and function, is widely used in life science and clinical applications, such as association analysis of complex human diseases, environmental microecology and other fields. It has the potential to provide significant scientific data for life science research.

The utility of this technique is its ability to reveal the microbial diversity and abundance found in

microbial populations from multiple environments and to determine sequence information (taxonomic

characterization, functional annotation, and comparative analysis/metagenomics) for individual

organisms in these populations. The resulting data can be subjected to comparative analytics.

Massively parallel shotgun metagenomic sequencing generates a large amount of data containing a

high complexity of microbial genomes and a large number of unknown species. It is important to use

effective processing procedures and address quality control for shotgun metagenomic sequencing data.

A standardised data format is essential to promote data sharing.

As with any advanced technology, massively parallel sequencing technologies are error prone.
Overcoming these shortcomings to ensure a reliable sequencing and analytical outcome is important.

This document provides a uniform standard and guidelines for the collation, storage and subsequent
analysis of metagenomic data. It provides requirements and recommendations for the workflow and
process of shotgun metagenomic analyses, including quality control of sequencing data and metadata,
and the compositional and functional analysis of the microbial community. These requirements and
recommendations can ensure the accuracy of data generated from metagenomic analysis, address
potential errors and facilitate downstream applications.
TECHNICAL SPECIFICATION ISO/DTS 24420:2023(E)
Biotechnology — Massively parallel DNA sequencing —
General requirements for data processing of shotgun
metagenomic sequences
1 Scope

This document illustrates the workflow of shotgun metagenomic sequence data processing of
host-derived microbiome and environmental metagenomes.

This document specifies the requirements for quality control of shotgun metagenomic sequence data

processing for massively parallel DNA sequencing.

This document provides guidelines for data directory, data archive and metadata for shotgun

metagenomic sequence data.

This document applies to data storage, sharing and interoperability of shotgun metagenomic sequence

data.

This document applies to shotgun metagenomic sequence data processing and analyses, but excludes

functional analysis.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 20397-1:2022, Biotechnology — Massively parallel sequencing — Part 1: Nucleic acid and library preparation
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
attribute value
value associated with an attribute instance
[SOURCE: ISO 21962:2003, 1.5.2.3]
3.2
category
set of items or concepts that share a common attribute or feature
3.3
classification

exhaustive set of mutually exclusive categories to aggregate data at a pre-prescribed level of

specialization for a specific purpose
[SOURCE: ISO 17115:2007, 2.7.1]
3.4
clean data

sequencing data obtained after a pre-processing procedure which usually includes multiple trimming

and filtering steps to ensure specific quality levels (e.g., per-base quality, host/contaminant sequences

removed, linkers/adaptors removed)
3.5
code

system of rule(s) to convert information such as text, images, sounds or electric, photonic or magnetic

signals into another form or representation to facilitate analysis, communication or storage in a storage

medium
[SOURCE: ISO 20691:2022, 3.6]
3.6
encoding
process of assigning code to things or concepts
3.7
contig

contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome

or plasmid
3.8
data format
arrangement of data according to preset specifications
Note 1 to entry: Preset specifications are usually made for computer processing.
3.9
data element
single unit of data that in a certain context is considered indivisible
[SOURCE: ISO/TS 21089:2018, 3.44]
3.10
directory

list of data items, which gives itemized information enabling traceability, identification and findability

of related data

Note 1 to entry: A directory can be arranged in alphabetical, chronological or systematic order.

3.11
directory identifier

unique language-independent sign assigned to the archive directory in the structure

3.12
gene

sequence of nucleotides in DNA or RNA encoding either an RNA or a protein product

Note 1 to entry: Genes are recognized as the basic unit of heredity.

Note 2 to entry: A gene can consist of non-contiguous nucleic acid segments that are rearranged through a

nuclear processing step.

Note 3 to entry: A gene may include or be part of an operon that includes elements for gene expression.

[SOURCE: ISO 20397-2:2021, 3.16]
3.13
identifier

sequence of characters, capable of uniquely identifying that with which it is associated, within a

specified context
[SOURCE: ISO/IEC 11179-1:2015, 3.33]
3.14
analytical data

set of elements to describe qualitative or quantitative analytical attributes of processed metagenomic

raw data
3.15
name

semantic, natural language labels given to data elements, and variations of these labels serve different

functions
[SOURCE: ISO/IEC 11179-1:2015, 3.43]
3.16
public attribute
attribute that can have the same attribute value for different data in the directory
3.17
quality score
Q score
Phred score
quality of base calling

measure of the probability of correct base recognition, usually expressed directly by a numerical value

Note 1 to entry: Q is defined by the following equation:
Q = −10 log₁₀(p)
where p is the estimated probability of the base call being wrong.

Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of

99 %.

Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of

99,9 %.

Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a

significant portion of the reads being unusable. Low quality scores may also indicate false-positive variant calls,

resulting in inaccurate conclusions.
[SOURCE: ISO 20397-2:2021, 3.32, modified ― Note 3 was added.]
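
The relationship in Note 1 to entry can be checked numerically. The following is a minimal sketch in Python, assuming a base-10 logarithm as is usual for Phred scores; the function names are illustrative and are not defined by this document.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return the estimated probability p that a base call is wrong, given Q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in the interval (0, 1]")
    return -10 * math.log10(p)
```

With these helpers, a quality score of 20 corresponds to an error probability of about 0,01 and a quality score of 30 to about 0,001, which agrees with Notes 2 and 3 to entry.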
3.18
raw data

primary sequencing data produced by a sequencer without involving any software-based pre-filtering

for analysis purpose
[SOURCE: ISO 20397-2:2021, 3.21]
3.19
relative abundance

fraction of a single microorganism operational taxonomic unit in the total microbial community of a

defined environment
Note 1 to entry: It is usually represented as a percentage.
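
As an illustration of this definition, relative abundance can be derived from per-taxon counts. The sketch below assumes that counts per operational taxonomic unit are already available (for example, from a profiling step); the function name and the input layout are hypothetical.

```python
def relative_abundance(counts):
    """Convert per-taxon counts into relative abundances in percent.

    `counts` maps a taxon label to a non-negative count (assumed input layout).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}
```

For example, counts of 50, 30 and 20 for three taxa give relative abundances of 50 %, 30 % and 20 %.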
3.20
repeatability requirement
requirement of consistency under a set of repeatable measurement conditions
3.21
scaffold

reconstructed genomic sequence created by chaining contigs together using additional information

about the relative position and orientation of the contigs in the genome
3.22
sequence assembly

processing, aligning and merging individual sequencing reads in order to reconstruct longer DNA

sequences, entire genes or genomes

Note 1 to entry: When sequencing a novel genome for which no reference sequence is available for
alignment, sequence reads are assembled into contigs. This is known as de novo assembly.
3.23
shotgun metagenomic sequencing
shotgun metagenomics

nucleotide sequence determination of the genomes of untargeted cells in communities in order to

determine community composition and function

Note 1 to entry: For the microbiome, shotgun metagenomics focuses on microbial communities in specific

environments.

Note 2 to entry: For shotgun metagenomic sequencing, DNA is extracted from the microbes in the sample directly

without isolation and culture. That DNA is then used to analyse the genetic composition, species classification,

phylogeny, gene function, or metabolic network or combinations thereof.
3.24
specialized attribute
attribute that is unique for each sample in the directory
4 Processing workflow

The basic workflow of metagenomics should include sequencing, data processing and data analysis.

Data processing includes pre-processing, quality control, data assembly, data profiling and annotation,

as shown in Figure 1.
Figure 1 — Workflow of metagenomic data processing
5 Data processing
5.1 Facilities and software requirements

5.1.1 The software pipeline for metagenomics bioinformatics shall be validated. The applications in
the pipeline, such as shell (e.g. Bash), GNU R and Python, should be locked down, including the
complete set of tools, code, operational environment and network connections that compose the
pipeline, before the pipeline is used for analytical purposes. Changes to any component of the
pipeline require revalidation to ensure that there is no impact on the performance characteristics
of the pipeline.
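
One possible way to detect that a locked-down pipeline has changed is to record a checksum for each component at validation time and compare it before each run. The following is a minimal sketch only; the manifest layout and the function names are assumptions and are not prescribed by this document.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each pipeline component (script, tool, config file) to its checksum."""
    return {str(p): sha256_of_file(p) for p in paths}


def changed_components(validated, current):
    """List components that were added, removed or modified since validation."""
    names = set(validated) | set(current)
    return sorted(n for n in names if validated.get(n) != current.get(n))
```

A non-empty result from `changed_components` would indicate that at least one component differs from the validated state, so revalidation is needed. Checksums of files do not cover the operational environment or network connections, which would need to be recorded separately.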

5.1.2 High-performance computing technologies may be used at any step in the process to ensure

proper management and curation of large collections of complex procaryotic and eucaryotic genomes

as processing massive datasets is a prerequisite for NGS metagenomics analytics.
5.2 Sequence quality control and error determination

5.2.1 Raw metagenomic sequencing data shall initially be passed through a quality control (QC)
process to ensure a clean dataset. The evaluation should follow ISO 20397-1:2022, Clause 4 and 8.3,
and ISO 20397-2:2021, 4.3.

5.2.2 The data quality values for each DNA sample after sequencing should meet the following
requirements:
a) Q20 ≥ 90 %: at least 90 % of the bases of the sample shall have a quality score of more than 20;
b) Q30 ≥ 80 %: at least 80 % of the bases of the sample shall have a quality score of more than 30.
The above requirements only apply to short sequence reads ≤ 350 bp.
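
The Q20 and Q30 criteria can be evaluated from the quality strings of the reads. The sketch below is illustrative only. It assumes FASTQ quality strings in Phred+33 encoding and counts a base as passing when its score is at or above the threshold, which is the common convention; for a strict reading of "more than", the comparison would use `>` instead. The function names are hypothetical.

```python
def fraction_at_or_above(quality_strings, threshold, offset=33):
    """Fraction of bases whose Phred score is >= threshold.

    `offset` is the ASCII offset of the quality encoding (assumed Phred+33).
    """
    total = 0
    passing = 0
    for quality in quality_strings:
        for symbol in quality:
            total += 1
            if ord(symbol) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return passing / total


def meets_quality_requirements(quality_strings):
    """Check Q20 >= 90 % and Q30 >= 80 % for one sample."""
    q20 = fraction_at_or_above(quality_strings, 20)
    q30 = fraction_at_or_above(quality_strings, 30)
    return q20 >= 0.90 and q30 >= 0.80
```

In practice these fractions are usually reported by the sequencing QC tool in use; the sketch only shows how the two thresholds combine.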

5.2.3 For samples sourced from a human, animal or plant host, the host reads shall be removed by
mapping to a human, animal or plant genome reference, such as UniRef or the Unified Human
Gastrointestinal Protein catalog. Only clean data should be used in further bioinformatic analysis.
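
Once reads have been mapped to the host reference, the removal itself amounts to filtering out the reads that mapped. The following sketch assumes that the mapping has already been performed by an external aligner and that the identifiers of the host-mapped reads are available as a set; the record layout and the function name are hypothetical.

```python
def remove_host_reads(reads, host_read_ids):
    """Yield only the reads whose identifier is not in `host_read_ids`.

    `reads` is an iterable of (read_id, sequence, quality) tuples.
    `host_read_ids` is the set of read identifiers that mapped to the host
    reference (assumed to come from a separate alignment step).
    """
    for read_id, sequence, quality in reads:
        if read_id not in host_read_ids:
            yield read_id, sequence, quality
```

For paired-end data, a laboratory may choose to discard both mates when either one maps to the host; that policy is not shown here.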

5.2.4 Detection and elimination of repeats and sequencing errors shall be performed as the first

step in data processing. The following factors a
...
