Genomics informatics - Requirement of data analysis for direct-to-consumer testing

This document specifies SNP detection sites and evaluation metrics for direct-to-consumer detection. It specifies the requirements for the databases used for direct-to-consumer detection and the elements of assessment reports. It provides data analysis process requirements for genotyping arrays and genome sequencing. It applies to the analysis of genetic data for direct-to-consumer in vitro diagnostic products without the involvement of a health care provider.


General Information

Status
Not Published
Current Stage
5020 - FDIS ballot initiated: 2 months. Proof sent to secretariat
Start Date
15-Dec-2025
Completion Date
15-Dec-2025
Ref Project

Overview - ISO/DTS 20738 (Genomics informatics for DTC testing)

ISO/DTS 20738 specifies data analysis requirements for direct‑to‑consumer (DTC) genetic testing products and services. It covers preprocessing, quality control, detection SNP sites, evaluation models and databases, report elements, and analytical workflows for both genotyping arrays (DNA chips) and whole genome sequencing (WGS). The draft applies specifically to DTC in vitro diagnostics delivered without a health care provider and aims to improve consistency, transparency and consumer confidence in DTC genetic results.

Key topics and technical requirements

  • Data analysis workflows
    • DNA chip: preprocessing, genotype calling, quality evaluation, genotype imputation and cluster analysis.
    • WGS: sequencing QC, alignment and deduplication, variant calling (hcWGS) or imputation (lcWGS), variant QC, annotation, interpretation and manual confirmation.
  • File formats and raw data
    • FASTQ for raw sequencing reads; VCF for variant output (Annex A provides VCF example).
    • Raw genotype files should include RSID, chromosome, position and allele calls, with metadata (platform, reference genome, run date).
  • Quality control thresholds
    • DNA chip: high‑density arrays (>600 000 markers) should have sample call rate ≥ 0.98; single‑site detection rate in a batch ≥ 0.8. (See Annex C for call‑rate thresholds by application.)
    • WGS: paired‑end reads ≥ 100 bp; filtered data QC metrics include Q20 ratio ≥ 90%, Q30 ratio ≥ 80%, and GC content ~40–45%. Sequencing depth guidance: high‑coverage WGS > 20× (clinical often >100×); low‑coverage WGS ~0.5–6×.
  • Genotype imputation and reference data
    • Imputation with appropriate reference haplotype panels is required for low‑coverage data or array gaps; reference sequences (e.g., GRCh38) must be used for alignment and reporting.
  • Evaluation model, databases and reporting
    • Requirements for evaluation models and annotation databases (informative Annex B) and mandatory elements of consumer assessment reports, plus provisions for use and disclosure of consumer data.

Applications and who should use this standard

  • DTC genetic test providers and marketplaces
  • Clinical and commercial sequencing laboratories offering consumer products
  • Bioinformatics pipelines and software developers for genotyping arrays and WGS
  • Quality assurance, regulatory and compliance teams assessing DTC product claims
  • Test developers preparing consumer‑facing interpretation and reporting workflows

ISO/DTS 20738 helps organizations ensure robust QC, transparent reporting and consistent interpretation practices for consumer genomics services.

Related standards

  • ISO/TC 215 (Health informatics), Subcommittee SC 1 - Genomics Informatics
  • Standards referenced in the draft: ISO 20397‑2:2021, ISO 16577:2022, ISO/IEC 23092‑2 (as cited)

Keywords: ISO/DTS 20738, genomics informatics, direct‑to‑consumer testing, DTC, genotyping arrays, whole genome sequencing, VCF, FASTQ, quality control, genotype imputation, variant calling.

Draft
ISO/DTS 20738 - Genomics informatics — Requirement of data analysis for direct-to-consumer testing Released:12/1/2025
English language
14 pages
Draft
REDLINE ISO/DTS 20738 - Genomics informatics — Requirement of data analysis for direct-to-consumer testing Released:12/1/2025
English language
14 pages

Standards Content (Sample)


FINAL DRAFT Technical Specification
ISO/TC 215/SC 1
Secretariat: KATS
Genomics informatics - Requirement of data analysis for direct-to-consumer testing
Voting begins on: 2025-12-15
Voting terminates on: 2026-02-09
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY
RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL,
COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE
TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE
MAY BE MADE IN NATIONAL REGULATIONS.
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Data analysis process
5 Quality control of raw data
5.1 DNA chip data preprocessing and quality control requirements
5.1.1 Data preprocessing
5.1.2 Data quality control
5.2 Whole genome sequencing quality requirements
5.2.1 Sequencing type and data quality
5.2.2 Sequencing data comparison and quality control
6 Evaluation model and database
6.1 DNA chip analysis requirements
6.1.1 DNA chip selection
6.1.2 Genotyping analysis
6.1.3 Genotype imputation analysis
6.2 WGS analysis requirements
6.2.1 Variant detection, genotype imputation and quality control
6.2.2 Variant site annotation
6.2.3 Interpretation of variation
7 Evaluation report
7.1 Interpretation
7.2 Use and disclosure of data
Annex A (normative) VCF format file example
Annex B (informative) Annotation databases
Annex C (informative) Call rate thresholds by application
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC 1,
Genomics Informatics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
With people’s increasing awareness of their right to know their own bodies and of the need for disease
prevention, prediction, participation and personalized treatment, and with the rapid development of
sequencing technology, genetic testing has expanded from clinical application to general consumer
application. Direct-to-consumer (DTC) testing refers to genetic testing that individuals can order without
needing a clinician or a health care provider. These tests typically analyze DNA from a sample, often saliva,
to provide insights into various genetic traits.
DTC tests cover a wide range of genetic analyses, including ancestry and heritage (understanding ethnic
background and lineage), health and disease risk (identifying genetic predispositions to conditions such as
cancer or heart disease), traits and lifestyle (examining genetic influences on taste preferences, hair loss,
or lactose digestion), and pharmacogenomics (assessing how genetic variations affect drug metabolism). DTC
testing improves awareness of and attention to certain diseases, and it allows existing precautions to be
adjusted under the guidance of professionals. It provides the necessary basis for the formation of personalized
disease prevention programs. As an increasingly prevalent link between clinical care and lifestyle, DTC
testing has grown enormously in both practical and expected use, becoming more and more indispensable
in the genetic testing ecosystem.
This document is based on current DTC industry data, combined with the needs of upstream and downstream
industry users. It puts forward general requirements and suggestions on the data and technical content of
genotype imputation technology, analysis and interpretation of results, as well as specific requirements in
the development of a supporting evaluation model and database. With this document’s specifications as the
basis of data analysis in the development of DTC testing products and services, consumers can have greater
confidence in the conclusions drawn from the data, thereby facilitating greater confidence in DTC testing.

FINAL DRAFT Technical Specification ISO/DTS 20738:2025(en)
Genomics informatics — Requirement of data analysis for
direct-to-consumer testing
1 Scope
This document specifies the requirements for genetic data analysis relating to direct-to-consumer (DTC)
testing, including preprocessing, detection site, evaluation models, the use of databases and the elements of
assessment reports.
This document applies to the analysis of genetic data from DTC testing without the involvement of a health
care provider.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
coverage
coverage depth
number of times that a given base position is read in a sequencing run
Note 1 to entry: The number of reads that cover a particular position.
[SOURCE: ISO 20397-2:2021 3.6]
3.2
DNA chip
DNA microarray
solid substrate where a collection of probe DNA arranged in a specific design is attached in a high-density
fashion, directly or indirectly, that assays large amounts of biological material using high-throughput
screening methods
[SOURCE: ISO 16577:2022 3.4.13]
3.3
direct-to-consumer
DTC
retail business model which eliminates any intermediaries and sells directly to the consumer
Note 1 to entry: Also referred to as business to consumer (B2C).
Note 2 to entry: The sample (e.g. blood, saliva, cheek swab (cells from the buccal cavity), fecal matter or nail
clippings) is provided by the consumer and is assumed to follow the collection protocol provided by the business.

3.4
FASTQ
genomic information representation that includes FASTA and quality values
[SOURCE: ISO/IEC 23092-2:2024, 3.8]
3.5
GC content
proportion of guanine and cytosine in a DNA molecule
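As an illustration of this definition, the following minimal Python sketch computes GC content for a single sequence string. The function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for this example and are not specified by this document.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T.

    Assumption for this sketch: ambiguous symbols (for example N) are ignored.
    """
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


# Example: 6 of the 14 bases are G or C.
print(round(gc_content("GATTTGGGGTTCAA"), 4))
```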
3.6
genotype imputation
computational process to infer unobserved or missing genotypes in sequencing/genotyping data
Note 1 to entry: Using statistical models (e.g. hidden Markov models) and reference haplotype panels (e.g. 1 000
Genomes Project, TOPMed), imputation predicts missing variants by leveraging linkage disequilibrium (LD) patterns.
Common tools include IMPUTE2, Minimac, and BGI-lowpass.
Note 2 to entry: The output is typically a completed genomic variant dataset. This step is critical for enhancing data
utility in low-coverage whole genome sequencing (lcWGS) or genome-wide association studies (GWAS).
3.7
haplotype
combination of alleles at multiple sites that are inherited together on the same chromosome
3.8
InDel
insertion or deletion, or both, that occurs at a certain position in the genome
Note 1 to entry: InDel length is less than 50 bp.
3.9
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q score is defined by the following formula:
$Q = -10 \log_{10}(p)$
where p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of 99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of
99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores can also indicate false-positive variant calls,
resulting in inaccurate conclusions.
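The relationship between Q score and error probability in Notes 1 to 3 can be checked numerically. The sketch below is illustrative only; the function names are hypothetical and chosen for this example.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to obtain the base-call error probability p."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Compute Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}, call accuracy {100 * (1 - p):.1f} %")
```

Running the loop reproduces the values in Notes 2 and 3: an error rate of 1 in 100 at Q20 and 1 in 1 000 at Q30.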
3.10
sequencing depth
average number of times a nucleotide in a genome has been sequenced
Note 1 to entry: It is calculated by dividing the total number of sequenced bases in the aligned genome by the total
number of bases in the genome (excluding N).
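The calculation in Note 1 can be sketched as follows. The function name, parameter names and the figures in the usage line are hypothetical and serve only to illustrate the division described above.

```python
def mean_sequencing_depth(aligned_bases: int, genome_length: int, n_bases: int = 0) -> float:
    """Divide the sequenced bases aligned to the genome by the genome size excluding N."""
    effective_length = genome_length - n_bases
    if effective_length <= 0:
        raise ValueError("genome length excluding N must be positive")
    return aligned_bases / effective_length


# Hypothetical figures: 90 Gb aligned to a 3.1 Gb reference containing 0.1 Gb of N.
depth = mean_sequencing_depth(90_000_000_000, 3_100_000_000, 100_000_000)
print(depth)
```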

3.11
whole genome sequencing
WGS
process that determines the complete DNA sequence of a human’s genome, including all 23 chromosome
pairs and mitochondrial DNA
Note 1 to entry: While performed through a coordinated workflow, current next-generation sequencing systems
cannot process the entire genome in a single run. The DNA is fragmented, sequenced in sections, and computationally
reconstructed using bioinformatics tools to assemble the complete genomic sequence.
Note 2 to entry: WGS is divided into high-coverage whole genome sequencing (hcWGS) and low-coverage whole
genome sequencing (lcWGS) according to the amount of sequencing.
Note 3 to entry: High-coverage WGS has sequencing depth >20×, while the coverage of clinical grade WGS is usually
>100×.
Note 4 to entry: For low-coverage WGS: 0,5× ≤ sequencing depth ≤ 6×.
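The depth ranges in Notes 3 and 4 can be expressed as a simple classification. This is a sketch under the assumption that depths outside both stated ranges (for example between 6× and 20×) are left unclassified, since the notes do not assign them to either category; the function name and the label "unclassified" are hypothetical.

```python
def classify_wgs(depth: float) -> str:
    """Classify a mean sequencing depth using the ranges in Notes 3 and 4."""
    if depth > 20:
        return "hcWGS"
    if 0.5 <= depth <= 6:
        return "lcWGS"
    # Neither note covers this depth; the label is an assumption of this sketch.
    return "unclassified"


for depth in (0.5, 4, 10, 30):
    print(depth, classify_wgs(depth))
```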
4 Data analysis process
4.1 The integrity of the sample provided should be checked and verified prior to performing the analysis.
4.2 The data analysis process supported by the DNA chip shall include data preprocessing, genotype
calling, quality evaluation (quality assurance or quality control), genotype imputation and cluster analysis.
4.3 The WGS data analysis process shall include sequencing data quality control, alignment and
deduplication, alignment quality control, variant calling (hcWGS) or genotype imputation (lcWGS),
variant quality control, variant annotation, variant interpretation and manual confirmation of variants,
as shown in Figure 1.
Figure 1 — Analysis and interpretation process based on WGS
5 Quality control of raw data
5.1 DNA chip data preprocessing and quality control requirements
5.1.1 Data preprocessing
5.1.1.1 The original data format of the DNA chip shall be as specified by the chip manufacturer. Individual
data should be converted into VCF files or raw genotype data files for subsequent analysis. File formats are
provided in 5.1.1.2 and 5.1.1.3.
5.1.1.2 When converting chip data from raw data to variant call format (VCF) files or raw genotype
data files, cluster analysis should be used. The reference data used in the cluster analysis should be the
target population data of the detection service. The source of the reference data should be explained to the
consumer so that there is a clear understanding of the relative nature of the results.

5.1.1.3 The raw genotype data file shall consist of four columns: RSID (Reference SNP Cluster
ID), chromosome, position on the chromosome, and a pair of bases. The raw genotype data file shall
state the detection platform, detection time, reference genome sequence and other information in the
form of comments at the beginning of the file.
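A raw genotype data file of this shape can be read with a few lines of Python. The sketch below assumes tab-separated columns and "#" as the comment prefix; neither is stated in this document, and the record type, function name, platform name and RSIDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GenotypeRecord:
    rsid: str
    chromosome: str
    position: int
    genotype: str  # pair of bases, for example "AG"


def parse_raw_genotype(lines: Iterable[str]) -> Tuple[List[str], List[GenotypeRecord]]:
    """Split a raw genotype file into header comments and four-column records."""
    comments: List[str] = []
    records: List[GenotypeRecord] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):  # assumed comment prefix
            comments.append(line.lstrip("#").strip())
            continue
        fields = line.split("\t")  # assumed column separator
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        rsid, chromosome, position, genotype = fields
        records.append(GenotypeRecord(rsid, chromosome, int(position), genotype))
    return comments, records


# Hypothetical file content for illustration only.
sample_file = [
    "# platform: ExampleChip v1",
    "# detection time: 2025-01-01",
    "# reference genome: GRCh38",
    "rs0000001\t1\t12345\tAG",
    "rs0000002\tX\t67890\tCC",
]
comments, records = parse_raw_genotype(sample_file)
print(comments)
print(records[0])
```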
5.1.1.4 VCF file format requ
...


ISO/DTS 20738
ISO/TC 215/SC 1
Secretariat: KATS
Date: 2025-12-01
Genomics informatics - Requirement of data analysis for direct-to-consumer testing
Genomics informatics — Requirement of data analysis for direct-to-
consumer testing
1 Scope
This document specifies the requirements for genetic data analysis relating to direct-to-consumer (DTC)
testing, including preprocessing, detection site, evaluation models, the use of databases and the elements of
assessment reports.
This document applies to the analysis of genetic data from direct-to-consumerDTC testing without the
involvement of a health care provider.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— — ISO Online browsing platform: available at https://www.iso.org/obp
— — IEC Electropedia: available at https://www.electropedia.org/
3.1 3.1
coverage
coverage depth
number of times that a given base position is read in a sequencing run
Note 1 to entry: The number of reads that cover a particular position.
[SourceSOURCE: ISO 20397-2:2021 3.6]
3.2 3.2
DNA chip
DNA microarray
solid substrate where a collection of probe DNA arranged in a specific design is attached in a high-density
fashion, directly or indirectly, that assays large amounts of biological material using high-throughput
screening methods
[SourceSOURCE: ISO 16577:2022 3.4.13]
3.3 3.3
direct-to-customer
DTC
retail business model which eliminates any intermediaries and sells direct to consumer
Note 1 to entry: Also referreferred to as business to consumer (B2C).
Note 2 to entry: The sample, blood, saliva, cheek swab (cells from buccal cavity), fecal matter, nail clipping, are provided
by the consumer in assumed accordance with the collection protocol provided by the business.
© ISO #### 2025 – All rights reserved
3.4 3.4
FASTQ
genomic information representation that includes FASTA and quality values
[SourceSOURCE: ISO/IEC 23092-2:20192024, 3.8]
3.5 3.5
GC content
proportion of guanine and cytosine in a DNA molecule
3.6 3.4
genotype imputation
computational process to infer unobserved or missing genotypes in sequencing/genotyping data
Note 1 to entry1: entry: Using statistical models (e.g.,. hidden Markov models) and reference haplotype panels (e.g., 1000.
1 000 Genomes Project, TOPMed), imputation predicts missing variants by leveraging linkage disequilibrium (LD)
patterns. Common tools include IMPUTE2, Minimac, and BGI-lowpass.
Note 2 to entry2: entry: The output is typically a completed genomic variant dataset. This step is critical for enhancing
data utility in low-coverage WGSwhole genome sequencing (lcWGS) or genome-wide association studies (GWAS).
3.7 3.5
haplotype
combination of alleles at multiple sites that are inherited together on the same chromosome
3.8 3.6
InDel
insertion or deletion, or both, that occurs at a certain position in the genome
Note 1 to entry: InDel length is less than 50 bp.
3.9 3.7
quality score
Q score
Phred score
quality of base calling
measure of the probability of correct base recognition, usually expressed directly by a numerical value
Note 1 to entry: Q score is defined by the following equationformula:
Q = −10log10(p)
where p is the estimated probability of the base call being wrong.
Note 2 to entry: A quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of 99 %.
Note 3 to entry: A quality score of 30 represents an error rate of 1 in 1 000, with a corresponding call accuracy of 99,9 %.
Note 4 to entry: Higher quality scores indicate a smaller probability of error. Lower quality scores can result in a
significant portion of the reads being unusable. Low quality scores maycan also indicate false-positive variant calls,
resulting in inaccurate conclusions.
3.10 3.8
sequencing depth
average number of times a nucleotide in a genome has been sequenced
© ISO #### 2025 – All rights reserved
Note 1 to entry: It is calculated by dividing the total number of sequenced bases in the aligned genome by the total
number of bases in the genome (excluding N).
3.11 3.9
whole genome sequencing
WGS
process that determines the complete DNA sequence of a human'shuman’s genome, including all
23 chromosome pairs and mitochondrial DNA
Note 1 to entry: While performed through a coordinated workflow, current next-generation sequencing systems cannot
process the entire genome in a single run. The DNA is fragmented, sequenced in sections, and computationally
reconstructed using bioinformatics tools to assemble the complete genomic sequence.
Note 2 to entry: WGS is divided into high-coverage whole genome sequencing (hcWGS) and low-coverage whole genome
sequencing (lcWGS) according to the amount of sequencing.
Note3 Note 3 to entry: High-coverage whole genome sequencing:WGS has sequencing depth > 20X,20×, while the
coverage of clinical grade coverage WGS is usually >100X100×.
Note4 Note 4 to entry: Low For low-coverage whole genome sequencingWGS: 0.5X ≤,5× ≤ sequencing depth ≤ 6X. ≤ 6×.
4 Data analysis process
4.1 The integrity of the sample provided should be checked and verified prior to performing the analysis.
4.2 The data analysis process supported by the DNA chip shall include data preprocessing, genotype calling,
quality evaluation (quality assurance or quality control), genotype imputation, cluster analysis.
4.3 The WGS data analysis process shall include sequencing data quality control, compare and
deduplication, comparison quality control, variant calling (hcWGS) or genotype imputation (lcWGS), variation
quality control, variant annotation, variant interpretation and variation manual confirmation, shown in
Figure 1Figure 1.
© ISO #### 2025 – All rights reserved
© ISO #### 2025 – All rights reserved
Figure 1 — Analysis and interpretation process based on WGS
5 Quality control of raw data
5.1 DNA chip data preprocessing and quality control requirements
5.1.1 Data preprocessing
5.1.1.1 The original data format of the DNA chip shall be subject to the chip manufacturer. Individual data
should be converted into VCF files or raw genotype data files for subsequent analysis. File formats are
provided as 5.1.1.25.1.1.2 and 5.1.1.35.1.1.3.
5.1.1.2 When converting chip data from raw data to variant call format(VCF) files or raw genotype data
files, cluster analysis should be used. The reference data used in the cluster analysis should be the target
population data of the detection service. The source of the reference data should be explained to the consumer
so there is a clear understanding of the relative nature of the results.
© ISO #### 2025 – All rights reserved
5.1.1.3 The raw genotype data file shall consist of four columns, including RSID (Reference SNP Cluster ID),
chromosome, the position on the chromosome, and a pair of bases. The raw genotype data file shall state
the detection platform, detection time, reference genome sequence and other information in the form of
comments at the beginning of the file.
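As an illustration of the four-column layout described in 5.1.1.3, the following Python sketch reads a raw genotype data file into header comments and records. This document does not fix a delimiter or a comment character, so the tab delimiter, the "#" comment prefix, the names used and the sample content are all assumptions.

```python
from typing import NamedTuple


class GenotypeRecord(NamedTuple):
    rsid: str        # Reference SNP Cluster ID
    chromosome: str
    position: int    # position on the chromosome
    genotype: str    # pair of bases, e.g. "AG"


def parse_raw_genotype(text: str):
    """Split a raw genotype file into header comments and four-column records."""
    comments, records = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):            # assumed comment prefix
            comments.append(line.lstrip("#").strip())
            continue
        fields = line.split("\t")           # assumed tab delimiter
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        rsid, chrom, pos, genotype = fields
        records.append(GenotypeRecord(rsid, chrom, int(pos), genotype))
    return comments, records


# Made-up sample content, for illustration only.
SAMPLE = (
    "# platform: ExampleChip v1\n"
    "# detection time: 2025-01-01\n"
    "# reference genome: GRCh38\n"
    "rs0000001\t1\t12345\tAG\n"
    "rs0000002\tX\t67890\tCC\n"
)
comments, records = parse_raw_genotype(SAMPLE)
```

Rejecting any line that does not have exactly four columns keeps malformed rows from silently reaching later analysis steps.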
5.1.1.4 VCF file format requirements shall conform with Annex A.
5.1.2 Data quality control
5.1.2.1 DNA chip marker densities can range from dozens to over 600 000 genome sites.
High-density DNA chips (> 600 000 markers) shall have a sample call rate ≥ 0,98. Targeted chips
(< 600 000 markers) shall meet call rate thresholds appropriate to their designed purpose; see Table C.1
for recommended minimum call rates by chip type.
5.1.2.2 The detection rate of a single site in the same batch of samples shall not be lower than 0,8.
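The two thresholds in 5.1.2.1 and 5.1.2.2 can be checked with a short Python sketch such as the one below. It assumes that genotype calls are held as one list per sample, with None marking a site that could not be called; the data layout and function names are illustrative assumptions. The 0,98 sample threshold applies to high-density chips only, so it is passed as a parameter.

```python
def sample_call_rates(calls: dict) -> dict:
    """Fraction of sites with a genotype call, per sample."""
    return {
        sample: sum(g is not None for g in genotypes) / len(genotypes)
        for sample, genotypes in calls.items()
    }


def site_detection_rates(calls: dict) -> list:
    """Fraction of samples in the batch with a call, per site."""
    per_sample = list(calls.values())
    n_sites = len(per_sample[0])
    return [
        sum(genotypes[i] is not None for genotypes in per_sample) / len(per_sample)
        for i in range(n_sites)
    ]


def failing_samples(calls: dict, min_call_rate: float = 0.98) -> list:
    """Samples whose call rate is below the minimum (0.98 for high-density chips)."""
    return [s for s, rate in sample_call_rates(calls).items() if rate < min_call_rate]


def failing_sites(calls: dict, min_detection_rate: float = 0.8) -> list:
    """Indices of sites whose batch detection rate is below the minimum."""
    return [i for i, rate in enumerate(site_detection_rates(calls)) if rate < min_detection_rate]


# Made-up batch of two samples and four sites, for illustration only.
batch = {
    "s1": ["AA", "AG", None, "CC"],
    "s2": ["AA", "AG", "TT", "CC"],
}
```

In this made-up batch, sample s1 has a call rate of 0.75 and the third site has a detection rate of 0.5, so both would be flagged.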
5.1.2.3 The international general human nucleic acid database shall be used as the reference sequence for
comparison. For example, the latest version of the human genome reference sequence (Genome Reference
Consortium Human Build 38, GRCh38)¹⁾ published by NCBI (National Center for Biotechnology Information)
should be used as the reference sequence for human samples.
5.2 Whole genome sequencing quality requirements
5.2.1 Sequencing type and data quality
5.2.1.1 Sequence files shall be in FASTQ format for subsequent analysis.
EXAMPLE
@SEQ_ID_1 (Header line: unique identifier and metadata)
GATTTGGGGTTCAA. (Nucleotide sequence line: DNA bases in order)
+ (Separator line: begins with "+"; may optionally repeat the header identifier)
!''*((((***+. (Quality score line: symbols encode base-calling accuracy; e.g. in Phred+33 encoding, "!" = Phred 0, '"' = Phred 1, "*" = Phred 9)
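The four-line record structure of the example can be read with a minimal Python sketch. It assumes Phred+33 quality encoding, in which each quality score is the ASCII code of the symbol minus 33; the function names are illustrative and the short record in the usage lines is made up.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(symbol) - 33 for symbol in quality]


def parse_fastq_record(lines: list[str]):
    """Validate one four-line FASTQ record and return (identifier, sequence, scores)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("separator line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lines must have equal length")
    return header[1:], sequence, phred33_scores(quality)


# Made-up short record, for illustration only.
identifier, sequence, scores = parse_fastq_record(
    ["@SEQ_ID_1", "GATTTGGG", "+", "!''*(((("]
)
```

A Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q20 and Q30 correspond to error probabilities of 1 in 100 and 1 in 1 000.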
5.2.1.2 The international general human nucleic acid database shall be used as the reference sequence for
comparison. For example, the latest version of the human genome reference sequence (GRCh38) published by
NCBI should be used as the reference sequence for human samples.
5.2.1.3 Paired-end sequencing should be used for WGS, and the read length should be no shorter than
100 bp.
5.2.1.4 The raw sequencing data shall be filtered (e.g. removing sequencing adapters and low-quality
bases) for subsequent analysis. The quality control parameters of the filtered data shall include the number of
sequenced bases, the ratio of Q20 and Q30, and GC content should be in
...
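
The quality control parameters named in 5.2.1.4 (number of sequenced bases, Q20 and Q30 ratios, GC content) can be computed as in the following Python sketch. It assumes Phred+33 quality encoding and counts GC content over all bases, including any N calls; the function name and input layout are illustrative assumptions. No acceptance limits are applied, because the limits are not reproduced in the text above.

```python
def qc_metrics(reads):
    """Summarize filtered reads given as (sequence, quality_string) pairs."""
    total = q20 = q30 = gc = 0
    for sequence, quality in reads:
        for base, symbol in zip(sequence.upper(), quality):
            score = ord(symbol) - 33      # assumed Phred+33 encoding
            total += 1
            q20 += score >= 20            # base with Phred score of at least 20
            q30 += score >= 30            # base with Phred score of at least 30
            gc += base in "GC"
    if total == 0:
        raise ValueError("no bases to summarize")
    return {
        "bases": total,
        "q20_ratio": q20 / total,
        "q30_ratio": q30 / total,
        "gc_content": gc / total,
    }


# Made-up single read, for illustration only.
metrics = qc_metrics([("GATC", "II5!")])
```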

Frequently Asked Questions

ISO/DTS 20738 is a draft published by the International Organization for Standardization (ISO). Its full title is "Genomics informatics - Requirement of data analysis for direct-to-consumer testing". This standard covers: This document specifies detection SNP site and evaluation metrics for direct-to-consumer detection. This document specifies the requirement for the database that used for direct-to-consumer detection. This document specifies the elements of assessment reports. This document provides data analysis process requirements for genotyping arrays and genome sequencing. This document applies to the analysis of genetic data for direct-to-consumer in vitro diagnostics products without the involvement of a health care provider.

ISO/DTS 20738 is classified under the following ICS (International Classification for Standards) categories: 35.240.80 - IT applications in health care technology. The ICS classification helps identify the subject area and facilitates finding related standards.