ISO/IEC FDIS 23092-2
Information technology -- Genomic information representation
DRAFT INTERNATIONAL STANDARD
ISO/IEC DIS 23092-2
ISO/IEC JTC 1/SC 29
Secretariat: JISC
Voting begins on: 2020-01-17
Voting terminates on: 2020-04-10
Information technology — Genomic information
representation —
Part 2:
Coding of genomic information
Technologies de l'information — Représentation des informations génomiques —
Partie 2: Codage des informations génomiques
ICS: 35.040.99
THIS DOCUMENT IS A DRAFT CIRCULATED
FOR COMMENT AND APPROVAL. IT IS
THEREFORE SUBJECT TO CHANGE AND MAY
NOT BE REFERRED TO AS AN INTERNATIONAL
STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS
BEING ACCEPTABLE FOR INDUSTRIAL,
TECHNOLOGICAL, COMMERCIAL AND
USER PURPOSES, DRAFT INTERNATIONAL
STANDARDS MAY ON OCCASION HAVE TO
BE CONSIDERED IN THE LIGHT OF THEIR
POTENTIAL TO BECOME STANDARDS TO
WHICH REFERENCE MAY BE MADE IN
NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED
TO SUBMIT, WITH THEIR COMMENTS,
NOTIFICATION OF ANY RELEVANT PATENT
RIGHTS OF WHICH THEY ARE AWARE AND TO
PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number: ISO/IEC DIS 23092-2:2020(E)
© ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword ..... ix
Introduction ..... x
1 Scope ..... 1
2 Normative references ..... 1
3 Terms and definitions ..... 1
4 Abbreviations ..... 6
5 Conventions ..... 6
5.1 General ..... 6
5.2 Arithmetic operators ..... 6
5.3 Logical operators ..... 6
5.4 Relational operators ..... 7
5.5 Bit-wise operators ..... 7
5.6 Assignment operators ..... 7
5.7 Range notation ..... 8
5.8 Mathematical functions ..... 8
5.9 Order of operation precedence ..... 8
5.10 Variables, syntax elements and tables ..... 9
5.11 Text description of logical operators ..... 10
5.12 Processes ..... 11
6 Syntax and semantics ..... 12
6.1 Method of specifying syntax in tabular form ..... 12
6.2 Bit ordering ..... 13
6.3 Specification of syntax functions and data types ..... 13
6.4 Semantics ..... 14
7 Data structures ..... 14
7.1 Data unit ..... 14
7.2 Raw reference ..... 15
7.2.1 Syntax and semantics ..... 16
7.3 Parameter set ..... 16
7.3.1 Syntax and semantics ..... 16
7.3.2 Encoding parameters ..... 16
7.4 Access unit ..... 23
7.4.1 Syntax and semantics ..... 23
7.4.2 Access unit types ..... 26
8 Descriptors ..... 27
9 Sequencing reads ..... 30
9.1 Supported symbols ..... 30
9.2 Paired-end reads ..... 32
9.3 Reverse-complement reads ..... 32
9.4 Data classes ..... 32
9.5 Aligned data ..... 33
9.6 Unaligned data ..... 34
10 Decoding process ..... 35
10.1 General ..... 35
10.2 dataset_type = 0 or 1 ..... 35
10.2.1 References padding ..... 35
10.2.2 Type 1 AU (Class P) ..... 36
10.2.3 Type 2 AU (Class N) ..... 37
10.2.4 Type 3 AU (Class M) ..... 37
10.2.5 Type 4 AU (Class I) ..... 38
10.2.6 Type 5 AU (Class HM) ..... 40
10.2.7 Type 6 AU (Class U) ..... 40
10.3 dataset_type = 2 ..... 41
10.3.1 Type 1 AU ..... 41
10.3.2 Type 2 AU ..... 42
10.3.3 Type 3 AU ..... 42
10.3.4 Type 4 AU ..... 42
10.3.5 Type 6 AU ..... 43
10.4 Genomic descriptors ..... 43
10.4.1 pos ..... 43
10.4.2 rcomp ..... 44
10.4.3 flags ..... 45
10.4.4 mmpos ..... 45
10.4.5 mmtype ..... 48
10.4.6 clips ..... 52
10.4.7 ureads ..... 55
10.4.8 rlen ..... 55
10.4.9 pair ..... 57
10.4.10 mscore ..... 65
10.4.11 mmap ..... 66
10.4.12 msar ..... 69
10.4.13 rtype ..... 70
10.4.14 rgroup ..... 71
10.4.15 qv ..... 72
10.4.16 rname ..... 76
10.4.17 rftp ..... 76
10.4.18 rftt ..... 77
10.4.19 tokentype descriptors ..... 78
10.5 sequence ..... 86
10.5.1 Aligned reads (Classes P, N, M, I, HM) ..... 86
10.5.2 Unmapped reads (Class HM, U) ..... 87
10.6 e-cigar ..... 88
10.6.1 Syntax ..... 88
10.6.2 Decoding process for the first alignment ..... 89
10.6.3 Decoding process for other alignments ..... 96
10.6.4 Reference transformation ..... 96
11 Representation of reference sequences ..... 98
11.1 External reference ..... 98
11.2 Embedded reference ..... 98
11.3 Computed reference ..... 98
11.3.1 General ..... 99
11.3.2 Reference transformation ..... 99
11.3.3 PushIn ..... 100
11.3.4 Local assembly ..... 101
11.3.5 Global assembly ..... 102
12 Block payload parsing process ..... 102
12.1 General ..... 102
12.2 Inverse binarizations ..... 103
12.2.1 Binary (BI) ..... 104
12.2.2 Truncated Unary (TU) ..... 104
12.2.3 Exponential Golomb (EG) ..... 104
12.2.4 Truncated Exponential Golomb (TEG) ..... 105
12.2.5 Signed Truncated Exponential Golomb (STEG) ..... 105
12.2.6 Split Unit-wise Truncated Unary (SUTU) ..... 105
12.2.7 Signed Split Unit-wise Truncated Unary (SSUTU) ..... 106
12.2.8 Double Truncated Unary (DTU) ..... 106
12.2.9 Signed Double Truncated Unary (SDTU) ..... 107
12.3 Decoder Configuration ..... 107
12.3.1 Sequences and quality values ..... 107
12.3.2 Support values ..... 108
12.3.3 CABAC binarizations ..... 109
12.3.4 Transformation parameters ..... 111
12.3.5 Msar descriptor and read identifiers ..... 113
12.3.6 State variables ..... 114
12.4 Initialization process for context variables ..... 117
12.5 Arithmetic decoding engine ..... 117
12.5.1 Initialization ..... 117
12.5.2 Arithmetic decoding process ..... 118
12.6 Decoding process for sequence descriptors ..... 125
12.6.1 General ..... 125
12.6.2 Block payload decoding process ..... 126
13 Output format ..... 139
13.1 General ..... 139
13.2 MPEG-G record ..... 139
13.2.1 number_of_template_segments ..... 141
13.2.2 number_of_record_segments ..... 141
13.2.3 number_of_alignments ..... 141
13.2.4 class_ID ..... 141
13.2.5 read_group_len ..... 141
13.2.6 reserved ..... 141
13.2.7 read_1_first ..... 141
13.2.8 seq_ID ..... 142
13.2.9 as_depth ..... 142
13.2.10 read_len ..... 142
13.2.11 qv_depth ..... 142
13.2.12 read_name_len ..... 142
13.2.13 read_name ..... 142
13.2.14 read_group ..... 142
13.2.15 sequence ..... 142
13.2.16 quality_values ..... 142
13.2.17 mapping_pos ..... 142
13.2.18 ecigar_len ..... 142
13.2.19 ecigar_string ..... 142
13.2.20 reverse_comp ..... 142
13.2.21 mapping_score ..... 143
13.2.22 split_alignment ..... 143
13.2.23 delta ..... 143
13.2.24 split_pos ..... 143
13.2.25 split_seq_ID ..... 143
13.2.26 flags ..... 143
13.2.27 more_alignments ..... 143
13.2.28 next_pos ..... 143
13.2.29 next_seq_ID ..... 143
13.3 Initialization process ..... 143
Annex A (informative) Tokenization of reads identifiers ..... 147
Annex B (informative) Mapping quality ..... 149
Annex C (informative) Inverse binarization examples ..... 150
C.1 Binary (BI) binarization ..... 150
C.2 Truncated Unary (TU) binarization ..... 150
C.3 Exponential Golomb (EG) binarization ..... 150
C.4 Truncated Exponential Golomb (TEG) binarization ..... 151
C.5 Signed Truncated Exponential Golomb (STEG) binarization ..... 151
C.6 Split Unit-wise Truncated Unary (SUTU) binarization ..... 151
C.7 Signed Split Unit-wise Truncated Unary (SSUTU) binarization
...