Information technology -- Genomic information representation

General Information

Status: Published
Current Stage: 5020 - FDIS ballot initiated: 2 months. Proof sent to secretariat
Start Date: 24-Jul-2020
Completion Date: 24-Jul-2020
Standard
ISO/IEC DIS 23092-2 - Information technology -- Genomic information representation
English language
153 pages

Standards Content (sample)

DRAFT INTERNATIONAL STANDARD
ISO/IEC DIS 23092-2
ISO/IEC JTC 1/SC 29
Secretariat: JISC
Voting begins on: 2020-01-17
Voting terminates on: 2020-04-10
Information technology — Genomic information
representation —
Part 2:
Coding of genomic information
Technologies de l'information — Représentation des informations génomiques —
Partie 2: Codage des informations génomiques
ICS: 35.040.99
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENT AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.

IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.

This document is circulated as received from the committee secretariat.

Reference number: ISO/IEC DIS 23092-2:2020(E)
© ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviations
5 Conventions
5.1 General
5.2 Arithmetic operators
5.3 Logical operators
5.4 Relational operators
5.5 Bit-wise operators
5.6 Assignment operators
5.7 Range notation
5.8 Mathematical functions
5.9 Order of operation precedence
5.10 Variables, syntax elements and tables
5.11 Text description of logical operators
5.12 Processes
6 Syntax and semantics
6.1 Method of specifying syntax in tabular form
6.2 Bit ordering
6.3 Specification of syntax functions and data types
6.4 Semantics
7 Data structures
7.1 Data unit
7.2 Raw reference
7.2.1 Syntax and semantics
7.3 Parameter set
7.3.1 Syntax and semantics
7.3.2 Encoding parameters
7.4 Access unit
7.4.1 Syntax and semantics
7.4.2 Access unit types
8 Descriptors
9 Sequencing reads
9.1 Supported symbols
9.2 Paired-end reads
9.3 Reverse-complement reads
9.4 Data classes
9.5 Aligned data
9.6 Unaligned data
10 Decoding process
10.1 General
10.2 dataset_type = 0 or 1
10.2.1 References padding
10.2.2 Type 1 AU (Class P)
10.2.3 Type 2 AU (Class N)
10.2.4 Type 3 AU (Class M)
10.2.5 Type 4 AU (Class I)
10.2.6 Type 5 AU (Class HM)
10.2.7 Type 6 AU (Class U)
10.3 dataset_type = 2
10.3.1 Type 1 AU
10.3.2 Type 2 AU
10.3.3 Type 3 AU
10.3.4 Type 4 AU
10.3.5 Type 6 AU
10.4 Genomic descriptors
10.4.1 pos
10.4.2 rcomp
10.4.3 flags
10.4.4 mmpos
10.4.5 mmtype
10.4.6 clips
10.4.7 ureads
10.4.8 rlen
10.4.9 pair
10.4.10 mscore
10.4.11 mmap
10.4.12 msar
10.4.13 rtype
10.4.14 rgroup
10.4.15 qv
10.4.16 rname
10.4.17 rftp
10.4.18 rftt
10.4.19 tokentype descriptors
10.5 sequence
10.5.1 Aligned reads (Classes P, N, M, I, HM)
10.5.2 Unmapped reads (Class HM, U)
10.6 e-cigar
10.6.1 Syntax
10.6.2 Decoding process for the first alignment
10.6.3 Decoding process for other alignments
10.6.4 Reference transformation
11 Representation of reference sequences
11.1 External reference
11.2 Embedded reference
11.3 Computed reference
11.3.1 General
11.3.2 Reference transformation
11.3.3 PushIn
11.3.4 Local assembly
11.3.5 Global assembly
12 Block payload parsing process
12.1 General
12.2 Inverse binarizations
12.2.1 Binary (BI)
12.2.2 Truncated Unary (TU)
12.2.3 Exponential Golomb (EG)
12.2.4 Truncated Exponential Golomb (TEG)
12.2.5 Signed Truncated Exponential Golomb (STEG)
12.2.6 Split Unit-wise Truncated Unary (SUTU)
12.2.7 Signed Split Unit-wise Truncated Unary (SSUTU)
12.2.8 Double Truncated Unary (DTU)
12.2.9 Signed Double Truncated Unary (SDTU)
12.3 Decoder Configuration
12.3.1 Sequences and quality values
12.3.2 Support values
12.3.3 CABAC binarizations
12.3.4 Transformation parameters
12.3.5 Msar descriptor and read identifiers
12.3.6 State variables
12.4 Initialization process for context variables
12.5 Arithmetic decoding engine
12.5.1 Initialization
12.5.2 Arithmetic decoding process
12.6 Decoding process for sequence descriptors
12.6.1 General
12.6.2 Block payload decoding process
13 Output format
13.1 General
13.2 MPEG-G record
13.2.1 number_of_template_segments
13.2.2 number_of_record_segments
13.2.3 number_of_alignments
13.2.4 class_ID
13.2.5 read_group_len
13.2.6 reserved
13.2.7 read_1_first
13.2.8 seq_ID
13.2.9 as_depth
13.2.10 read_len
13.2.11 qv_depth
13.2.12 read_name_len
13.2.13 read_name
13.2.14 read_group
13.2.15 sequence
13.2.16 quality_values
13.2.17 mapping_pos
13.2.18 ecigar_len
13.2.19 ecigar_string
13.2.20 reverse_comp
13.2.21 mapping_score
13.2.22 split_alignment
13.2.23 delta
13.2.24 split_pos
13.2.25 split_seq_ID
13.2.26 flags
13.2.27 more_alignments
13.2.28 next_pos
13.2.29 next_seq_ID
13.3 Initialization process
Annex A (informative) Tokenization of reads identifiers
Annex B (informative) Mapping quality
Annex C (informative) Inverse binarization examples
C.1 Binary (BI) binarization
C.2 Truncated Unary (TU) binarization
C.3 Exponential Golomb (EG) binarization
C.4 Truncated Exponential Golomb (TEG) binarization
C.5 Signed Truncated Exponential Golomb (STEG) Binarization
C.6 Split Unit-wise Truncated Unary (SUTU) binarization
C.7 Signed Split Unit-wise Truncated Unary (SSUTU) binarization

...
