ISO/IEC 15444-6:2003/Amd 1:2007
(Amendment) Information technology — JPEG 2000 image coding system — Part 6: Compound image file format — Amendment 1: Hidden text metadata
Technologies de l'information — Système de codage d'images JPEG 2000 — Partie 6: Format de fichier d'image de composant — Amendement 1: Métadonnées de texte caché
INTERNATIONAL ISO/IEC
STANDARD 15444-6
First edition
2003-10-15
AMENDMENT 1
2007-08-15
Information technology — JPEG 2000
image coding system —
Part 6:
Compound image file format
AMENDMENT 1: Hidden text metadata
Technologies de l'information — Système de codage d'image
JPEG 2000 —
Partie 6: Format de fichier d'image de composant
AMENDEMENT 1: Métadonnées de texte caché
Reference number
ISO/IEC 15444-6:2003/Amd.1:2007(E)
©
ISO/IEC 2007
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2007
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 1 to ISO/IEC 15444-6:2003 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.
Information technology — JPEG 2000 image coding system —
Part 6:
Compound image file format
AMENDMENT 1: Hidden text metadata
Add the following normative references to 2.2:
IETF RFC 1950, ZLIB Compressed Data Format Specification version 3.3, May 1996
IETF RFC 1951, DEFLATE Compressed Data Format Specification version 1.3, May 1996
IETF RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, August 1998
W3C, Cascading Style Sheets, level 1 (CSS1) Specification, http://www.w3.org/pub/WWW/TR/REC-CSS1
W3C, Cascading Style Sheets, level 2 (CSS2) Specification, http://www.w3.org/TR/REC-CSS2
W3C, HTML 4.01 Specification, http://www.w3.org/TR/html401
W3C, XHTML 1.0 Extensible HyperText Markup Language, Second Edition, http://www.w3.org/TR/xhtml1
W3C, XML Schema Part 0: Primer, Second Edition, http://www.w3.org/TR/xmlschema-0
W3C, XML Schema Part 1: Structures, Second Edition, http://www.w3.org/TR/xmlschema-1
W3C, XML Schema Part 2: Datatypes, Second Edition, http://www.w3.org/TR/xmlschema-2
Add the following terms and definitions to Clause 3:
3.23
hidden text
symbolic representation for the characters and words found in an image
3.24
annotation
particular region of a page in a JPM document that has an associated URL reference, note or highlight
3.25
hidden text XML
XML data which describe hidden text and annotations for a single page in a JPM file and which conform to the
schema in Annex H
3.26
compressed hidden text XML
hidden text XML data compressed using the mechanisms defined in F.2
3.27
hidden text UUID box
UUID box containing compressed hidden text XML
3.28
hidden text XML Schema
XML Schema for hidden text XML, as defined in H.1
Add the following abbreviations to Clause 4:
HTX Hidden Text XML
Add the following subclause after 5.2.8:
5.3 Hidden Text Metadata
Hidden text metadata is data representing the text, text elements and text flow associated with an image. In
the context of this standard, hidden text is associated with a particular region of a page in a JPM document.
Common uses for hidden text include text searching and highlighting, cut-and-paste, and text-to-speech
processing. Hidden text describes the flow of the text on a page as well as the text elements.
JPM allows a rich, multiple content-type representation of a document. Each region of a page may be
encoded with a compression technique best suited to its characteristics. In regions containing text, high fidelity
reproduction of the source image is retained by not replacing the text regions with a character-based rendition
through OCR, but rather by using advanced coding methods such as JBIG2. Even OCR results with a 99
percent accuracy contain substantial numbers of errors per page which require expensive human labour to
correct. The searchable nature of a character-based rendition can be obtained instead by associating hidden
"dirty OCR" results with the corresponding text image. This standard defines a format for hidden text metadata.
A key issue with hidden text is capturing the ambiguities seen by the OCR engine in a way that allows
properly-constructed search engines to find whether and where a given word might be present in a text image.
Properly captured, this information provides nearly as much searching precision as an approach using human-
corrected "clean OCR" data, but at much lower cost. Search results are most useful where there are fewer
false positives to weed through. Intelligent search engines can take account of such data as confidence and
alternate characters or alternate words to appropriately alter the ranking of search hits on less certain
characters.
In many cases, true ambiguity exists in the image and it would confuse a human observer as well. In these
cases, saving confidence values for characters and their alternatives or describing several alternative parsings
of a string of characters into words can amount to saving the state of the OCR process to allow the problem to
be revisited in a later stage, perhaps by a different engine or by access to first a general dictionary and then a
set of more specialized dictionaries.
As a last step, when a person is presented with the search results, they can dismiss a given search hit by
comparison to the actual image data for a character or word. For this purpose (and to allow later-stage OCR
processes to resume analysis on the image), bounding box rectangles can be defined for all the elements of
the hidden text such as characters, words, lines, paragraphs and regions. By indicating a container
relationship among these items, intelligent navigation and text selection can occur at character, word, line
and paragraph boundaries. A reading order through these rectangles can be defined for what was, in the image,
just a random placement of unrelated glyphs.
While it is primarily designed for use by machines such as search engines, the hidden text can also serve as a
crude (if "dirty") or adequate (if "clean") alternate representation for an image region to allow it to display on
character-based devices (such as mobile phones) or small-area graphics devices (such as PDAs).
Annotations are typically added to the document with a WYSIWYG editor to indicate URL references, to attach
notes, and to highlight key sections of the document text. Each annotation is associated with a particular
region of a page in a JPM document.
XML is used for hidden text and annotations because it is a format widely used to store structured information,
and can be machine processed.
Renumber the original 5.3 as 5.4.
Add the following rows at the correct alphabetical location in Table A.1 of A.4:
Table A.1 — Boxes defined or referenced within this International Standard
Box name              Type                  Superbox  Comments (Informative)
Hidden Text Metadata  ‘htxb’ (0x68747862)   Yes       This optional box contains hidden text and annotations.
HTX Reference Box     ‘phtx’ (0x70687478)   No        This optional box can be used to point to Hidden Text Metadata box contents at top file level.
Add the following subclauses after B.6.4:
B.6.5 Hidden Text Metadata box (superbox)
Box type: ‘htxb’ (0x68747862)
Container: Page box or File
Mandatory: No
Quantity: At most one if the container is the Page box, any number if the container is the file
Location: Anywhere in the Page box after the Page Header box if the container is the Page box, or
anywhere after the File Type box if the container is the file
The Hidden Text Metadata box (‘htxb’) serves as a container for hidden text data. It is a superbox that may
contain an optional Label box and must contain one of two box types. It may either contain one XML box
containing hidden text metadata, or it may contain one UUID box containing hidden text metadata as specified
in F.2.
The type of a Hidden Text Metadata box shall be ‘htxb’ (0x68747862). The contents of a Hidden Text
Metadata box shall be as in Figure B.25:
Figure B.25 — Organization of the contents of a Hidden Text Metadata box (figure not reproduced: an optional Label box and either one XML box or one UUID box)
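The following informative sketch in Python illustrates how a Hidden Text Metadata superbox holding one XML box might be serialized. It assumes the usual box header of ISO/IEC 15444-1 (a 4-byte big-endian length field followed by a 4-byte type field), omits the optional Label box and the extended-length form, and uses hypothetical function names that do not appear in this International Standard.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 4-byte big-endian length, 4-byte type, then the payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    length = 8 + len(payload)  # the length field counts the 8 header bytes too
    if length > 0xFFFFFFFF:
        raise ValueError("payload too large for this sketch (no extended length)")
    return struct.pack(">I4s", length, box_type) + payload


def make_htxb_with_xml(htx_xml: bytes) -> bytes:
    """Wrap uncompressed hidden text XML in an XML box inside an 'htxb' superbox."""
    xml_box = make_box(b"xml ", htx_xml)  # XML box type is 'xml' plus a space
    return make_box(b"htxb", xml_box)     # 'htxb' == 0x68747862
```

A call such as make_htxb_with_xml(b"<htx/>") would yield a 22-byte superbox: 8 header bytes for 'htxb', 8 header bytes for the XML box, and 6 bytes of XML.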
B.6.6 HTX Reference box
Box type: ‘phtx’ (0x70687478)
Container: Page box
Mandatory: No
Quantity: At most one
Location: Anywhere in the Page box after the Page Header box
If the hidden text for a page is contained in a Hidden Text Metadata box within the corresponding Page box,
this box must not appear. If the hidden text for a page is contained in a series of one or more Hidden Text
Metadata boxes at the file level, one HTX Reference box has to be included in the corresponding Page box.
The type of a HTX Reference box shall be 'phtx' (0x70687478). The contents of a HTX Reference box shall be
as in Figure B.26:
Figure B.26 — Organization of the contents of a HTX Reference box
Rtyp: Referenced box type. This field specifies the actual type (as would be found in the TBox
field in an actual box header) of the box referenced by this HTX Reference box. However, a
reader shall not attempt to locate a physically stored box header for the box represented by
this HTX Reference box, as it is legal to use a HTX Reference box to create a new box that
is not contiguously contained in other locations within this or other files, and thus the box
header will not exist.
flst: Fragment List box. This box specifies the actual locations of the fragments of the referenced
HTX element. When those fragments are concatenated, in order, as specified by the
Fragment List box definition, the resulting byte-stream shall be the contents of the
referenced HTX element, which contains hidden text data, and shall not include the box
header fields. The format of the Fragment List box is specified in B.5.1.1. If Rtyp is 'uuid'
and the UUID signals deflate compression as defined in F.2, the number of fragments of the
Fragment List box must be one.
label: Label box. This optional box may contain a Label box which specifies a label or name for
the hidden text of the corresponding page. The structure of a Label box is specified in B.6.3.
Table B.31 — HTX Reference box contents data structure values
Parameter Size (bits) Value
Rtyp 32 See Table B.32
flst Variable Variable
label Variable Variable
Table B.32 — Legal Rtyp values
Value     Meaning
xml\40    The referenced HTX data shall be contained in an XML box as described in Annex F.
          The XML box is defined in I.7.1 of ITU-T Rec T.800 (2002) | ISO/IEC 15444-1:2004.
uuid      The referenced HTX data shall be contained in a UUID box as described in Annex F.
          The UUID box is defined in I.7.2 of ITU-T Rec T.800 (2002) | ISO/IEC 15444-1:2004.
All other values reserved
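The following informative sketch in Python illustrates how a reader might assemble the contents of the referenced HTX element from its fragments. It assumes the Fragment List box has already been parsed into (offset, length) pairs that all refer to the same file; the function name and that simplified fragment representation are hypothetical. The sketch does not check the rule that compressed data in a UUID box must use exactly one fragment.

```python
from typing import Iterable, Tuple

# Legal Rtyp values from Table B.32 ('xml\40' is 'xml' followed by a space).
LEGAL_RTYP = {b"xml ", b"uuid"}


def assemble_htx_contents(file_bytes: bytes, rtyp: bytes,
                          fragments: Iterable[Tuple[int, int]]) -> bytes:
    """Concatenate the fragments, in order, into the referenced box contents.

    The result holds the box contents only; no box header is included.
    """
    if rtyp not in LEGAL_RTYP:
        raise ValueError(f"reserved Rtyp value: {rtyp!r}")
    parts = []
    for offset, length in fragments:
        if offset < 0 or length < 0 or offset + length > len(file_bytes):
            raise ValueError("fragment lies outside the file")
        parts.append(file_bytes[offset:offset + length])
    return b"".join(parts)
```

The order of the pairs matters: the fragments are joined in the order given by the Fragment List box, not in order of their file offsets.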
Renumber the original B.6.5 as B.6.7.
Add the following annexes after Annex E:
Annex F
(normative)
Hidden Text and Annotations Storage
F.1 Storage of HTX in JPM
A hidden text XML element is restricted to represent text for a single page. It is stored in a Hidden Text
Metadata box as defined in B.6.5. The Hidden Text Metadata box either appears within the corresponding
Page box or is placed at the top level of the file. If placed at the top level, an HTX Reference box as defined in
B.6.6 must be placed in the corresponding Page box to point to the Hidden Text Metadata boxes that
compose the hidden text of the page.
When a Hidden Text Metadata box is small in size, it is reasonable to place it directly in the Page box. In keeping
with the usual JPM approach, large objects are generally placed at the top file level. In this case, the much
smaller HTX Reference box is placed in the page box and points to the actual data. Also in this case a single
HTX Reference box can point to multiple file level Hidden Text Metadata boxes. This can be used to compose
the HTX for many pages from combinations of fixed page content (such as page headers and footers) and
variable page content unique to each page.
XML data representing hidden text and annotations is defined using XML 1.0, and conforms to the schemas in
Annex H. It shall be referred to as Hidden Text XML or HTX.
HTX shall be stored in a Hidden Text Metadata box as defined in B.6.5.
The storage of uncompressed HTX may increase file size considerably. In order to minimise the increase in
file size, HTX may be compressed using the mechanisms defined in F.2.
F.2 Compression of HTX
HTX may be compressed using the zlib format defined in IETF RFC 1950 with DEFLATE compression defined
in IETF RFC 1951.
UUID boxes shall be used for the storage of compressed HTX in the JPM file format.
Compressed HTX shall be stored in a UUID box, as defined in I.7.2 of ISO/IEC 15444-1:2004, with the
following contents:
ID This field shall contain the following 16 hexadecimal bytes:
c2 f3 66 a4 27 ec 40 c4 a0 9a 7e 65 2f 36 eb 59
DATA This field will contain hidden text XML compressed to the DEFLATE format, as specified in F.1.
A UUID box with the above content shall be referred to as a hidden text UUID box.
The following URL may be used in a UUID Data Entry box, as defined in I.7.3.2 of ISO/IEC 15444-1:2004, to
describe the format of the data contained in hidden text UUID boxes:
http://www.jpeg.org/hiddentext/htx.html
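The following informative sketch in Python illustrates how a hidden text UUID box might be written and read back. It relies on the standard zlib module, whose compress function produces a zlib stream (IETF RFC 1950) containing DEFLATE data (IETF RFC 1951). The function names are hypothetical, and the box header layout (4-byte big-endian length, 4-byte type) is assumed from ISO/IEC 15444-1.

```python
import struct
import zlib

# The 16-byte ID that marks a UUID box as holding compressed hidden text XML.
HTX_UUID = bytes.fromhex("c2 f3 66 a4 27 ec 40 c4 a0 9a 7e 65 2f 36 eb 59")


def make_hidden_text_uuid_box(htx_xml: bytes) -> bytes:
    """Compress hidden text XML and wrap it in a UUID box."""
    data = zlib.compress(htx_xml)  # zlib wrapper around a DEFLATE stream
    payload = HTX_UUID + data      # ID field followed by DATA field
    return struct.pack(">I4s", 8 + len(payload), b"uuid") + payload


def read_hidden_text_uuid_box(box: bytes) -> bytes:
    """Return the uncompressed hidden text XML stored in a hidden text UUID box."""
    length, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"uuid" or length != len(box):
        raise ValueError("not a single complete UUID box")
    if box[8:24] != HTX_UUID:
        raise ValueError("UUID does not identify compressed hidden text XML")
    return zlib.decompress(box[24:])
```

A reader that does not recognize the ID can skip the box using its length field, which is why the ID check is kept separate from the box type check.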
Annex G
(normative)
Hidden Text and Annotations Types and Elements
G.1 Overview
This section describes each of the HTX types and elements, and how they are to be used and interpreted.
Annex H formally describes the schemas that the hidden text XML must conform to. The text here describes
each of the elements: what they are for, how they relate to each other, how often they can occur, and
how they are to be interpreted.
Hidden text can be encoded using subelements at different levels of detail as described in this section. This
can be used to structure the hidden text and give it a text flow in regions, paragraphs, lines, words, etc.
Whenever this kind of structured information is not available, the hidden text can be directly put into the
appropriate elements, omitting specific positioning of lines inside paragraphs, words inside lines, etc. The
following picture gives an overview of the various elements that can be used to store the hidden text of a
page:
Figure G.1 — Structure of HTX
The hidden text XML schema (see H.1) uses some types and elements defined in the XHTML 1.0 XML
Schema. See the XHTML 1.0 reference for full details of these types and elements.
The following additional types and elements are defined:
G.2 Types
G.2.1 Shape
The Shape type is used to describe the shape of a region in the document and is defined by the following
XML schema declaration:
Enumeration of shapes.
G.2.2 Coordinates
The Coords type is used to store a comma separated sequence of non-negative integer values. This type is
similar to the XHTML 1.0 Coords type but excludes negative and percentage values. The attribute specifies
the position and shape of the area. The number and order of values depends on the value of the shape
attribute. Possible combinations:
• rect: left-x, top-y, right-x, bottom-y.
• poly: x1, y1, x2, y2, ..., xN, yN.
If the first and last x and y coordinate pairs are not the same, user agents must infer an additional
coordinate pair to close the polygon.
The Coords element is defined by the following XML schema declaration:
Comma separated list of integer values.
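The following informative sketch in Python illustrates how the two combinations of shape and coords described above might be interpreted, including the inferred closing point of a polygon. The function name and the returned list-of-points representation are hypothetical; tolerating whitespace around the commas is an assumption of this sketch, not a statement about the schema.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def parse_coords(shape: str, coords: str) -> List[Point]:
    """Turn a coords string into points, according to the shape value.

    rect -> [(left-x, top-y), (right-x, bottom-y)]
    poly -> the polygon vertices, closed if the input left it open
    """
    values = [int(v) for v in coords.split(",")]
    if any(v < 0 for v in values):
        raise ValueError("coords must be non-negative integers")
    if shape == "rect":
        if len(values) != 4:
            raise ValueError("rect needs exactly four values")
        left, top, right, bottom = values
        return [(left, top), (right, bottom)]
    if shape == "poly":
        if not values or len(values) % 2:
            raise ValueError("poly needs a non-empty list of x, y pairs")
        points = list(zip(values[0::2], values[1::2]))
        if points[0] != points[-1]:
            points.append(points[0])  # infer the pair that closes the polygon
        return points
    raise ValueError(f"unknown shape: {shape!r}")
```

For example, parse_coords("poly", "0,0,10,0,10,10") returns four points, the last being a copy of the first.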
G.2.3 Percentage
A simple type Percentage is defined to store a string that holds a percent value indicating the confidence of a
hidden text word or character match. Percentage is defined as follows:
Percentage value.
G.2.4 Angle
A simple type Angle is defined to store a string that indicates an angle for use in hidden text. The Angle type
is defined as follows:
nn for radian measure or nn° for degree
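The following informative sketch in Python illustrates how an Angle string might be normalized to radians: a value followed by the degree sign is taken as degrees, and a bare value as radians. The function name is hypothetical, and accepting any decimal number for "nn" is an assumption, since the exact lexical form is given by the schema in Annex H.

```python
import math


def angle_to_radians(value: str) -> float:
    """Convert an Angle string ('nn' in radians or 'nn°' in degrees) to radians."""
    text = value.strip()
    if text.endswith("°"):
        return math.radians(float(text[:-1]))  # degree form: drop the sign, convert
    return float(text)                         # radian form: use the number as is
```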
G.2.5 Resolution
A simple type Resolution is defined to store a string that indicates a resolution for use with coordinates in
hidden text and annotations. The Resolution type is defined as follows:
Resolution value in dots per inch (dpi). A single number stands for
horizontal and vertical resolution having the same values.
Two numbers can be used to define different resolutions for horizontal
(first number) and vertical (second number).
G.3 Common Attributes
G.3.1 Core Attributes
Coreattrs, a set of core attributes that are common to most elements, is defined as follows:
core attributes common to most elements
id document-wide unique id
class space separated list of classes
lang language code (backwards compatible)
xml:lang language code (as per XML 1.0 spec)
dir direction for weak/neutral text
iref URI of the image corresponding to the region
The following attributes are members of the Coreattrs group:
• lang (optional)
An optional attribute of type LanguageCode to indicate the default language for text in the hidden
text XML. Refer to the XHTML 1.0 specification for further details.
• xml:lang (optional)
An optional attribute of type xml:lang to indicate the default language for text in the hidden text
XML. Refer to the XHTML 1.0 specification for further details.
• dir (optional)
An optional attribute containing the string rtl or ltr, indicating the default direction for text in the
hidden text XML. Refer to the XHTML 1.0 specification for further details.
• id (optional)
An optional attribute of type xs:ID. Contains an id that is unique in the scope of this document.
This attribute can be used for referencing a certain element (e.g. in a style sheet). See XML
Schema specification for further details.
• class (optional)
This attribute can contain a space separated list of classes. Useful for convenient style sheet
usage.
• iref (optional)
URI which points to an image file corresponding to the region.
(ex.1 iref=”http://jpeg.org/image.jp2”, ex.2 iref=”jpip://jpeg.org/image.jp2?fsize=32,32&rsiz=32,32”)
G.3.2 Position Attributes
Posattrs, a set of position attributes that are common to most visual elements, is defined as follows:
positioning attributes common to most elements
shape shape of an element
coords coordinates of an element
angle angle of text direction
0 is horizontal to the right, positive values
mean counter-clockwise rotation
baseline angle of the characters in a line of
text
The following attributes are members of the Posattrs group:
• shape (optional)
An optional attribute of type Shape containing the shape of the region bounding the element.
Possible values are ‘rect’ for a rectangle and ‘poly’ for a polygon. The default value for this
attribute is rect. If this attribute is missing then the bounding shape for this element is the
bounding shape of the parent element (which is the whole page in case of hiddentext).
• coords (optional)
The logical coordinates of the shape bounding the hidden text for this page. The unit is pixels;
percentages and length units such as inches or centimetres are not allowed. A resolution can be
defined as an attribute on the htx element. If this attribute is missing then the
bounding shape for this element is the bounding shape of the parent element (which is the whole
page in case of hiddentext). How the value of coords is to be interpreted depends on the shape
attribute. The origin (coordinates ‘0, 0’) is the upper left corner of the page.
• angle (optional)
An attribute of type Angle that indicates the angle of orientation of the element, relative to the
direction of the element's parent.
Can either be in degree (value followed by a ° sign) or radian measure (value without unit). A
value of 0 means same direction as the element's parent, positive values mean rotating counter-
clockwise relative to that direction. Default value is ‘0’.
• baseline (optional)
An attribute of type Angle that indicates the relative orientation of the sub elements and direct
content contained in the element with respect to the direction given by the angle attribute.
Can either be in degree (value followed by a ° sign) or radian measure (value without unit). A
value of 0 means same direction as the element, positive values mean rotating counter-clockwise
relative to that direction. Default value is ‘0’.
The values of the shape and coords attributes should be interpreted as described in HTML 4.01 subclause 13.6.1
section “AREA attribute definitions”.
G.4 Elements
G.4.1 HTX
The htx element, the global container and root element for hidden text and annotations, is declared as
follows:
Global container for hidden text and annotations. Contains
language attributes, an optional xhtml head and a
mandatory body.
This root element contains an optional xhtml:head element, an annotations element and a hiddentext element.
Attributes:
Core attributes apply.
• res (optional)
An optional attribute of type Resolution indicating the resolution for any coordinates in dots per
inch (dpi). A single number stands for horizontal and vertical resolution having the same values.
Two numbers can be used to define different resolutions for horizontal (first number) and vertical
(second number).
• width (optional)
The width of the page in pixels.
• height (optional)
The height of the page in pixels.
Elements
• xhtml:head (at most one)
An optional element containing general header data for the hidden text XML elements, including
any required Cascading Style Sheet data. Refer to the XHTML 1.0 specification for further details.
• annotations (at most one)
An optional annotations element may be used to attach notes and to describe clickable and
highlighted regions on a page.
• hiddentext (at most one)
An optional hiddentext containing the hidden text XML data for a page.
Direct content
• none
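The following informative sketch in Python illustrates how the res, width and height attributes of an htx element might be combined to obtain the physical page size. The function names are hypothetical. Two points are assumptions of the sketch and not statements about the schema: the two-number form of res is taken to be separated by a comma or white space, and the example element is written without an XML namespace.

```python
from typing import Tuple
import xml.etree.ElementTree as ET


def parse_resolution(res: str) -> Tuple[float, float]:
    """Return (horizontal dpi, vertical dpi) from a Resolution string."""
    # Assumption: two numbers are separated by a comma or white space.
    numbers = [float(n) for n in res.replace(",", " ").split()]
    if len(numbers) == 1:
        return numbers[0], numbers[0]  # one number applies to both directions
    if len(numbers) == 2:
        return numbers[0], numbers[1]  # horizontal first, vertical second
    raise ValueError("res must hold one or two numbers")


def page_size_inches(htx: ET.Element) -> Tuple[float, float]:
    """Convert the pixel width and height of the page to inches using res."""
    h_dpi, v_dpi = parse_resolution(htx.attrib["res"])
    return int(htx.attrib["width"]) / h_dpi, int(htx.attrib["height"]) / v_dpi


# Example element (namespace omitted for brevity; this is an assumption).
root = ET.fromstring('<htx res="300" width="2400" height="3300"><hiddentext/></htx>')
```

With the example element above, page_size_inches(root) gives 8.0 by 11.0 inches. All three attributes are optional, so a real reader would need a fallback when any of them is absent.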
G.4.2 Parameter
The param element is declared as follows:
User defined properties for a hidden text and annotations
object.
A param element of a HTX contains user defined properties for the associated element, e.g. to specify the
OCR engine use
...