Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

This document defines a file format for the generation of postal address directories. It is designed to hold all information necessary to support address reading software including data required for forwarding applications. In typical postal automation systems these files will be processed by directory generation software which creates application specific loadable data. This data – usually referred to as operational directory – is heavily compressed and contains access tables tailored for the specific reading software.
Topics external to the file are outside the scope of this document, for example compression, checksums, the interface for transmission to the supplier, modification permissions, error handling for inconsistent data, and undo in updates.

Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen

1.1   Scope
This document defines a file format for generating postal address directories. The format must hold all information needed to support address reading software, including data for forwarding. In typical postal automation systems these files are processed by directory generation software, which creates application-specific loadable data. This data, usually referred to as the operational directory, is heavily compressed and contains access tables tailored to the specific reading software.
Topics that do not concern the files themselves are outside the scope of this document, for example compression, checksums, the interface for transmission to the supplier, modification permissions, error handling for inconsistent data, and undo functions in updates.
1.2   Purpose
The format was developed with the following requirements in mind:
   it must contain the following data:
      addresses composed of address components (including aliases and range data);
      names of persons and organizations;
      address codes typically used as sort codes;
      links between addresses, e.g. for forwarding;
   it should not restrict the character encoding;
   it should be easy to customize for specific applications;
   it should allow complete as well as incremental updates, i.e. change-only data;
   for better handling, it must be possible to split the data into multiple files.
The following concepts underlie this format:
   The format is based on XML.

Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS

Poštne storitve - Odprti standardni vmesnik - Datotečni format naslovnih podatkov za generiranje slovarja s pomočjo OCR/VCS (sistem za optično razpoznavanje znakov)

General Information

Status
Published
Publication Date
24-Mar-2009
Current Stage
9020 - Submission to 2 Year Review Enquiry - Review Enquiry
Due Date
15-Apr-2022
Completion Date
15-Apr-2022

Technical specification
SIST-TS CEN/TS 15873:2009
English language
27 pages

Standards Content (sample)

SLOVENSKI STANDARD
SIST-TS CEN/TS 15873:2009
01-May-2009

Poštne storitve - Odprti standardni vmesnik - Datotečni format naslovnih podatkov za generiranje slovarja s pomočjo OCR/VCS (sistem za optično razpoznavanje znakov)

Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen

Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS

This Slovenian standard is identical to: CEN/TS 15873:2009
ICS:
03.240     Poštne storitve / Postal services
35.240.69  Uporabniške rešitve IT pri poštnih storitvah / IT applications in postal services
SIST-TS CEN/TS 15873:2009 en

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

TECHNICAL SPECIFICATION
CEN/TS 15873
SPÉCIFICATION TECHNIQUE
TECHNISCHE SPEZIFIKATION
March 2009
ICS 03.240; 35.240.60
English Version
Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS

Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen

This Technical Specification (CEN/TS) was approved by CEN on 1 March 2009 for provisional application.

The period of validity of this CEN/TS is limited initially to three years. After two years the members of CEN will be requested to submit their comments, particularly on the question whether the CEN/TS can be converted into a European Standard.

CEN members are required to announce the existence of this CEN/TS in the same way as for an EN and to make the CEN/TS available promptly at national level in an appropriate form. It is permissible to keep conflicting national standards in force (in parallel to the CEN/TS) until the final decision about the possible conversion of the CEN/TS into an EN is reached.

CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.
EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
Management Centre: Avenue Marnix 17, B-1000 Brussels

© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.

Ref. No. CEN/TS 15873:2009: E
CEN/TS 15873:2009 (E)
Contents

Foreword
1 Introduction
2 Scope and purpose
2.1 Scope
2.2 Purpose
3 Related Standards
3.1 UPU S42
4 Symbols and Abbreviations
5 XML Schema addressTree
5.1 , and
5.2 Address Tree in , and
5.3 Attributes for , and
5.4 String parts in , and
5.5 Ranges in , and
5.6 Aliases in , and
5.7 other XML files
5.8 Linking addresses via
5.9 Project specific part of the XML schema
6 XML Schema addressDeltaTree
6.1 Joining deltas via and file names
6.2 Update actions , and
7 Miscellaneous
Annex A
A.1 General XML Schema part
A.2 Example for a project specific XML Schema part
A.3 Initial addressTree Example
A.4 Update addressDeltaTree Example
A.5 Updated addressTree Example


Foreword

This document (CEN/TS 15873:2009) has been prepared by Technical Committee CEN/TC 331 “Postal Services”, the secretariat of which is held by NEN.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. CEN [and/or CENELEC] shall not be held responsible for identifying any or all such patent rights.

According to the CEN/CENELEC Internal Regulations, the national standards organizations of the following countries are bound to announce this Technical Specification: Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and the United Kingdom.

NOTE This document has been prepared by experts from CEN/TC 331 and UPU, in the framework of the Memorandum of Understanding between UPU and CEN.
1 Introduction

In the initial meetings of CEN/TC331/WG3, interfaces that would benefit from standardization were identified and agreed on. Candidates for Open Interface standardization are:

- interface between the image handler and automatic address readers or video coding places;
- interface from machine control to Barcode Printers;
- interface from machine control to Barcode Reader / Verifier;
- interface between scanner, image handler and machine control;
- file format of Sort Plan;
- MIS Interface (Statistics);
- file format of Address data files.

The new intended standard deals with the file format of Address Data Files.

OCR results and video coder inputs have to be verified against the “real” existing addresses in order to reach high recognition rates combined with low error rates. For that purpose postal operators provide postal address directories to the OCR/VCS suppliers. Usually different postal operators use different file formats for these (source) directories. In typical postal automation systems these files will be processed by directory generation software which creates application specific loadable data. This data, usually referred to as the “operational directory”, is heavily compressed and contains access tables tailored for the specific reading software. Usually different OCR/VCS suppliers use different operational directory formats.

This standard shall define a common Address Data File format for postal address directories to be provided by the postal operators to the OCR/VCS suppliers.

This Address Data File format shall be designed to hold all information necessary to support address reading and video coding software, including data required for special recognition tasks, e.g. forwarding applications.

2 Scope and purpose
2.1 Scope

This document defines a file format for the generation of postal address directories. It is designed to hold all information necessary to support address reading software, including data required for forwarding applications. In typical postal automation systems these files will be processed by directory generation software which creates application specific loadable data. This data, usually referred to as the operational directory, is heavily compressed and contains access tables tailored for the specific reading software.

Topics external to the file are outside the scope of this document, for example compression, checksums, the interface for transmission to the supplier, modification permissions, error handling for inconsistent data, and undo in updates.
2.2 Purpose
The format has been designed with the following requirements in mind:
- it must be able to hold the following data:
  - addresses composed of address components (including aliases and range data);
  - person and organization names;
  - address codes typically used as sort codes;
  - links between addresses, e.g. for use in forwarding;
- it should not restrict character encoding;
- it should be easily customizable for specific applications;
- it should allow complete as well as incremental updates, i.e. change-only data;
- it must be possible to split data into multiple files for better handling.

The ideas behind this format are as follows:
- The format is based on XML.
- The basic XML structure is general. Project specifics are coded as attributes. (The term project is used throughout this document to describe a specific application, such as address data for a specific country or postal organization.) This should make it easier to build project independent parsers and tools.
- Address data can be structured hierarchically. An address component appearing in many addresses shall be written once in the XML address tree, as the parent node of all addresses in which it is used.
- Beyond the pure address data, there are general as well as optional project specific attributes at the level of address components and string parts.
- To favour faster parser execution and smaller file sizes, the names of XML elements that appear very often are short strings.
- Semantics are defined only in a basic manner and have to be completed in the project specific tailoring process. For example, a street without numbers in the data may be interpreted either as a street which has no numbers or as a street where all numbers are valid. Users must therefore be aware that the interoperability of this Technical Specification may be limited to the specific project.
3 Related Standards
3.1 UPU S42

1) Beginning with version -5, UPU S42 is a two-part standard. Part a contains the concepts and the theoretical language description. Part b contains practical examples from different countries and may be supplemented with new examples in the future.

2) UPU S42a defines the components an address is composed of, as well as postal entities which can be “described” using these address components. The standard goes into great detail in defining a globally usable set of specific address components such as “postcode”, “door”, ...

3) UPU S42b describes how to write an address given its constituent address components. It uses templates to describe the order, line breaks, etc. The templates are country specific (US, Brazil, England, ...) and also use a country specific subset of the globally defined types.

4) UPU S42 address components are assumed to have a type and a string. They do not have additional attributes and do not have aliases.

5) UPU S42 does not define a format for an individual address (a collection of address components) and does not define a format for an address directory (a set of addresses).

6) UPU S42 has no concept of sort codes or forwarding information.

UPU S42 will not conflict with the format defined in this document, as it targets a completely different application and type of information. The only thing it has in common with address data is the address-component definitions themselves. These could be used in customizing the ADF for a specific project. UPU S42’s excellent glossary should be reused where applicable.
4 Symbols and Abbreviations
XML Extensible Markup Language
ADF Address Data File
5 XML Schema addressTree

The syntax is described as an XML schema, divided into a general and a project specific part. The general part of the XML Schema defines the basic structures. It uses some types and attribute groups to be defined in the project specific part. Basically the structure spans a tree of address components represented by XML elements .

The general part of the schema is listed in section A.1. The project specific part is explained in section 5.9.

This document also contains another XML Schema, addressDeltaTree, explained in chapter 6.

Figure 1 shows the general structure of the XML schema. Since the project specific part does not change the general structure, the diagram is independent of any project specifics.

Figure 1 — General data structure of the XML schema
5.1 , and

Below the XML root element is one and one section. The stores a string and optionally a list of global aliases. Aliases are described in section 5.6. A version string of the data contained in the file is stored in the element.
5.2 Address Tree in , and

Addresses are stored in an address tree corresponding to the XML elements , and . is the root node of this tree, address components stored in are the nodes and may be additional leaves. The element is explained in section 5.8. One complete address corresponds to one leaf's root path. Each node's root path identifies a partial address. Other XML elements and attributes carry additional information for the address tree node. allows data to be split into multiple files.

One address component in XML element holds a type given in attribute tp and a string in child elements or and . Attribute tp and other optional attributes for are described in section 5.3. Other optional child elements are described in the following sections.

In this context one address component holds just a name with a type and does not necessarily describe a real thing or place. Abstract data such as delivery point codes or sort codes may also be stored in address components represented as XML elements .
Example: Some addresses in various formats:
In a table:
Country  City      Street      HNr
GERMANY  BERLIN
GERMANY  KONSTANZ  BUECKLESTR  1
GERMANY  KONSTANZ  BUECKLESTR  2
GERMANY  KONSTANZ  BUECKLESTR  3
GERMANY  KONSTANZ  BUECKLESTR  4
As address tree:
Country: GERMANY
   City: BERLIN
   City: KONSTANZ
      Street: BUECKLESTR
         HNr: 1
         HNr: 2
         HNr: 3
         HNr: 4
Figure 2 — Addresses formatted as an address tree
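
The relationship between the table and the tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-dictionary model and the names ROWS, build_tree and leaf_paths are hypothetical and are not part of the specification, and component types (Country, City, ...) are left out for brevity. It shows that shared components are stored once and that each leaf's root path gives back one complete address.

```python
# Hypothetical in-memory model of the example above; not the XML format itself.
ROWS = [
    ("GERMANY", "BERLIN"),
    ("GERMANY", "KONSTANZ", "BUECKLESTR", "1"),
    ("GERMANY", "KONSTANZ", "BUECKLESTR", "2"),
    ("GERMANY", "KONSTANZ", "BUECKLESTR", "3"),
    ("GERMANY", "KONSTANZ", "BUECKLESTR", "4"),
]


def build_tree(rows):
    """Merge table rows into a nested dict so shared components appear once."""
    root = {}
    for row in rows:
        node = root
        for component in row:
            node = node.setdefault(component, {})
    return root


def leaf_paths(node, prefix=()):
    """Yield the root path of every leaf; each path is one complete address."""
    for name, children in node.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


tree = build_tree(ROWS)
addresses = list(leaf_paths(tree))
```

Walking the leaves of the tree reproduces the five table rows, while GERMANY, KONSTANZ and BUECKLESTR are each stored only once.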
Address tree representation in XML:

xsi:noNamespaceSchemaLocation="xml_projectAddressTree.xsd">

0


GERMANY
BERLIN

KONSTANZ
BUECKLESTR
1
2
3
4





The following verbose XML representation of this example is more like the address table. Each address is written as a child path of the element. Both XML representations are equivalent. Use of the short form shown above is strongly recommended, because it has less file size overhead and less risk of duplication errors.

xsi:noNamespaceSchemaLocation="xml_projectAddressTree.xsd">

0


GERMANY
BERLIN


GERMANY
KONSTANZ
BUECKLESTR
1

GERMANY
KONSTANZ
BUECKLESTR
2

GERMANY
KONSTANZ
BUECKLESTR
3

GERMANY
KONSTANZ
BUECKLESTR
4



5.3 Attributes for , and

Initially there are two attributes available for certain XML elements mentioned in the description:

tp Type of the address component for XML elements and . Type names are to be defined in the project specific part of the schema as an enumeration in the xs:simpleType named typeAttrType. Type names shall be short strings in order to reduce the amount of data.

id Identification code for XML elements and . See section 5.8 for details. Its use is required if updates based on the XML schema addressDeltaTree, as explained in chapter 6, are to take place. The id attribute allows different address entries with the same spelling to be distinguished, if necessary.

The following general attributes are optional for , and . Attributes defined in the project specific part of the schema as the xs:attributeGroup named element_projectAttribGroup are also available for these XML elements.

rg Indicates that the entry is a range. See section 5.5 for range details.

lang This attribute holds an xs:language string identifying a certain language. In multi-language countries, this allows spellings in different languages to be distinguished.

ofl Boolean attribute indicating that the entry is the official name.

old Boolean attribute to mark an old name that is no longer valid but still commonly used.

Examples for general attribute usage. XML element is introduced in section 5.6:

...
FLORENZ
FIRENZE

...
Siemens AG
Siemens
Siemens Dematic AG
AEG Electrocom

5.4 String parts in , and

Text information is stored as strings in XML elements . Ranges explained in section 5.5 may use and instead. As the type xs:token defines, only strings without control characters and with single spaces as word separators are allowed. The general attributes are listed here:

join Usually multiple string parts are combined with a word-separating space character in between. If this boolean attribute is true (=1), the current string part is joined directly to the preceding string part.

Example: Three times the string “KONSTANZ AM BODENSEE”:
KONSTANZ AM BODENSEE
KONSTANZAMBODENSEE
KONSTANZ AM BODENSEE
opt This boolean attribute marks optional parts of strings.
Examples:
SIEMENSAG
WERNER VONSIEMENS STR

Additional project specific attributes for string parts can be defined as an xs:attributeGroup named string_projectAttribGroup in the project specific part of the schema.

Note that the recommended way to split strings is the insertion of one whitespace character. Multiple elements should be used only for applying attributes to parts of strings.
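
The effect of the join and opt attributes can be sketched in Python as follows. This is a minimal illustration, not part of the specification: string parts are modelled as plain tuples, the function names combine and variants are hypothetical, and the way a dropped optional part interacts with a following joined part is an assumption that a real project would have to define.

```python
from itertools import product


def combine(parts):
    """Concatenate (value, join) string parts.

    Parts are separated by one space unless join is true, in which case
    the part is attached directly to the preceding part.
    """
    text = ""
    for value, join in parts:
        if text and not join:
            text += " "
        text += value
    return text


def variants(parts):
    """List the spellings obtained by keeping or dropping each optional part.

    Each part is a (value, join, opt) tuple. Assumption: a dropped part
    simply disappears before the remaining parts are combined.
    """
    choices = [(True, False) if opt else (True,) for _, _, opt in parts]
    results = []
    for keep in product(*choices):
        kept = [(v, j) for (v, j, _), k in zip(parts, keep) if k]
        spelling = combine(kept)
        if spelling not in results:
            results.append(spelling)
    return results


full = combine([("KONSTANZ", False), ("AM", False), ("BODENSEE", False)])
joined = combine([("WERNER", False), ("VON", False), ("SIEMENS", True)])
spellings = variants([("SIEMENS", False, False), ("AG", False, True)])
```

Here full is "KONSTANZ AM BODENSEE", joined is "WERNER VONSIEMENS", and spellings lists "SIEMENS AG" and "SIEMENS".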
5.5 Ranges in , and

A range is indicated by the attribute rg in the element with one of the values:

“a” All values in the sequence from the first to the last value are included
“e” Even values only
“o” Odd values only

There are two forms. In the first form the string in carries the first and the last value separated by a dash character (hyphen-minus: '-'). If the dash is missing, the whole value is taken as the first value. If there is more than one dash, the first one is taken as the separator. The string may be split into more than one XML element. In other words: if the rg attribute is set, the string is interpreted as a range in any case. For better readability and less file size overhead, this form should be preferred where it is sufficient.

The second form uses for the first value and for the last value. The rg attribute value in the version is by default “a” and may be omitted. and may occur more than once, which allows different attributes to be specified for the string parts, just as defined for in section 5.4. Both the general and the project specific attributes of may also be applied in and . The first (last) range value is always the concatenation of all () values.

The first value, the last value or both may be omitted. Missing values default to project specific minimal and maximal values.

The semantics of non-numeric ranges have to be clarified project specifically.

Note that ranges are just a short form; the range entries could also be enumerated as several children in the address tree. See house numbers 1-4 in the examples of section 5.2. Such enumerations are the recommended way to express sets of entries that go beyond the simple semantics of the ranges explained here.
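
The rules for the first form can be sketched in Python for purely numeric ranges. This is an illustration under assumptions, not a definitive implementation: the function name parse_range is hypothetical, the defaults 1 and 999 stand in for the project specific minimal and maximal values, and non-numeric ranges are not handled because their semantics are project specific.

```python
def parse_range(text, rg="a", minimum=1, maximum=999):
    """Expand a numeric range string such as '1-999' into a list of values.

    rg is 'a' (all values), 'e' (even values only) or 'o' (odd values only).
    minimum and maximum are assumed project specific defaults.
    """
    # Only the first dash is the separator; without a dash the whole
    # string is the first value and the last value is defaulted.
    first, _, last = text.partition("-")
    low = int(first) if first else minimum
    high = int(last) if last else maximum
    values = range(low, high + 1)
    if rg == "a":
        return list(values)
    if rg == "e":
        return [v for v in values if v % 2 == 0]
    if rg == "o":
        return [v for v in values if v % 2 == 1]
    raise ValueError(f"unknown rg value: {rg!r}")


complete = parse_range("1-999")
even = parse_range("3-8", rg="e")
odd = parse_range("3-8", rg="o")
```

With these defaults, the strings "1-999", "1-", "1", "-999" and "-" all expand to the same complete range.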

Examples:

In the following examples the project specific default minimal value is 1 and the maximal value is 999.

Some representations of the complete HNr range 1-999 follow. They illustrate the two forms and the possibility of omitting information. The first example is recommended as the standard notation for such a range.
1-999
1-
1
-999
-


1999
1999
1
999
Note that
....1-999
does not represent a range from 1 through 999, but only the string "1-999".

In the following examples for non-numeric ranges, the concrete contents of the ranges have to be clarified project specifically:
10A-16C
11A-15C
Two times the range 97X-107X, where X is optional.
97X
-107X
97X
107X

The first and last values in the following duration range example are written as xs:dateTime types, which contain dash characters. For this reason the only way to write the range is to use and , since using and a dash would lead to a weird range from “2006” to “09-05T00:00:00-2006-09-06T23:59:59”:


2006-09-05T00:00:00
2006-09-06T23:59:59
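
The pitfall can be checked with a short Python sketch. It is an illustration only: str.partition is used here as a stand-in for the "first dash is the separator" rule, and datetime.fromisoformat is used as a convenient parser for these two particular xs:dateTime values.

```python
from datetime import datetime

value = "2006-09-05T00:00:00-2006-09-06T23:59:59"

# Single-string form: only the first dash is treated as the separator,
# so the first dateTime value is cut apart.
first, _, last = value.partition("-")
print(first)  # 2006
print(last)   # 09-05T00:00:00-2006-09-06T23:59:59

# Separate first and last values: each one parses cleanly on its own.
start = datetime.fromisoformat("2006-09-05T00:00:00")
end = datetime.fromisoformat("2006-09-06T23:59:59")
print(end - start)  # 1 day, 23:59:59
```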

5.6 Aliases in , and

An alias gives a variant or an abbreviation of an address component's spelling. There are two kinds:

Local aliases are given directly inside XML element .

Global aliases are defined in the header of the XML file in elements and are valid for the whole XML file. Such an alias is applied to address components if the types in and are equal and the string for is equal to the first or element below . Applying such an alias adds all variants and , starting with the second one, to the matching address component in .

The usual alias element stores the string in or and elements in the same manner as described for above. A short form in element holds the string directly and cannot specify attributes on string parts.

Alias attributes are explained in section 5.3, attributes on string parts in section 5.4. Note that the type attribute tp is not given in or , but in the parent element or .
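
The matching rule for global aliases can be sketched in Python. This is an illustration under assumptions: address components and global aliases are modelled as plain dictionaries, the keys tp, string, aliases and variants as well as the type name "str" are hypothetical, and skipping variants that are already present is a choice made here rather than something the text prescribes.

```python
def apply_global_aliases(component, global_aliases):
    """Add matching global alias variants to one address component.

    A global alias matches when its type equals the component's type and
    its first variant equals the component's string. All variants from
    the second one onwards are then added to the component.
    """
    for alias in global_aliases:
        variants = alias["variants"]
        if not variants:
            continue
        if alias["tp"] == component["tp"] and variants[0] == component["string"]:
            for variant in variants[1:]:
                if variant not in component["aliases"]:
                    component["aliases"].append(variant)
    return component


GLOBAL_ALIASES = [{"tp": "str", "variants": ["AY Str", "A STREET"]}]

street = {"tp": "str", "string": "AY Str", "aliases": []}
city = {"tp": "city", "string": "AY Str", "aliases": []}

apply_global_aliases(street, GLOBAL_ALIASES)
apply_global_aliases(city, GLOBAL_ALIASES)
```

After the calls, street carries the alias "A STREET", while city is unchanged because its type differs.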

Example:

xsi:noNamespaceSchemaLocation="xml_projectAddressTree.xsd">

0

AY Str
A STREET



AY Str


BEE Str
B Str

SEE Str
SEE DRIVE
A BC Str



Another example can frequently be found in holiday regions, where houses tend to have names:


xsi:noNamespaceSchemaLocation="xml_projectAddressTree.xsd">

0



Bergstraße

2
Haus Gletscherblick




5.7 other XML files

Keeping all address data in one file may lead to handling problems with one very large file. A method to split data into several files using the element is therefore provided. This element contains the filename of another XML file. All XML files should be placed in the same directory, and pathnames should not be used in the string of the element. All address data is expected to be reachable by walking recursively through the include hierarchy, starting with a master file.

The address tree of an included XML file is added as a subtree to the tree node in which the element appears. The partial address data identified by the root path of the element in the address tree is therefore not to be repeated in the included file.
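
The recursive walk through the include hierarchy can be sketched in Python. This is an illustration under assumptions: instead of parsing XML, the include relations are given as a plain dictionary, the file names and the function name collect_files are hypothetical, and skipping a file that was already visited is a safeguard added here, not a rule from the text.

```python
def collect_files(master, includes):
    """Return every file reachable from the master file, depth first.

    includes maps a file name to the list of file names it includes.
    File names must be bare names, since all files are expected to live
    in the same directory.
    """
    ordered = []
    seen = set()

    def visit(name):
        if "/" in name or "\\" in name:
            raise ValueError(f"pathnames are not allowed: {name!r}")
        if name in seen:
            return  # already visited; also guards against include cycles
        seen.add(name)
        ordered.append(name)
        for child in includes.get(name, []):
            visit(child)

    visit(master)
    return ordered


INCLUDES = {
    "master.xml": ["germany.xml", "france.xml"],
    "germany.xml": ["konstanz.xml"],
}

files = collect_files("master.xml", INCLUDES)
```

In this example files lists master.xml, germany.xml, konstanz.xml and france.xml, in that order.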
The strin
...
