Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation

This document defines a file format for the generation of postal address directories. It is designed to hold all information necessary to support address reading software including data required for forwarding applications. In typical postal automation systems these files will be processed by directory generation software which creates application specific loadable data. This data – usually referred to as operational directory – is heavily compressed and contains access tables tailored for the specific reading software.
Not in the scope of this document are topics external to the file, such as compression, checksums, the interface for transmission to the supplier, modification permissions, error handling on inconsistent data and undo in updates.

Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen

1.1   Scope
This document defines a file format for the generation of postal address directories. The file format is designed to hold all information necessary to support address reading software, including data required for forwarding applications. In typical postal automation systems these files are processed by directory generation software, which creates application-specific loadable data. This data, usually referred to as the operational directory, is heavily compressed and contains access tables tailored for the specific reading software.
Not in the scope of this document are topics external to the file, such as compression, checksums, the interface for transmission to the supplier, modification permissions, error handling on inconsistent data and undo in updates.
1.2   Purpose
The format has been designed with the following requirements in mind:
- it must be able to hold the following data:
  - addresses composed of address components (including aliases and range data);
  - person and organization names;
  - address codes typically used as sort codes;
  - links between addresses, e.g. for use in forwarding;
- it should not restrict character encoding;
- it should be easily customizable for specific applications;
- it should allow complete as well as incremental updates, i.e. change-only data;
- it must be possible to split the data across multiple files for better handling.
The ideas behind this format are as follows:
- The format is based on XML.

Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS

Poštne storitve - Odprti standardni vmesnik - Datotečni format naslovnih podatkov za generiranje slovarja s pomočjo OCR/VCS (sistem za optično razpoznavanje znakov)

General Information

Status
Published
Publication Date
24-Mar-2009
Current Stage
9093 - Decision to confirm - Review Enquiry
Completion Date
04-Nov-2022

Technical specification
TS CEN/TS 15873:2009
English language
27 pages

Standards Content (Sample)


SLOVENSKI STANDARD
01-May-2009
Poštne storitve - Odprti standardni vmesnik - Datotečni format naslovnih podatkov za generiranje slovarja s pomočjo OCR/VCS (sistem za optično razpoznavanje znakov)
Postal Services - Open Standard Interface - Address Data File Format for OCR/VCS Dictionary Generation
Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen
Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS
This Slovenian standard is identical to: CEN/TS 15873:2009
ICS:
03.240 Poštne storitve Postal services
35.240.69 Uporabniške rešitve IT pri poštnih storitvah IT applications in postal services
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.

TECHNICAL SPECIFICATION
CEN/TS 15873
SPÉCIFICATION TECHNIQUE
TECHNISCHE SPEZIFIKATION
March 2009
ICS 03.240; 35.240.60
English Version
Postal Services - Open Standard Interface - Address Data File
Format for OCR/VCS Dictionary Generation
Services postaux - Interface de standard ouvert - Format de fichiers de données d'adresses pour la génération du dictionnaire OCR/VCS
Postalische Dienstleistungen - Offene Normschnittstelle - Adressdateiformat für die Generierung von Wörterbüchern in OCR/Videocodier-Systemen
This Technical Specification (CEN/TS) was approved by CEN on 1 March 2009 for provisional application.
The period of validity of this CEN/TS is limited initially to three years. After two years the members of CEN will be requested to submit their
comments, particularly on the question whether the CEN/TS can be converted into a European Standard.
CEN members are required to announce the existence of this CEN/TS in the same way as for an EN and to make the CEN/TS available
promptly at national level in an appropriate form. It is permissible to keep conflicting national standards in force (in parallel to the CEN/TS)
until the final decision about the possible conversion of the CEN/TS into an EN is reached.
CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal,
Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.
EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG
Management Centre: Avenue Marnix 17, B-1000 Brussels
© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.
Ref. No. CEN/TS 15873:2009: E

Contents
Foreword
1 Introduction
2 Scope and purpose
2.1 Scope
2.2 Purpose
3 Related Standards
3.1 UPU S42
4 Symbols and Abbreviations
5 XML Schema addressTree
5.1 , and
5.2 Address Tree in , and
5.3 Attributes for , and
5.4 String parts in , and
5.5 Ranges in , and
5.6 Aliases in , and
5.7 other XML files
5.8 Linking addresses via
5.9 Project specific part of the XML schema
6 XML Schema addressDeltaTree
6.1 Joining deltas via and file names
6.2 Update actions , and
7 Miscellaneous
Annex A
A.1 General XML Schema part
A.2 Example for a project specific XML Schema part
A.3 Initial addressTree Example
A.4 Update addressDeltaTree Example
A.5 Updated addressTree Example

Foreword
This document (CEN/TS 15873:2009) has been prepared by Technical Committee CEN/TC 331 “Postal
Services”, the secretariat of which is held by NEN.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. CEN [and/or CENELEC] shall not be held responsible for identifying any or all such patent rights.
According to the CEN/CENELEC Internal Regulations, the national standards organizations of the following
countries are bound to announce this Technical Specification: Austria, Belgium, Bulgaria, Cyprus, Czech
Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia,
Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain,
Sweden, Switzerland and the United Kingdom.
NOTE This document has been prepared by experts from CEN/TC 331 and UPU, in the framework of the Memorandum of
Understanding between UPU and CEN.

1 Introduction
In its initial meetings, CEN/TC 331/WG 3 identified and agreed on the interfaces that would benefit from
standardization. Candidates for Open Interface standardization are:
- interface between the image handler and automatic address readers or video coding places;
- interface from machine control to Barcode Printers;
- interface from machine control to Barcode Reader / Verifier;
- interface between scanner, image handler and machine control;
- file format of Sort Plan;
- MIS Interface (Statistics);
- file format of Address Data Files.
The intended new standard deals with the file format of Address Data Files.
OCR results and video coder inputs have to be verified against the “real” existing addresses in order to reach
high recognition rates combined with low error rates. For that purpose postal operators provide postal address
directories to the OCR/VCS suppliers. Usually different postal operators use different file formats for these
(source) directories. In typical postal automation systems these files will be processed by directory generation
software which creates application specific loadable data. This data – usually referred to as “operational
directory” – is heavily compressed and contains access tables tailored for the specific reading software.
Usually different OCR/VCS suppliers use different operational directory formats.
This standard shall define a common Address Data File format for postal address directories to be provided
from the postal operators to the OCR/VCS suppliers.
This Address Data File format shall be designed to hold all information necessary to support address reading
and video coding software including data required for special recognition tasks e.g. forwarding applications.
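
The verification step described above can be illustrated with a minimal Python sketch. It assumes the directory is a plain list of street names and uses the standard library's `difflib` for approximate matching. Real operational directories use supplier-specific compressed access tables, so the directory contents, the function name `verify` and the cutoff value are illustrative assumptions only.

```python
import difflib

# Hypothetical directory content, for illustration only.
DIRECTORY = ["MAIN STREET", "MILL STREET", "HIGH STREET"]

def verify(ocr_text, directory=DIRECTORY, cutoff=0.8):
    """Return the closest directory entry for an OCR result, or None.

    The cutoff is an assumed threshold: raising it lowers the error
    rate at the cost of rejecting more reads.
    """
    matches = difflib.get_close_matches(ocr_text, directory, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# An OCR result with one misread character still resolves to a real address.
best = verify("MAIN STREFT")
unknown = verify("XYZ")
```

A read that matches no directory entry closely enough is rejected rather than guessed, which is how a directory keeps error rates low while recognition rates stay high.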

2 Scope and purpose
2.1 Scope
This document defines a file format for the generation of postal address directories. It is designed to hold all
information necessary to support address reading software including data required for forwarding applications.
In typical postal automation systems these files will be processed by directory generation software which
creates application specific loadable data. This data – usually referred to as operational directory – is heavily
compressed and contains access tables tailored for the specific reading software.
Not in the scope of this document are topics external to the file, such as compression, checksums, the interface
for transmission to the supplier, modification permissions, error handling on inconsistent data and undo in
updates.
2.2 Purpose
The format has been designed with the following requirements in mind:
- must be able to hold the following data:
  - addresses composed of address components (including aliases and range-data);
  - person and organization names;
  - address codes typically used as sort codes;
  - links between addresses e.g. for use in forwarding;
- should not restrict character encoding;
- easily customizable for specific applications;
- should allow complete as well as incremental updates, i.e. change-only data;
- it must be possible to split data in multiple files for better handling.
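
The requirement for incremental, change-only updates can be pictured with a small Python sketch. It assumes a directory held as a mapping from an address string to a sort code. The action names `add`, `change` and `remove` are placeholders chosen for this illustration and are not the update actions defined by the specification. Raising an exception on inconsistent data is likewise an assumption, since error handling is outside the scope of the document.

```python
def apply_delta(directory, delta):
    """Return a new directory with the change-only records applied.

    directory: dict mapping an address string to a sort code.
    delta: list of (action, address, sort_code) tuples; the action names
    are hypothetical placeholders, not taken from CEN/TS 15873.
    """
    updated = dict(directory)  # leave the previous version untouched
    for action, address, sort_code in delta:
        if action == "add":
            if address in updated:
                raise KeyError(f"cannot add existing entry: {address}")
            updated[address] = sort_code
        elif action == "change":
            if address not in updated:
                raise KeyError(f"cannot change missing entry: {address}")
            updated[address] = sort_code
        elif action == "remove":
            if address not in updated:
                raise KeyError(f"cannot remove missing entry: {address}")
            del updated[address]
        else:
            raise ValueError(f"unknown action: {action}")
    return updated

initial = {"Main Street 1": "1001", "Main Street 3": "1001"}
delta = [
    ("add", "Mill Lane 2", "1002"),
    ("change", "Main Street 3", "1003"),
    ("remove", "Main Street 1", None),
]
updated = apply_delta(initial, delta)
```

Only the three changed records travel in the delta; the complete directory is reconstructed on the receiving side from the previous version plus the changes.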

The ideas behind this format are as follows:
- The format is based on XML.
- The basic XML structure is general. Project specifics are coded as attributes (the term project is used
throughout this document to describe a specific application such as address data for a specific country
or postal organization). This should make it easier to build project independent parsers and tools.
- Address data can be structured hierarchically. An address component appearing in many addresses
shall be written once in the XML address tree, as the parent node of all addresses in which it is used.
- Beyond the pure address data, there are general as well as optional project specific attributes on the level
of address components and string parts.
- In favour of faster parser execution and smaller file sizes, the names of XML elements that appear very
often are short strings.
- Semantics are defined only in a basic manner and have to be completed in the project specific tailoring
process. E.g. a street without numbers in the data may be interpreted as a street which has no numbers,
or as one where all numbers are valid. Because of this, users must be aware that interoperability under this
Technical Specification may be limited to the specific project.
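
The hierarchical idea above, in which a shared address component is written once as a parent node, can be sketched in Python as follows. The element name `c` and the attributes `type` and `v` are placeholders invented for this illustration; the actual element and attribute names are defined in the specification's XML schema and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "tree", "c", "type" and "v" are placeholder names
# chosen for this sketch, not names from CEN/TS 15873.
SAMPLE = """
<tree>
  <c type="town" v="Springfield">
    <c type="street" v="Main Street">
      <c type="number" v="1"/>
      <c type="number" v="3"/>
    </c>
    <c type="street" v="Mill Lane"/>
  </c>
</tree>
"""

def expand(node, prefix=()):
    """Yield one tuple of (type, value) pairs for every leaf component."""
    path = prefix + ((node.get("type"), node.get("v")),)
    children = node.findall("c")
    if not children:
        yield path
    for child in children:
        yield from expand(child, path)

def all_addresses(xml_text):
    """Flatten the address tree into a list of complete addresses."""
    root = ET.fromstring(xml_text)
    return [addr for top in root.findall("c") for addr in expand(top)]

addresses = all_addresses(SAMPLE)
for addr in addresses:
    print(", ".join(value for _type, value in addr))
```

The town and street values are stored once each, yet every flattened address carries them. The street without numbers is emitted here as an address of its own; whether that means "no numbers exist" or "all numbers are valid" is exactly the kind of semantics left to project specific tailoring.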
3 Related Standards
3.1 UPU S42
1) Beginning with version -5, UPU S42 is a two-part standard. Part a contains concepts and the
theoretical language description. Part b contains practical examples from different countries and may
be supplemented with new examples in the future.
2) UPU S42a defines the components an address is composed of, as well as the postal entities which can be
“described” using these address components. The standard goes into great detail in defining a
globally usable set of specific address components such as “postcode”, “door”, ...
3) UPU S42b describes how to write an address given its constituent address components. It uses
templates to describe the order, line breaks, etc. The templates are country specific (US, Brazil,
England, ...) and also use a country specific subset of the globally defined types.
4) UPU S42 address components are assumed to have a type and a string. They do not have
additional attributes and do not have aliases.
5) UPU S42 does not define a format for an individual address (i.e. an address-component collection) and
does not define a format for an address directory (i.e. a set of addresses).
6) UPU S42 has no concept of sort codes or forwarding information.

UPU S42 will not conflict with the format defined in this document, as it targets a completely different
application and type of information. The only things it has in common with address data are the address-
component definitions themselves. These could be used in customizing the ADF for a specific project. UPU
S42’s excellent glossary should be reused where applicable.
4 Symbols and Abbreviations
XML eXtensible Markup Language
ADF Address Data File
...
