ISO/TS 19138:2006
Geographic information - Data quality measures
ISO/TS 19138:2006 defines a set of data quality measures. These can be used when reporting data quality for the data quality subelements identified in ISO 19113. Multiple measures are defined for each data quality subelement, and the choice of which to use will depend on the type of data and its intended purpose. The data quality measures are structured so that they can be maintained in a register established in conformance with ISO 19135. ISO/TS 19138:2006 does not attempt to describe every possible data quality measure, only a set of commonly used ones.
Information géographique — Mesures de la qualité des données
ISO/TS 19138:2006 is a technical specification published by the International Organization for Standardization (ISO). Its full title is "Geographic information - Data quality measures".
ISO/TS 19138:2006 is classified under the following ICS (International Classification for Standards) categories: 35.240.70 - IT applications in science. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/TS 19138:2006 has an inter-standard link to ISO 19157:2013. Checking such links helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
TECHNICAL SPECIFICATION ISO/TS 19138
First edition
2006-12-01
Geographic information — Data quality measures
Information géographique — Mesures de la qualité des données
Reference number
© ISO 2006
© ISO 2006
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 Relationships to other standards
6 Register
7 Components of a data quality measure
7.1 List of components
7.2 Component details
7.3 Standardized data quality measures
Annex A (normative) Abstract test suite
A.1 Test case identifier: Component test
A.2 Test case identifier: Name test
A.3 Test case identifier: Data quality element and subelement test
A.4 Test case identifier: Data quality basic measure test
A.5 Test case identifier: Definition test
A.6 Test case identifier: Description test
A.7 Test case identifier: Parameter test
A.8 Test case identifier: Data quality value type test
A.9 Test case identifier: Source reference test
A.10 Test case identifier: Example test
Annex B (normative) Structure of data quality measures
B.1 Components defining a data quality measure
B.2 Mapping of the components to ISO 19115 and ISO 19135
B.3 UML-diagram for data quality measure
Annex C (normative) Data quality basic measures
C.1 Purpose of data quality basic measures
C.2 Counting-related data quality basic measures
C.3 Uncertainty-related data quality basic measures
Annex D (normative) List of data quality measures
D.1 Completeness
D.2 Logical consistency
D.3 Positional accuracy
D.4 Temporal accuracy
D.5 Thematic accuracy
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, a
technical committee may decide to publish other types of normative document:
⎯ an ISO Publicly Available Specification (ISO/PAS) represents an agreement between technical experts in
an ISO working group and is accepted for publication if it is approved by more than 50 % of the members
of the parent committee casting a vote;
⎯ an ISO Technical Specification (ISO/TS) represents an agreement between the members of a technical
committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting
a vote.
An ISO/PAS or ISO/TS is reviewed after three years in order to decide whether it will be confirmed for a
further three years, revised to become an International Standard, or withdrawn. If the ISO/PAS or ISO/TS is
confirmed, it is reviewed again after a further three years, at which time it must either be transformed into an
International Standard or be withdrawn.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO/TS 19138 was prepared by Technical Committee ISO/TC 211, Geographic information/Geomatics.
Introduction
Knowledge of the quality of geographic data is often crucial for the application of the data, as different users
and different applications often have different data quality requirements. A user of geographic data may have
multiple datasets from which to choose. Therefore, it is necessary to compare the quality of the datasets to
determine which best fulfils the requirements of the user. To facilitate such comparisons, it is essential that the
results of the quality reports are expressed in a comparable way and that there is a common understanding of
the data quality measures that have been used. These data quality measures provide descriptors of the
quality of geographic data through comparison with the universe of discourse. The use of incompatible
measures makes data quality comparisons impossible to perform.
Data quality needs to be reported by the producer and evaluated by the user against his or her requirements
for different criteria and data quality measures. It is essential that reported quality for a dataset contains the
quality measurements that may be of interest to a potential user of the dataset, and that the metrics used to
determine the quality are reported and available to the user.
ISO 19113 establishes the principles for the description of geographic data quality and specifies components
for reporting quality information. Procedures for the evaluation of geographic data quality are described in
ISO 19114.
The objective of this Technical Specification is to guide the producer in choosing the right data quality
measures for data quality reporting, and the user in the evaluation of the usefulness of a dataset by
standardizing the components and structures of data quality measures and by defining commonly used data
quality measures.
TECHNICAL SPECIFICATION ISO/TS 19138:2006(E)
Geographic information — Data quality measures
1 Scope
This Technical Specification defines a set of data quality measures. These can be used when reporting data
quality for the data quality subelements identified in ISO 19113. Multiple measures are defined for each data
quality subelement, and the choice of which to use will depend on the type of data and its intended purpose.
The data quality measures are structured so that they can be maintained in a register established in
conformance with ISO 19135.
This Technical Specification does not attempt to describe every possible data quality measure, only a set of
commonly used ones.
2 Conformance
Any set of data quality measures claiming conformance with this Technical Specification shall pass all of the
conditions specified in the abstract test suite (Annex A).
3 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/TS 19103:2005, Geographic information — Conceptual schema language
ISO 19113:2002, Geographic information — Quality principles
ISO 19115:2003, Geographic information — Metadata
ISO 19135:2005, Geographic information — Procedures for item registration
4 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
4.1
correctness
correspondence with the universe of discourse
4.2
data quality basic measure
generic data quality measure used as a basis for the creation of specific data quality measures
NOTE Data quality basic measures are abstract data types. They cannot be used directly when reporting data
quality.
4.3
data quality scope
extent or characteristic(s) of the data for which quality information is reported
[ISO 19113]
NOTE A data quality scope for a dataset can comprise a dataset series to which the dataset belongs, the dataset
itself, or a smaller grouping of data located physically within the dataset sharing common characteristics. Common
characteristics can be an identified feature type, feature attribute, or feature relationship; data collection criteria; original
source; or a specified geographic or temporal extent.
4.4
error
discrepancy with the universe of discourse
4.5
measurand
particular quantity subject to measurement
[International Vocabulary of Basic and General Terms in Metrology (VIM)]
4.6
universe of discourse
view of the real or hypothetical world that includes everything of interest
[ISO 19101]
5 Relationships to other standards
ISO 19113 describes relevant data quality elements and their corresponding data quality subelements and it
indicates how quality should be reported. ISO 19114 describes procedures for the evaluation of quantitative
quality. ISO 19115 contains elements and classes for data quality reporting within the UML models and data
dictionaries.
ISO 19113 specifies a set of descriptors for a data quality subelement, for use in recording data quality. One
of these descriptors is the data quality measure. A data quality measure is described by the components listed
in 7.1.
Table 1 provides a list of data quality elements and data quality subelements as defined in ISO 19113.
Table 1 — Data quality elements and data quality subelements with definitions (ISO 19113)
Data quality element | Data quality subelement | Definition
completeness | commission | excess data present in a dataset
completeness | omission | data absent from a dataset
logical consistency | conceptual consistency | adherence to rules of the conceptual schema
logical consistency | domain consistency | adherence of values to the value domains
logical consistency | format consistency | degree to which data is stored in accordance with the physical structure of the dataset
logical consistency | topological consistency | correctness of the explicitly encoded topological characteristics of a dataset
positional accuracy | absolute or external accuracy | closeness of reported coordinate values to values accepted as or being true
positional accuracy | relative or internal accuracy | closeness of the relative positions of features in a dataset to their respective relative positions accepted as or being true
positional accuracy | gridded data position accuracy | closeness of gridded data position values to values accepted as or being true
temporal accuracy | accuracy of a time measurement | correctness of the temporal references of an item (reporting of error in time measurement)
temporal accuracy | temporal consistency | correctness of ordered events or sequences, if reported
temporal accuracy | temporal validity | validity of data with respect to time
thematic accuracy | classification correctness | comparison of the classes assigned to features or their attributes to a universe of discourse (e.g. ground truth or reference dataset)
thematic accuracy | non-quantitative attribute correctness | correctness of non-quantitative attribute
thematic accuracy | quantitative attribute accuracy | accuracy of quantitative attributes
6 Register
A register of data quality measures shall contain a set of data quality measures, described using the
components listed in 7.1. The registration procedures shall be performed according to ISO 19135.
Annex D of this Technical Specification contains the list of standardized data quality measures. A register
shall contain these data quality measures and may also contain additional data quality measures submitted
through the procedures defined within ISO 19135. The registration process also allows retiring data quality
measures.
7 Components of a data quality measure
7.1 List of components
Each data quality measure shall be described using the following technical components:
⎯ name (7.2.1)
⎯ alias (7.2.2)
⎯ data quality element (7.2.3)
⎯ data quality subelement (7.2.4)
⎯ data quality basic measure (7.2.5)
⎯ definition (7.2.6)
⎯ description (7.2.7)
⎯ parameter (7.2.8)
⎯ data quality value type (7.2.9)
⎯ data quality value structure (7.2.10)
⎯ source reference (7.2.11)
⎯ example (7.2.12)
⎯ identifier (7.2.13)
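The thirteen components listed above can be pictured as the fields of a single record. The following Python sketch is illustrative only and is not part of this Technical Specification: the class names, field names and the sample entry are assumptions chosen for readability, and components for which multiple entries are allowed are modelled as lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """Auxiliary variable used by a data quality measure (7.2.8)."""
    name: str
    definition: str
    description: Optional[str] = None


@dataclass
class DataQualityMeasure:
    """One entry carrying the thirteen components listed in 7.1."""
    name: str                                   # 7.2.1
    element: str                                # 7.2.3
    subelement: str                             # 7.2.4
    definition: str                             # 7.2.6
    value_type: str                             # 7.2.9
    aliases: List[str] = field(default_factory=list)            # 7.2.2
    basic_measure: Optional[str] = None                         # 7.2.5
    description: Optional[str] = None                           # 7.2.7
    parameters: List[Parameter] = field(default_factory=list)   # 7.2.8
    value_structure: Optional[str] = None                       # 7.2.10
    source_references: List[str] = field(default_factory=list)  # 7.2.11
    examples: List[str] = field(default_factory=list)           # 7.2.12
    identifier: Optional[int] = None                            # 7.2.13


# Hypothetical entry invented for illustration; it is not copied from Annex D.
measure = DataQualityMeasure(
    name="example omission rate",
    element="completeness",
    subelement="omission",
    definition="number of missing items relative to the number of items "
               "that should have been present",
    value_type="Real",
    basic_measure="Error rate",
)
```

Components that are left out simply stay at their defaults (an empty list or `None`), which keeps the mandatory fields visible as the required constructor arguments.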
7.2 Component details
7.2.1 Name
Name refers to the name of the data quality measure.
If the data quality measure already has a commonly used name, this name should be used. If no name exists,
a name shall be chosen that reflects the nature of the measure.
NOTE The component name is specified in the base standard for registers, ISO 19135.
7.2.2 Alias
Alias refers to another recognized name for the same data quality measure. It may be a different commonly used
name, an abbreviation or a short name.
More than one alias may be provided.
7.2.3 Data quality element
Data quality element refers to the name of the data quality element to which this data quality measure applies.
NOTE A list of data quality elements is provided in Table 1.
7.2.4 Data quality subelement
Data quality subelement refers to the name of the data quality subelement to which this data quality measure
applies.
NOTE A list of data quality subelements is provided in Table 1.
7.2.5 Data quality basic measure
Each data quality basic measure is described by its name, definition and value type. Data quality basic
measures are identified by their names.
A variety of data quality measures are based on counting of erroneous items. There are also several data
quality measures dealing with the uncertainty of numerical values. In order to avoid repetition, all possible
methods of constructing counting-related data quality measures as well as general statistical measures for
one- and two-dimensional random variables shall be defined in terms of data quality basic measures.
The data quality basic measures are defined in Annex C.
If a data quality measure is based on one of the set of data quality basic measures, the name of the data
quality basic measure shall be provided in the field data quality basic measure. If the data quality measure is
not based on a data quality basic measure, it shall be indicated in this field that a data quality basic measure
is not applicable. The data quality basic measures shall also be used as appropriate for creating new data
quality measures, for instance for reporting unclosed surface patches or other application-dependent data
quality measures.
7.2.6 Definition
Definition states the fundamental concept of the data quality measure.
If the data quality measure is derived from a data quality basic measure, the definition is based on the data
quality basic measure definition and specialized for this data quality measure.
NOTE The component definition is specified in the base standard for registers, ISO 19135.
7.2.7 Description
Description refers to the description of the data quality measure including methods of calculation, with all
formulae and/or illustrations needed to establish the result of applying the measure.
If the data quality measure uses the concept of errors, it shall be stated how an item shall be classified as
incorrect.
NOTE The component description is specified in the base standard for registers, ISO 19135.
7.2.8 Parameter
Parameter refers to an auxiliary variable used by the data quality measure. It shall include name, definition
and description.
More than one parameter may be provided.
7.2.9 Data quality value type
Data quality value type refers to the value type for reporting a data quality result.
A data quality value type shall be provided for a data quality result. The data types defined in ISO/TS 19103
shall be used when appropriate.
Table 2 — Examples of data quality value types
Boolean
Real
Integer
Ratio (numerator of type integer : denominator of type integer)
Percentage
Measure(s) [value(s) + unit(s)]
7.2.10 Data quality value structure
Data quality value structure gives the structure for reporting a complex data quality result.
A data quality result may consist of multiple values. In this case, the data quality result shall be structured
using the data quality value structures as given in Table 3. The structure may consist of homogeneous or
heterogeneous data quality value types. The possible data quality value types are given in 7.2.9.
Table 3 — Data quality value structures
Bag
Set
Sequence
Table
Matrix
Coverage
NOTE The values within a structure can be multiple. For example, the covariance matrix as given in Table D.32 is
reported as matrix of measure, where the matrix elements may have different units of measure. A list may consist of
different data quality value types.
7.2.11 Source reference
Source reference gives the citation of the source of the data quality measure.
When a data quality measure for which additional information is provided in an external source is added to the
list of standardized data quality measures, a reference to that source may be provided here.
NOTE The component source reference is specified in the base standard for registers, ISO 19135.
7.2.12 Example
Example may provide examples of applying the data quality measure or the result obtained for the data quality
measure.
More than one example may be provided.
7.2.13 Identifier
Identifier consists of an integer number that uniquely identifies a data quality measure.
If data quality measures are administered in a register, then identifiers may only be assigned by the register
manager.
NOTE The component identifier is specified in the base standard for registers, ISO 19135.
7.2.14 Obligation of the above-listed components
Some of the components are mandatory, others are conditional or optional. Table B.1 provides further
information on the obligation of each technical component.
7.3 Standardized data quality measures
In order to make data quality related metadata and data quality reports comparable, standardized data quality
measures shall be used in evaluating and reporting data quality, where appropriate. Annex D gives a list of
commonly used data quality measures with all required components for data quality measures as specified in
this Technical Specification.
Annex A
(normative)
Abstract test suite
A.1 Test case identifier: Component test
a) Test purpose: to determine conformance by ensuring that all necessary components of a data quality
measure are provided.
b) Test method: examine the entry for the data quality measure and verify that the components have been
provided as required by Table B.1.
c) Reference: 7.2 and Annex B.
d) Test type: Capability.
A.2 Test case identifier: Name test
a) Test purpose: to determine if a distinct name for the data quality measure is used.
b) Test method: determine if the name for the data quality measure is distinct from other measures with
different concepts, and if the name is not in conflict with other data quality basic measures, their
definitions and descriptions.
c) Reference: 7.2.1.
d) Test type: Capability.
A.3 Test case identifier: Data quality element and subelement test
a) Test purpose: to determine
⎯ if data quality element and subelement are assigned;
⎯ if they are taken from the list of data quality elements and subelements in ISO 19113 or if they are an
additional data quality element and subelement created in conformance with the rules of ISO 19113;
⎯ if the data quality measure is relevant for the given data quality element and subelement.
b) Test method: check if proper values are assigned to the data quality element and subelement
components and if the data quality measure has bearing on these.
c) Reference: 7.2.3 and 7.2.4.
d) Test type: Capability.
A.4 Test case identifier: Data quality basic measure test
a) Test purpose: to determine if a data quality measure is properly derived from a data quality basic
measure.
b) Test method: check if an appropriate data quality basic measure for the data quality measure exists and,
if it does, that the data quality measure is utilizing this data quality basic measure in conformance with
this Technical Specification.
c) Reference: 7.2.5.
d) Test type: Capability.
A.5 Test case identifier: Definition test
a) Test purpose: to determine if a fitting, correct and complete definition is provided.
b) Test method: check that the given definition contains no ambiguities and that it is in conformance with
characteristics of a definition as stated in ISO 19135:2005, 7.3.1.
c) Reference: 7.2.6 and ISO 19135:2005, 7.3.1.
d) Test type: Capability.
A.6 Test case identifier: Description test
a) Test purpose: to determine if an exhaustive description is provided.
b) Test method: check if the description contains a comprehensive explanation with all required formulae to
facilitate the application of the data quality measure.
c) Reference: 7.2.7.
d) Test type: Capability.
A.7 Test case identifier: Parameter test
a) Test purpose: to determine if required parameters are provided.
b) Test method: check if all parameters occurring in the description are provided in the parameter
component.
c) Reference: 7.2.8.
d) Test type: Capability.
A.8 Test case identifier: Data quality value type test
a) Test purpose: to determine if a proper data quality value type is provided.
b) Test method: check if the provided data quality value type is included in the list in Table 2.
c) Reference: 7.2.9.
d) Test type: Capability.
A.9 Test case identifier: Source reference test
a) Test purpose: to determine if a proper source reference is provided.
b) Test method: check if the cited reference source exists and if it reflects the concept of the provided data
quality measure.
c) Reference: 7.2.11.
d) Test type: Capability.
A.10 Test case identifier: Example test
a) Test purpose: to determine if the example, if provided, is a valid example for the data quality measure.
b) Test method: check if the example is free of errors and if it is representative of the usage of the data
quality measure.
c) Reference: 7.2.12.
d) Test type: Capability.
Annex B
(normative)
Structure of data quality measures
B.1 Components defining a data quality measure
Table B.1 shall be used for the technical specification of every data quality measure. The descriptor for
obligation/condition may have the following values: M (mandatory), C (conditional), or O (optional).
Table B.1 — Components defining a data quality measure
Line | Component | Description | Obligation/condition
1 | Name | Name of the data quality measure applied to the data | M
2 | Alias (a) | Another recognized name, an abbreviation or a short name for the same data quality measure | O
3 | Data quality element | Name of the data quality element for which quality is reported | M
4 | Data quality subelement | Name of the data quality subelement for which quality is reported | M
5 | Data quality basic measure | Name of the data quality basic measure from which the data quality measure is derived | C/if derived from basic measure
6 | Definition | Definition of the fundamental concept for the data quality measure | M
7 | Description | Description of the data quality measure, including all formulae and/or illustrations needed to establish the result of applying the measure | C/if the definition is not sufficient for the understanding of the data quality measure concept
8 | Parameter (a) | Auxiliary variable used by the data quality measure, including its name, definition and optionally its description | C/if required
9 | Data quality value type (a) | Value type for reporting a data quality result | M
10 | Data quality value structure | Structure for reporting a complex data quality result | O
11 | Source reference (a) | Reference to the source of an item that has been adopted from an external source | C/if an external source exists
12 | Example (a) | Illustration of the use of a data quality measure | O
13 | Identifier | Integer number, uniquely identifying a data quality measure | C/if data quality measures are administered in a register
(a) Multiple entries are allowed.
When values for the optional or conditional elements are not present, this should be indicated by assigning the character “—” to the appropriate component.
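As a rough illustration of how the obligations in Table B.1 might be checked mechanically, the following Python sketch flags mandatory components that are absent from an entry. It is a minimal sketch under stated assumptions, not a conformance test: the dictionary keys, the function name `missing_mandatory` and the sample entry are hypothetical, and conditional components are not evaluated because their conditions require human judgement.

```python
# Obligation codes taken from Table B.1: M (mandatory), C (conditional), O (optional).
OBLIGATION = {
    "name": "M",
    "alias": "O",
    "data quality element": "M",
    "data quality subelement": "M",
    "data quality basic measure": "C",
    "definition": "M",
    "description": "C",
    "parameter": "C",
    "data quality value type": "M",
    "data quality value structure": "O",
    "source reference": "C",
    "example": "O",
    "identifier": "C",
}

# Character that Table B.1 assigns to a component whose value is not present.
ABSENT = "\u2014"


def missing_mandatory(entry):
    """Return the mandatory components that are absent or empty in `entry`."""
    return [
        component
        for component, obligation in OBLIGATION.items()
        if obligation == "M" and entry.get(component, ABSENT) in ("", None, ABSENT)
    ]


# Hypothetical entry (assumption): the data quality value type was left absent.
entry = {
    "name": "example omission rate",
    "data quality element": "completeness",
    "data quality subelement": "omission",
    "definition": "missing items relative to the items that should be present",
    "data quality value type": ABSENT,
}
missing = missing_mandatory(entry)
```

Here `missing` names the single mandatory component that still needs a value; an empty list would mean that every mandatory component is filled in.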
B.2 Mapping of the components to ISO 19115 and ISO 19135
Table B.2 — Mapping of the components to ISO 19115 and ISO 19135
Line | Component | ISO 19115 element name | ISO 19135 element name
1 | Name | nameOfMeasure | name
2 | Alias | – | alternativeExpressions
3 | Data quality element | DQ_Element | –
4 | Data quality subelement | lines 108-127 [B.2.4.3 Data quality element information] | –
5 | Data quality basic measure | – | –
6 | Definition | – | definition
7 | Description | measureDescription | description
8 | Parameter | – | –
9 | Data quality value type | – | –
10 | Data quality value structure | – | –
11 | Source reference | – | source
12 | Example | – | –
13 | Identifier | measureIdentification | itemIdentifier
B.3 UML-diagram for data quality measure
Figure B.1 defines the components for data quality measures and Figure B.2 defines the relationship of data
quality measures and registered items from ISO 19135. Both figures are in UML notation.
The UML models describe the content model, if a register for data quality measures is implemented.
The class RE_RegisteredItem is defined in ISO 19135.
Figure B.1 — Data quality measure
Figure B.2 — Relationship between registered item of ISO 19135 and data quality measure
Annex C
(normative)
Data quality basic measures
C.1 Purpose of data quality basic measures
The concept of data quality basic measures is introduced in this Technical Specification to avoid the repetitive
definition of the same concept. Some data quality measures have certain commonalities. For
example, the counting-related data quality measures deal with the concept of counting errors. The
number of errors may be used to construct different kinds of data quality measures. The concept of
constructing these data quality measures is defined for the generic data quality basic measures and shall be
used for the creation of data quality measures that share these commonalities.
Counting- and uncertainty-related data quality measures can be identified. Therefore two principal categories
of data quality basic measures are listed in this annex. The counting-related data quality basic measures are
based on the concept of counting errors or correct items. The uncertainty-related data quality basic measures
are based on the concept of modelling the uncertainty of measurements with statistical methods. The
measured quantity can be embedded in different dimensions. Depending on the dimension of the measured
quantity, different types of data quality basic measures shall be used to construct data quality measures.
Annex D uses the data quality basic measures of Annex C where appropriate. When appropriate, the
construction of new data quality measures shall be derived from one of the following data quality basic
measures.
C.2 Counting-related data quality basic measures
The data quality basic measures based on different methods of counting errors or counting the number of
correct values are listed in Table C.1.
Table C.1 — Data quality basic measures for counting-related data quality measures
Data quality basic measure name | Data quality basic measure definition | Example | Data quality value type
Error indicator | Indicator that an item is in error | False | Boolean (if the value is true the item is not correct)
Correctness indicator | Indicator that an item is not in error | True | Boolean (if the value is true the item is correct)
Error count | Total number of items that are subject to an error of a specified type | 11 | Integer
Correct items count | Total number of items that are free of errors of a specified type | 571 | Integer
Error rate | Number of the erroneous items with respect to the total number of items that should have been present | 0,0189; 1,89 %; 11:582 | Error rate can either be presented as real, percentage or as ratio
Correct items rate | Number of the correct items with respect to the total number of items that should have been present | 0,9811; 98,11 %; 571:582 | Correct items rate can either be presented as real, percentage or as ratio
NOTE 1 Number of items is defined using number of items in the universe of discourse for the dataset specified by
data quality scope.
EXAMPLE Use number of items found in the real world or reference dataset.
NOTE 2 A list of data quality value types is provided in Table 2 (see 7.2.9).
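The counting-related basic measures can be illustrated with a minimal Python sketch that reuses the example figures of Table C.1 (11 erroneous items out of 582). The class and attribute names are hypothetical and not part of this Technical Specification, and the sketch assumes that every item counted in the total is either in error or correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountingResult:
    """Counting-related values for one error type (illustrative names)."""
    error_count: int   # items subject to an error of the specified type
    total_items: int   # items that should have been present (see NOTE 1)

    @property
    def correct_items_count(self) -> int:
        # Assumption: each counted item is either in error or correct.
        return self.total_items - self.error_count

    @property
    def error_rate(self) -> float:
        return self.error_count / self.total_items

    @property
    def correct_items_rate(self) -> float:
        return self.correct_items_count / self.total_items

    def error_ratio(self) -> str:
        # Ratio value type: integer numerator : integer denominator.
        return f"{self.error_count}:{self.total_items}"


result = CountingResult(error_count=11, total_items=582)
print(result.correct_items_count)      # 571
print(round(result.error_rate, 4))     # 0.0189
print(f"{result.error_rate:.2%}")      # 1.89%
print(result.error_ratio())            # 11:582
```

The same value is reported three ways (real, percentage, ratio), matching the three presentations allowed for the error rate.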
C.3 Uncertainty-related data quality basic measures
C.3.1 General
Numerical values that are obtained by some kind of measuring procedure can only be observed to a certain
accuracy. By treating the measured quantity (measurand) as a random variable, this uncertainty can be
quantified. The different ways of describing uncertainty with statistical methods are used for the definition of
uncertainty-related data quality basic measures.
The statistical methods used for the definition of uncertainty-related data quality measures are based on
certain assumptions:
⎯ uncertainties are homogeneous for all observed values;
⎯ the observed values are not correlated;
⎯ the observed values have a normal distribution.
C.3.2 One-dimensional random variable, Ζ
For a continuous measurand (i.e. the value domain of the measured quantities is the real numbers), it is
impossible to give the probability of a single value to be the true value. But it is possible to give the probability
for the true value to be within a certain interval. This interval is called the confidence interval. It is given by the
probability P of the true value being between the lower and the upper limit. This probability P is also called the
significance level.
$$P(\text{lower limit} \le \text{true value} \le \text{upper limit}) = P$$
If the standard deviation σ is known, the limits are given by the quantiles u of the normal (Gaussian)
distribution:
$$P(z_t - u \cdot \sigma \le \text{true value} \le z_t + u \cdot \sigma) = P$$
Table C.2 — Relation between the quantiles of the normal distribution and the significance level
Probability P | Quantile | Data quality basic measure | Name | Data quality value type
P = 68,3 % | $u_{68,3\,\%} = 1$ | $u_{68,3\,\%} \cdot \sigma_Z$ | LE68.3 | measure
P = 50 % | $u_{50\,\%} = 0{,}6745$ | $u_{50\,\%} \cdot \sigma_Z$ | LE50 | measure
P = 90 % | $u_{90\,\%} = 1{,}645$ | $u_{90\,\%} \cdot \sigma_Z$ | LE90 | measure
P = 95 % | $u_{95\,\%} = 1{,}960$ | $u_{95\,\%} \cdot \sigma_Z$ | LE95 | measure
P = 99 % | $u_{99\,\%} = 2{,}576$ | $u_{99\,\%} \cdot \sigma_Z$ | LE99 | measure
P = 99,8 % | $u_{99,8\,\%} = 3$ | $u_{99,8\,\%} \cdot \sigma_Z$ | LE99.8 | measure
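As a minimal sketch of how the measures of Table C.2 are applied, the following Python code multiplies a tabulated quantile by a known standard deviation. The function names and the dictionary are hypothetical. The helper two_sided_quantile is an independent cross-check based on the standard normal distribution; it reproduces the tabulated values for 50 %, 90 %, 95 % and 99 % after rounding, while the entries for 68,3 % and 99,8 % appear to be rounded to whole numbers.

```python
from statistics import NormalDist

# Quantiles u as listed in Table C.2 (decimal commas written as points).
U_TABLE_C2 = {
    "LE68.3": 1.0,
    "LE50": 0.6745,
    "LE90": 1.645,
    "LE95": 1.960,
    "LE99": 2.576,
    "LE99.8": 3.0,
}


def linear_error(name: str, sigma_z: float) -> float:
    """Half-width u * sigma_Z of the interval for the named basic measure."""
    return U_TABLE_C2[name] * sigma_z


def two_sided_quantile(p: float) -> float:
    """Quantile u with P(-u <= X <= u) = p for a standard normal X."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0)


# Hypothetical height accuracy with a known sigma_Z of 0.5 m.
print(linear_error("LE90", 0.5))
print(round(two_sided_quantile(0.95), 3))
```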
If the standard deviation σ is unknown, but the one-dimensional random variable Ζ is measured redundantly
by Ν independent observations, it is possible to estimate the standard deviation from the observations.
$z_{mi}$ represents the i-th measurement for the value. If the true value $z_t$ for Ζ is known, the standard deviation
can be estimated by
$$s_Z = \sqrt{\frac{1}{r} \sum_{i=1}^{N} (z_{mi} - z_t)^2}$$
with redundancy r being the number of observations, r = N. If the true value is unknown, it may be estimated
as the arithmetic mean of the observations:
$$z_t = \frac{1}{N} \sum_{i=1}^{N} z_{mi}$$
The standard deviation may then be estimated using the same formula, with r = N − 1.
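The estimation of the standard deviation from redundant observations can be sketched in Python as follows. The function name and the sample values are hypothetical; the sketch assumes the usual root-mean-square form of the estimate, with redundancy r = N when the true value is known and r = N − 1 when the arithmetic mean is used in its place.

```python
import math
from typing import Optional, Sequence


def estimate_std(observations: Sequence[float],
                 true_value: Optional[float] = None) -> float:
    """Estimate s_Z from N redundant observations z_mi.

    If true_value is given, the redundancy is r = N.
    Otherwise the arithmetic mean stands in for z_t and r = N - 1.
    """
    n = len(observations)
    if true_value is None:
        z_t = sum(observations) / n
        r = n - 1
    else:
        z_t = true_value
        r = n
    if r < 1:
        raise ValueError("at least one degree of redundancy is required")
    squared = sum((z - z_t) ** 2 for z in observations)
    return math.sqrt(squared / r)


heights = [10.2, 9.8, 10.1, 9.9]   # hypothetical repeated measurements
print(estimate_std(heights, true_value=10.0))   # r = N = 4
print(estimate_std(heights))                    # r = N - 1 = 3
```

With the same observations, the estimate that uses the mean is slightly larger because one degree of redundancy is consumed by estimating the true value.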
If the standard deviation is estimated by redundant measurements, the confidence interval can be derived
from the Student's t-distribution with parameter r:
$$P(-t \cdot s_Z \le Z - z_t \le t \cdot s_Z) = P \quad \text{with} \quad (Z - z_t)/s_Z \sim t(r)$$
Table C.3 — Relation between the quantiles of the Student's t-distribution and the significance level for different redundancies r
Probability P | Quantile for r = 10 | Quantile for r = 5 | Quantile for r = 4 | Quantile for r = 3 | Quantile for r = 2 | Quantile for r = 1
P = 50 % | t = 1,221 | t = 1,301 | t = 1,344 | t = 1,423 | t = 1,604 | t = 2,414
P = 68,3 % | t = 1,524 | t = 1,657 | t = 1,731 | t = 1,868 | t = 2,203 | t = 3,933
P = 90 % | t = 2,228 | t = 2,571 | t = 2,776 | t = 3,182 | t = 4,303 | t = 12,706
P = 95 % | t = 2,634 | t = 3,163 | t = 3,495 | t = 4,177 | t = 6,205 | t = 25,452
P = 99 % | t = 3,581 | t = 4,773 | t = 5,598 | t = 7,453 | t = 14,089 | t = 127,321
P = 99,8 % | t = 4,587 | t = 6,869 | t = 8,610 | t = 12,924 | t = 31,599 | t = 636,619
Table C.4 — Data quality basic measures for different probabilities P of a one-dimensional quantity, where the standard deviation is estimated from redundant measurements
Probability P | Data quality basic measure | Name | Data quality value type
P = 50,0 % | $t_{50\,\%}(r) \cdot s_Z$ | LE50(r) | measure
P = 68,3 % | $t_{68,3\,\%}(r) \cdot s_Z$ | LE68.3(r) | measure
P = 90,0 % | $t_{90\,\%}(r) \cdot s_Z$ | LE90(r) | measure
P = 95,0 % | $t_{95\,\%}(r) \cdot s_Z$ | LE95(r) | measure
P = 99,0 % | $t_{99\,\%}(r) \cdot s_Z$ | LE99(r) | measure
P = 99,8 % | $t_{99,8\,\%}(r) \cdot s_Z$ | LE99.8(r) | measure
NOTE The values of t for a number of redundancies r can be obtained from Table C.3.
The data quality basic measures for the uncertainty of one-dimensional quantities are given in Tables C.2 and
C.4. They both aim to measure the uncertainty by giving the upper and lower limit of a confidence interval.
The difference is in how the standard deviation is obtained. If it is known a priori, then Table C.2 is relevant. If
the standard deviation is estimated from redundant measurements, then Table C.4 in conjunction with
Table C.3 is relevant.
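The pattern of Table C.4 is a simple product of a quantile and the estimated standard deviation. A minimal Python sketch follows; the function name and the value of s_Z are hypothetical, and the quantile is passed in by the caller rather than computed, so it can be read from Table C.3 or from another source.

```python
def linear_error_estimated(t_quantile: float, s_z: float) -> float:
    """Half-width t(r) * s_Z of the confidence interval (Table C.4 pattern)."""
    if t_quantile <= 0 or s_z < 0:
        raise ValueError("t_quantile must be positive and s_z non-negative")
    return t_quantile * s_z


# Quantile as listed in Table C.3 for P = 95 % and r = 10; s_Z is hypothetical.
le95_r10 = linear_error_estimated(t_quantile=2.634, s_z=0.04)
print(round(le95_r10, 5))
```

Keeping the quantile as an argument separates the reporting formula from the choice of probability and redundancy.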
C.3.3 Two-dimensional random variable Χ and Υ
The case of the one-dimensional random variable Ζ can be expanded to two dimensions where the
measurand is always observed by two values. The measurand is given by the tuple Χ, Υ. This has the same
assumptions as in the case of the one-dimensional random variable.
The observations are $x_{mi}$ and $y_{mi}$. The equivalent of the confidence interval in one dimension is the
confidence area, which is usually described as a circle around the best estimate of the true value. The
probability for the true value to lie in this area is calculated by area integration over the two-dimensional
density function of the normal distribution. A circular area is characterized by its radius. This radius, R, is used
as a measure for the accuracy of two-dimensional random variables:
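$$P(R, \sigma_X, \sigma_Y) = \frac{1}{2\pi\sigma_X\sigma_Y} \iint_{(x - x_t)^2 + (y - y_t)^2 \le R^2} e^{-\frac{1}{2}\left(\frac{(x - x_t)^2}{\sigma_X^2} + \frac{(y - y_t)^2}{\sigma_Y^2}\right)} \, dx \, dy$$
For some particular probabilities, the radius can be calculated depending on the standard deviations
$\sigma_x$ and $\sigma_y$.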
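Table C.5 — Relationship between the probability P and the corresponding radius of the circular area
Probability P | Data quality basic measure | Name | Data quality value type
P = 39,4 % | $\frac{1}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE39.4 | measure
P = 50 % | $\frac{1{,}1774}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE50 | measure
P = 90 % | $\frac{2{,}146}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE90 | measure
P = 95 % | $\frac{2{,}4477}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE95 | measure
P = 99,8 % | $\frac{3{,}5}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE99.8 | measure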
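For the special case of equal standard deviations in both directions, the area integral has a well-known closed form, P = 1 − exp(−R² / (2σ²)), which makes it possible to check the coefficients 1,1774, 2,146, 2,4477 and 3,5 that appear in Table C.5. The following Python sketch covers only this isotropic case, which is an assumption of the sketch rather than of the table; the function names are hypothetical.

```python
import math


def circular_probability(radius: float, sigma: float) -> float:
    """Probability that the true value lies within the circle of given radius,
    assuming sigma_x == sigma_y == sigma."""
    return 1.0 - math.exp(-radius ** 2 / (2.0 * sigma ** 2))


def circular_radius(p: float, sigma: float) -> float:
    """Radius R of the circle that contains the true value with probability p,
    assuming sigma_x == sigma_y == sigma."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))


for p in (0.50, 0.90, 0.95):
    print(p, round(circular_radius(p, 1.0), 4))
# 0.5 1.1774
# 0.9 2.146
# 0.95 2.4477
```

When the two standard deviations differ, the integral has no such simple inverse and the radius would have to be obtained from the tabulated expressions or numerically.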
C.3.4 Three-dimensional random variable Χ, Υ, Ζ
The case of the one-dimensional random variable Ζ can be expanded to three dimensions where the
measurand is always observed by three values. The measurand is given by the tuple Χ, Υ, Ζ. This is subject to
the same assumptions as in the case of the one-dimensional random variable.
The observations are $x_{mi}$, $y_{mi}$ and $z_{mi}$. The equivalent of the confidence interval in one dimension is the
confidence volume, which is usually described as a sphere around the best estimate of the true value. The
probability for the true value to lie in this volume is calculated by volume integration over the three-dimensional
density function of the normal distribution. A spherical volume is characterized by its radius. This
radius is used as a measure for the accuracy of three-dimensional random variables.
Table C.6 — Relationship between the probability P and the corresponding radius of the spherical volume
Probability P | Data quality basic measure | Name | Data quality value type
P = 50 % | $0{,}51 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | spherical error probable (SEP) | measure
P = 61 % | $\sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$ | mean radial spherical error (MRSE) | measure
P = 90 % | $0{,}833 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | 90 % spherical accuracy standard | measure
P = 99 % | $1{,}122 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | 99 % spherical accuracy standard | measure
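The four expressions of Table C.6 can be written as small Python functions. This is a minimal sketch: the function names are hypothetical, and the coefficients 0,51, 0,833 and 1,122 are taken as read from the table.

```python
import math


def spherical_error_probable(sx: float, sy: float, sz: float) -> float:
    """SEP, P = 50 %: 0.51 * (sigma_x + sigma_y + sigma_z)."""
    return 0.51 * (sx + sy + sz)


def mean_radial_spherical_error(sx: float, sy: float, sz: float) -> float:
    """MRSE, P = 61 %: sqrt(sigma_x^2 + sigma_y^2 + sigma_z^2)."""
    return math.sqrt(sx ** 2 + sy ** 2 + sz ** 2)


def spherical_accuracy_90(sx: float, sy: float, sz: float) -> float:
    """90 % spherical accuracy standard: 0.833 * (sigma_x + sigma_y + sigma_z)."""
    return 0.833 * (sx + sy + sz)


def spherical_accuracy_99(sx: float, sy: float, sz: float) -> float:
    """99 % spherical accuracy standard: 1.122 * (sigma_x + sigma_y + sigma_z)."""
    return 1.122 * (sx + sy + sz)


# Hypothetical standard deviations of 1 m in each direction.
sigmas = (1.0, 1.0, 1.0)
print(round(spherical_error_probable(*sigmas), 3))
print(round(mean_radial_spherical_error(*sigmas), 3))
print(round(spherical_accuracy_90(*sigmas), 3))
print(round(spherical_accuracy_99(*sigmas), 3))
```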
Annex D
(normative)
List of data quality measures
D.1 Completeness
D.1.1 Overview
This annex defines data quality measures. In order to achieve well defined and comparable quality
information, it is strongly recommended to carry out the evaluation and reporting of data quality using these
data quality measures. Due to the nature of quality and geospatial information, this list cannot be complete.
Therefore, there may be cases where the user of this Technical Specification has to come up with user-
defined data quality measures. In cases where user-defined data quality measures are related to error counts
or to uncertainty, they shall be defined using the data quality basic measures as provided in Annex C. In any
case, a data quality measure shall be defined using the structure as given in Annex B.
D.1.2 Commission
The data quality measures for the data quality subelement commission are provided in Tables D.1 to D.4.
Table D.1 — Excess item
Line Component Description
1 Name excess item
2 Alias –
3 Data quality element completeness
4 Data quality subelement commission
5 Data quality basic measure error indicator
6 Definition indication that an item is incorrectly present in the data
7 Description –
8 Parameter –
9 Data quality value type Boolean (true indicates that the item is in excess)
10 Data quality value structure –
11 Source
...
SLOVENIAN STANDARD
1 September 2009
Geografske informacije - Kakovostne mere za prostorske podatke
Geographic information - Data quality measures
Information géographique - Mesures de la qualité des données
This Slovenian standard is identical to: ISO/TS 19138:2006
ICS:
35.240.70 IT applications in science
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
TECHNICAL SPECIFICATION ISO/TS 19138
First edition
2006-12-01
Geographic information — Data quality measures
Information géographique — Mesures de la qualité des données
© ISO 2006
© ISO 2006
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 Relationships to other standards
6 Register
7 Components of a data quality measure
7.1 List of components
7.2 Component details
7.3 Standardized data quality measures
Annex A (normative) Abstract test suite
A.1 Test case identifier: Component test
A.2 Test case identifier: Name test
A.3 Test case identifier: Data quality element and subelement test
A.4 Test case identifier: Data quality basic measure test
A.5 Test case identifier: Definition test
A.6 Test case identifier: Description test
A.7 Test case identifier: Parameter test
A.8 Test case identifier: Data quality value type test
A.9 Test case identifier: Source reference test
A.10 Test case identifier: Example test
Annex B (normative) Structure of data quality measures
B.1 Components defining a data quality measure
B.2 Mapping of the components to ISO 19115 and ISO 19135
B.3 UML-diagram for data quality measure
Annex C (normative) Data quality basic measures
C.1 Purpose of data quality basic measures
C.2 Counting-related data quality basic measures
C.3 Uncertainty-related data quality basic measures
Annex D (normative) List of data quality measures
D.1 Completeness
D.2 Logical consistency
D.3 Positional accuracy
D.4 Temporal accuracy
D.5 Thematic accuracy
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, a
technical committee may decide to publish other types of normative document:
⎯ an ISO Publicly Available Specification (ISO/PAS) represents an agreement between technical experts in
an ISO working group and is accepted for publication if it is approved by more than 50 % of the members
of the parent committee casting a vote;
⎯ an ISO Technical Specification (ISO/TS) represents an agreement between the members of a technical
committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting
a vote.
An ISO/PAS or ISO/TS is reviewed after three years in order to decide whether it will be confirmed for a
further three years, revised to become an International Standard, or withdrawn. If the ISO/PAS or ISO/TS is
confirmed, it is reviewed again after a further three years, at which time it must either be transformed into an
International Standard or be withdrawn.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO/TS 19138 was prepared by Technical Committee ISO/TC 211, Geographic information/Geomatics.
Introduction
Knowledge of the quality of geographic data is often crucial for the application of the data, as different users
and different applications often have different data quality requirements. A user of geographic data may have
multiple datasets from which to choose. Therefore, it is necessary to compare the quality of the datasets to
determine which best fulfils the requirements of the user. To facilitate such comparisons, it is essential that the
results of the quality reports are expressed in a comparable way and that there is a common understanding of
the data quality measures that have been used. These data quality measures provide descriptors of the
quality of geographic data through comparison with the universe of discourse. The use of incompatible
measures makes data quality comparisons impossible to perform.
Data quality needs to be reported by the producer and evaluated by the user against his or her requirements
for different criteria and data quality measures. It is essential that reported quality for a dataset contains the
quality measurements that may be of interest to a potential user of the dataset, and that the metrics used to
determine the quality are reported and available to the user.
ISO 19113 establishes the principles for the description of geographic data quality and specifies components
for reporting quality information. Procedures for the evaluation of geographic data quality are described in
ISO 19114.
The objective of this Technical Specification is to guide the producer in choosing the right data quality
measures for data quality reporting, and the user in the evaluation of the usefulness of a dataset by
standardizing the components and structures of data quality measures and by defining commonly used data
quality measures.
TECHNICAL SPECIFICATION ISO/TS 19138:2006(E)
Geographic information — Data quality measures
1 Scope
This Technical Specification defines a set of data quality measures. These can be used when reporting data
quality for the data quality subelements identified in ISO 19113. Multiple measures are defined for each data
quality subelement, and the choice of which to use will depend on the type of data and its intended purpose.
The data quality measures are structured so that they can be maintained in a register established in
conformance with ISO 19135.
This Technical Specification does not attempt to describe every possible data quality measure, only a set of
commonly used ones.
2 Conformance
Any set of data quality measures claiming conformance with this Technical Specification shall pass all of the
conditions specified in the abstract test suite (Annex A).
3 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/TS 19103:2005, Geographic information — Conceptual schema language
ISO 19113:2002, Geographic information — Quality principles
ISO 19115:2003, Geographic information — Metadata
ISO 19135:2005, Geographic information — Procedures for item registration
4 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
4.1
correctness
correspondence with the universe of discourse
4.2
data quality basic measure
generic data quality measure used as a basis for the creation of specific data quality measures
NOTE Data quality basic measures are abstract data types. They cannot be used directly when reporting data
quality.
4.3
data quality scope
extent or characteristic(s) of the data for which quality information is reported
[ISO 19113]
NOTE A data quality scope for a dataset can comprise a dataset series to which the dataset belongs, the dataset
itself, or a smaller grouping of data located physically within the dataset sharing common characteristics. Common
characteristics can be an identified feature type, feature attribute, or feature relationship; data collection criteria; original
source; or a specified geographic or temporal extent.
4.4
error
discrepancy with the universe of discourse
4.5
measurand
particular quantity subject to measurement
[International Vocabulary of Basic and General Terms in Metrology (VIM)]
4.6
universe of discourse
view of the real or hypothetical world that includes everything of interest
[ISO 19101]
5 Relationships to other standards
ISO 19113 describes relevant data quality elements and their corresponding data quality subelements and it
indicates how quality should be reported. ISO 19114 describes procedures for the evaluation of quantitative
quality. ISO 19115 contains elements and classes for data quality reporting within the UML models and data
dictionaries.
ISO 19113 specifies a set of descriptors for a data quality subelement, for use in recording data quality. One
of these descriptors is the data quality measure. A data quality measure is described by the components listed
in 7.1.
Table 1 provides a list of data quality elements and data quality subelements as defined in ISO 19113.
Table 1 — Data quality elements and data quality subelements with definitions (ISO 19113)
Data quality element | Data quality subelement | Definition
completeness | commission | excess data present in a dataset
completeness | omission | data absent from a dataset
logical consistency | conceptual consistency | adherence to rules of the conceptual schema
logical consistency | domain consistency | adherence of values to the value domains
logical consistency | format consistency | degree to which data is stored in accordance with the physical structure of the dataset
logical consistency | topological consistency | correctness of the explicitly encoded topological characteristics of a dataset
positional accuracy | absolute or external accuracy | closeness of reported coordinate values to values accepted as or being true
positional accuracy | relative or internal accuracy | closeness of the relative positions of features in a dataset to their respective relative positions accepted as or being true
positional accuracy | gridded data position accuracy | closeness of gridded data position values to values accepted as or being true
temporal accuracy | accuracy of a time measurement | correctness of the temporal references of an item (reporting of error in time measurement)
temporal accuracy | temporal consistency | correctness of ordered events or sequences, if reported
temporal accuracy | temporal validity | validity of data with respect to time
thematic accuracy | classification correctness | comparison of the classes assigned to features or their attributes to a universe of discourse (e.g. ground truth or reference dataset)
thematic accuracy | non-quantitative attribute correctness | correctness of non-quantitative attribute
thematic accuracy | quantitative attribute accuracy | accuracy of quantitative attributes
6 Register
A register of data quality measures shall contain a set of data quality measures, described using the
components listed in 7.1. The registration procedures shall be performed according to ISO 19135.
Annex D of this Technical Specification contains the list of standardized data quality measures. A register
shall contain these data quality measures and may also contain additional data quality measures submitted
through the procedures defined within ISO 19135. The registration process also allows retiring data quality
measures.
7 Components of a data quality measure
7.1 List of components
Each data quality measure shall be described using the following technical components:
⎯ name (7.2.1)
⎯ alias (7.2.2)
⎯ data quality element (7.2.3)
⎯ data quality subelement (7.2.4)
⎯ data quality basic measure (7.2.5)
⎯ definition (7.2.6)
⎯ description (7.2.7)
⎯ parameter (7.2.8)
⎯ data quality value type (7.2.9)
⎯ data quality value structure (7.2.10)
⎯ source reference (7.2.11)
⎯ example (7.2.12)
⎯ identifier (7.2.13)
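The list of components can be pictured as a record type. The following Python sketch is illustrative only: the class and field names are hypothetical, fields without a default correspond to the components marked mandatory in Table B.1 (lines 1 to 11), and treating example and identifier as optional is an assumption of the sketch. The sample values are those of Table D.1 (excess item).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """Auxiliary variable used by a data quality measure (7.2.8)."""
    name: str
    definition: str
    description: Optional[str] = None


@dataclass
class DataQualityMeasure:
    # Components marked M (mandatory) in Table B.1.
    name: str
    element: str
    subelement: str
    definition: str
    value_type: str
    # Conditional or optional components; obligation of the last two
    # (examples, identifier) is an assumption of this sketch.
    aliases: List[str] = field(default_factory=list)
    basic_measure: Optional[str] = None
    description: Optional[str] = None
    parameters: List[Parameter] = field(default_factory=list)
    value_structure: Optional[str] = None
    source_reference: Optional[str] = None
    examples: List[str] = field(default_factory=list)
    identifier: Optional[int] = None


excess_item = DataQualityMeasure(
    name="excess item",
    element="completeness",
    subelement="commission",
    basic_measure="error indicator",
    definition="indication that an item is incorrectly present in the data",
    value_type="Boolean",
)
print(excess_item.name, "->", excess_item.basic_measure)
```

Lists are used for alias, parameter and example because more than one of each may be provided.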
7.2 Component details
7.2.1 Name
Name refers to the name of the data quality measure.
If the data quality measure already has a commonly used name, this name should be used. If no name exists,
a name shall be chosen that reflects the nature of the measure.
NOTE The component name is specified in the base standard for registers, ISO 19135.
7.2.2 Alias
Alias refers to another recognized name for the same data quality measure. It may be a different commonly used
name, or an abbreviation or a short name.
More than one alias may be provided.
7.2.3 Data quality element
Data quality element refers to the name of the data quality element to which this data quality measure applies.
NOTE A list of data quality elements is provided in Table 1.
7.2.4 Data quality subelement
Data quality subelement refers to the name of the data quality subelement to which this data quality measure
applies.
NOTE A list of data quality subelements is provided in Table 1.
7.2.5 Data quality basic measure
Each data quality basic measure is described by its name, definition and value type. Data quality basic
measures are identified by their names.
A variety of data quality measures are based on counting of erroneous items. There are also several data
quality measures dealing with the uncertainty of numerical values. In order to avoid repetition, all possible
methods of constructing counting-related data quality measures as well as general statistical measures for
one- and two-dimensional random variables shall be defined in terms of data quality basic measures.
The data quality basic measures are defined in Annex C.
If a data quality measure is based on one of the set of data quality basic measures, the name of the data
quality basic measure shall be provided in the field data quality basic measure. If the data quality measure is
not based on a data quality basic measure, it shall be indicated in this field that a data quality basic measure
is not applicable. The data quality basic measures shall also be used as appropriate for creating new data
quality measures, for instance for reporting unclosed surface patches or other application-dependent data
quality measures.
7.2.6 Definition
Definition states the fundamental concept of the data quality measure.
If the data quality measure is derived from a data quality basic measure, the definition is based on the data
quality basic measure definition and specialized for this data quality measure.
NOTE The component definition is specified in the base standard for registers, ISO 19135.
7.2.7 Description
Description refers to the description of the data quality measure including methods of calculation, with all
formulae and/or illustrations needed to establish the result of applying the measure.
If the data quality measure uses the concept of errors, it shall be stated how an item shall be classified as
incorrect.
NOTE The component description is specified in the base standard for registers, ISO 19135.
7.2.8 Parameter
Parameter refers to an auxiliary variable used by the data quality measure. It shall include name, definition
and description.
More than one parameter may be provided.
7.2.9 Data quality value type
Data quality value type refers to the value type for reporting a data quality result.
A data quality value type shall be provided for a data quality result. The data types defined in ISO/TS 19103
shall be used when appropriate.
Table 2 — Examples of data quality value types
Boolean
Real
Integer
Ratio (numerator of type integer : denominator of type integer)
Percentage
Measure(s) [value(s) + unit(s)]
7.2.10 Data quality value structure
Data quality value structure gives the structure for reporting a complex data quality result.
A data quality result may consist of multiple values. In this case, the data quality result shall be structured
using the data quality value structures as given in Table 3. The structure may consist of homogeneous or
heterogeneous data quality value types. The possible data quality value types are given in 7.2.9.
Table 3 — Data quality value structures
Bag
Set
Sequence
Table
Matrix
Coverage
NOTE The values within a structure can be multiple. For example, the covariance matrix as given in Table D.32 is
reported as matrix of measure, where the matrix elements may have different units of measure. A list may consist of
different data quality value types.
7.2.11 Source reference
Source reference gives the citation of the source of the data quality measure.
When a data quality measure for which additional information is provided in an external source is added to the
list of standardized data quality measures, a reference to that source may be provided here.
NOTE The component source reference is specified in the base standard for registers, ISO 19135.
7.2.12 Example
Example may provide examples of applying the data quality measure or the result obtained for the data quality
measure.
More than one example may be provided.
7.2.13 Identifier
Identifier consists of an integer number that uniquely identifies a data quality measure.
If data quality measures are administered in a register, then identifiers may only be assigned by the register
manager.
NOTE The component identifier is specified in the base standard for registers, ISO 19135.
7.2.14 Obligation of the above-listed components
Some of the components are mandatory, others are conditional or optional. Table B.1 provides further
information on the obligation of each technical component.
7.3 Standardized data quality measures
In order to make data quality related metadata and data quality reports comparable, standardized data quality
measures shall be used in evaluating and reporting data quality, where appropriate. Annex D gives a list of
commonly used data quality measures with all required components for data quality measures as specified in
this Technical Specification.
Annex A
(normative)
Abstract test suite
A.1 Test case identifier: Component test
a) Test purpose: to determine conformance by ensuring that all necessary components of a data quality
measure are provided.
b) Test method: examine the entry for the data quality measure and verify that the components have been
provided as required by Table B.1.
c) Reference: 7.2 and Annex B.
d) Test type: Capability.
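Part of the component test can be automated. The Python sketch below checks only the components marked M (mandatory) in lines 1 to 11 of Table B.1; conditional components depend on judgement (for example, whether the definition is sufficient) and are left to a reviewer. The function name, the dictionary keys and the sample entry are hypothetical.

```python
from typing import Dict, List, Optional

# Components marked M in Table B.1, lines 1 to 11.
MANDATORY_COMPONENTS = (
    "name",
    "data quality element",
    "data quality subelement",
    "definition",
    "data quality value type",
)


def missing_mandatory_components(entry: Dict[str, Optional[str]]) -> List[str]:
    """Return the mandatory components that are absent or empty in entry."""
    missing = []
    for component in MANDATORY_COMPONENTS:
        value = entry.get(component)
        if value is None or not str(value).strip():
            missing.append(component)
    return missing


entry = {
    "name": "excess item",
    "data quality element": "completeness",
    "data quality subelement": "commission",
    "definition": "indication that an item is incorrectly present in the data",
    "data quality value type": "Boolean",
}
print(missing_mandatory_components(entry))   # []
```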
A.2 Test case identifier: Name test
a) Test purpose: to determine if a distinct name for the data quality measure is used.
b) Test method: determine if the name for the data quality measure is distinct from other measures with
different concepts, and if the name is not in conflict with other data quality basic measures, their
definitions and descriptions.
c) Reference: 7.2.1.
d) Test type: Capability.
A.3 Test case identifier: Data quality element and subelement test
a) Test purpose: to determine
⎯ if data quality element and subelement are assigned;
⎯ if they are taken from the list of data quality elements and subelements in ISO 19113 or if they are an
additional data quality element and subelement created in conformance with the rules of ISO 19113;
⎯ if the data quality measure is relevant for the given data quality element and subelement.
b) Test method: check if proper values are assigned to the data quality element and subelement
components and if the data quality measure has bearing on these.
c) Reference: 7.2.3 and 7.2.4.
d) Test type: Capability.
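The first two parts of this test can be sketched as a lookup against Table 1. The Python code below is a hypothetical illustration: it recognizes only the pairs listed in Table 1, so additional elements or subelements created under ISO 19113, and the question of whether a measure is relevant to the pair, still require manual review.

```python
# Data quality elements and subelements as listed in Table 1.
TABLE_1 = {
    "completeness": {"commission", "omission"},
    "logical consistency": {
        "conceptual consistency",
        "domain consistency",
        "format consistency",
        "topological consistency",
    },
    "positional accuracy": {
        "absolute or external accuracy",
        "relative or internal accuracy",
        "gridded data position accuracy",
    },
    "temporal accuracy": {
        "accuracy of a time measurement",
        "temporal consistency",
        "temporal validity",
    },
    "thematic accuracy": {
        "classification correctness",
        "non-quantitative attribute correctness",
        "quantitative attribute accuracy",
    },
}


def is_listed_pair(element: str, subelement: str) -> bool:
    """True if the element/subelement pair appears in Table 1."""
    return subelement in TABLE_1.get(element, set())


print(is_listed_pair("completeness", "commission"))          # True
print(is_listed_pair("completeness", "temporal validity"))   # False
```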
A.4 Test case identifier: Data quality basic measure test
a) Test purpose: to determine if a data quality measure is properly derived from a data quality basic
measure.
b) Test method: check if an appropriate data quality basic measure for the data quality measure exists and,
if it does, that the data quality measure is utilizing this data quality basic measure in conformance with
this Technical Specification.
c) Reference: 7.2.5.
d) Test type: Capability.
A.5 Test case identifier: Definition test
a) Test purpose: to determine if a fitting, correct and complete definition is provided.
b) Test method: check that the given definition contains no ambiguities and that it is in conformance with
characteristics of a definition as stated in ISO 19135:2005, 7.3.1.
c) Reference: 7.2.6 and ISO 19135:2005, 7.3.1.
d) Test type: Capability.
A.6 Test case identifier: Description test
a) Test purpose: to determine if an exhaustive description is provided.
b) Test method: check if the description contains a comprehensive explanation with all required formulae to
facilitate the application of the data quality measure.
c) Reference: 7.2.7.
d) Test type: Capability.
A.7 Test case identifier: Parameter test
a) Test purpose: to determine if required parameters are provided.
b) Test method: check if all parameters occurring in the description are provided in the parameter
component.
c) Reference: 7.2.8.
d) Test type: Capability.
A.8 Test case identifier: Data quality value type test
a) Test purpose: to determine if a proper data quality value type is provided.
b) Test method: check if the provided data quality value type is included in the list in Table 2.
c) Reference: 7.2.9.
d) Test type: Capability.
A.9 Test case identifier: Source reference test
a) Test purpose: to determine if a proper source reference is provided.
b) Test method: check if the cited reference source exists and if it reflects the concept of the provided data
quality measure.
c) Reference: 7.2.11.
d) Test type: Capability.
A.10 Test case identifier: Example test
a) Test purpose: to determine if the example, if provided, is a valid example for the data quality measure.
b) Test method: check if the example is free of errors and if it is representative of the usage of the data
quality measure.
c) Reference: 7.2.12.
d) Test type: Capability.
Annex B
(normative)
Structure of data quality measures
B.1 Components defining a data quality measure
Table B.1 shall be used for the technical specification of every data quality measure. The descriptor for
obligation/condition may have the following values: M (mandatory), C (conditional), or O (optional).
Table B.1 — Components defining a data quality measure
| Line | Component | Description | Obligation/condition |
|---|---|---|---|
| 1 | Name | Name of the data quality measure applied to the data | M |
| 2 | Alias ᵃ | Another recognized name, an abbreviation or a short name for the same data quality measure | O |
| 3 | Data quality element | Name of the data quality element for which quality is reported | M |
| 4 | Data quality subelement | Name of the data quality subelement for which quality is reported | M |
| 5 | Data quality basic measure | Name of the data quality basic measure from which the data quality measure is derived | C/if derived from basic measure |
| 6 | Definition | Definition of the fundamental concept for the data quality measure | M |
| 7 | Description | Description of the data quality measure, including all formulae and/or illustrations needed to establish the result of applying the measure | C/if the definition is not sufficient for the understanding of the data quality measure concept |
| 8 | Parameter ᵃ | Auxiliary variable used by the data quality measure, including its name, definition and optionally its description | C/if required |
| 9 | Data quality value type ᵃ | Value type for reporting a data quality result | M |
| 10 | Data quality value structure | Structure for reporting a complex data quality result | O |
| 11 | Source reference ᵃ | Reference to the source of an item that has been adopted from an external source | C/if an external source exists |
| 12 | Example ᵃ | Illustration of the use of a data quality measure | O |
| 13 | Identifier | Integer number, uniquely identifying a data quality measure | C/if data quality measures are administered in a register |

ᵃ Multiple entries are allowed. When values for the optional or conditional elements are not present, this should be indicated by assigning the character “—” to the appropriate component.
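The component list of Table B.1 can be sketched as a simple record type. The following Python fragment is only an illustration: the class name `DataQualityMeasure`, its field names and the helper `missing_mandatory` are assumptions made for this sketch and are not defined by this Technical Specification. Components that allow multiple entries are modelled as lists, and an absent optional or conditional component is modelled as `None` or an empty list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataQualityMeasure:
    """Illustrative container for the 13 components of Table B.1."""

    # Mandatory (M) components
    name: str
    element: str
    subelement: str
    definition: str
    value_types: List[str]
    # Optional (O) and conditional (C) components; None or [] stands for "absent"
    aliases: List[str] = field(default_factory=list)
    basic_measure: Optional[str] = None
    description: Optional[str] = None
    parameters: List[str] = field(default_factory=list)
    value_structure: Optional[str] = None
    source_references: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    identifier: Optional[int] = None

    def missing_mandatory(self) -> List[str]:
        """Return the names of mandatory components that are empty."""
        mandatory = {
            "name": self.name,
            "element": self.element,
            "subelement": self.subelement,
            "definition": self.definition,
            "value_types": self.value_types,
        }
        return [key for key, value in mandatory.items() if not value]


# Hypothetical measure, used only to exercise the structure
example_measure = DataQualityMeasure(
    name="illustrative error rate",
    element="illustrative element",
    subelement="illustrative subelement",
    definition="number of erroneous items relative to the number of items",
    value_types=["real", "percentage", "ratio"],
    basic_measure="error rate",
)
print(example_measure.missing_mandatory())  # []
```

A register implementation would typically add further checks for the conditional components, for example requiring `identifier` once measures are administered in a register.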
B.2 Mapping of the components to ISO 19115 and ISO 19135
Table B.2 — Mapping of the components to ISO 19115 and ISO 19135
| Line | Component | ISO 19115 element name | ISO 19135 element name |
|---|---|---|---|
| 1 | Name | nameOfMeasure | name |
| 2 | Alias | – | alternativeExpressions |
| 3 | Data quality element | DQ_Element | – |
| 4 | Data quality subelement | lines 108-127 [B.2.4.3 Data quality element information] | – |
| 5 | Data quality basic measure | – | – |
| 6 | Definition | – | definition |
| 7 | Description | measureDescription | description |
| 8 | Parameter | – | – |
| 9 | Data quality value type | – | – |
| 10 | Data quality value structure | – | – |
| 11 | Source reference | – | source |
| 12 | Example | – | – |
| 13 | Identifier | measureIdentification | itemIdentifier |
B.3 UML-diagram for data quality measure
Figure B.1 defines the components for data quality measures and Figure B.2 defines the relationship of data
quality measures and registered items from ISO 19135. Both figures are in UML notation.
The UML models describe the content model, if a register for data quality measures is implemented.
The class RE_RegisteredItem is defined in ISO 19135.
Figure B.1 — Data quality measure
Figure B.2 — Relationship between registered item of ISO 19135 and data quality measure
Annex C
(normative)
Data quality basic measures
C.1 Purpose of data quality basic measures
The concept of data quality basic measures is introduced in this Technical Specification to avoid the repetitive
definition of the same concept. Some data quality measures share certain commonalities. For
example, the counting-related data quality measures all deal with the concept of counting errors. The
number of errors may be used to construct different kinds of data quality measures. The concept of
constructing these data quality measures is defined for the generic data quality basic measures and shall be
used for the creation of data quality measures that share these commonalities.
Counting- and uncertainty-related data quality measures can be identified. Therefore two principal categories
of data quality basic measures are listed in this annex. The counting-related data quality basic measures are
based on the concept of counting errors or correct items. The uncertainty-related data quality basic measures
are based on the concept of modelling the uncertainty of measurements with statistical methods. The
measured quantity can be embedded in different dimensions. Depending on the dimension of the measured
quantity, different types of data quality basic measures shall be used to construct data quality measures.
Annex D uses the data quality basic measures of Annex C where appropriate. When appropriate, new data
quality measures shall be derived from one of the following data quality basic
measures.
C.2 Counting-related data quality basic measures
The data quality basic measures based on different methods of counting errors or counting the number of
correct values are listed in Table C.1.
Table C.1 — Data quality basic measures for counting-related data quality measures
| Data quality basic measure name | Data quality basic measure definition | Example | Data quality value type |
|---|---|---|---|
| Error indicator | Indicator that an item is in error | False | Boolean (if the value is true the item is not correct) |
| Correctness indicator | Indicator that an item is not in error | True | Boolean (if the value is true the item is correct) |
| Error count | Total number of items that are subject to an error of a specified type | 11 | Integer |
| Correct items count | Total number of items that are free of errors of a specified type | 571 | Integer |
| Error rate | Number of the erroneous items with respect to the total number of items that should have been present | 0,0189; 1,89 %; 11:582 | Error rate can either be presented as real, percentage or as ratio |
| Correct items rate | Number of the correct items with respect to the total number of items that should have been present | 0,9811; 98,11 %; 571:582 | Correct items rate can either be presented as real, percentage or as ratio |
NOTE 1 Number of items is defined using number of items in the universe of discourse for the dataset specified by
data quality scope.
EXAMPLE Use number of items found in the real world or reference dataset.
NOTE 2 A list of data quality value types is provided in Table 2 (see 7.2.9).
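The counting-related data quality basic measures of Table C.1 can be illustrated with a short Python sketch. The function name `counting_basic_measures` and the representation of the input as a list of error indicators are assumptions made for this illustration. The total number of items is taken here as the length of that list, which presumes that the list covers every item that should have been present.

```python
from typing import Dict, Sequence, Union


def counting_basic_measures(error_flags: Sequence[bool]) -> Dict[str, Union[int, float, str]]:
    """Derive counts, rates and ratios from per-item error indicators.

    error_flags[i] is True when item i is in error (the "error indicator").
    """
    total = len(error_flags)
    if total == 0:
        raise ValueError("at least one item is required")
    error_count = sum(1 for flag in error_flags if flag)
    correct_count = total - error_count
    return {
        "error_count": error_count,
        "correct_items_count": correct_count,
        "error_rate": error_count / total,
        "correct_items_rate": correct_count / total,
        "error_ratio": f"{error_count}:{total}",
        "correct_items_ratio": f"{correct_count}:{total}",
    }


# The figures used as examples in Table C.1: 11 erroneous items out of 582
flags = [True] * 11 + [False] * 571
result = counting_basic_measures(flags)
print(result["error_count"], result["correct_items_count"])  # 11 571
print(round(result["error_rate"], 4))                        # 0.0189
print(round(result["correct_items_rate"], 4))                # 0.9811
```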
C.3 Uncertainty-related data quality basic measures
C.3.1 General
Numerical values that are obtained by some kind of measuring procedure can only be observed to a certain
accuracy. By treating the measured quantity (measurand) as a random variable, this uncertainty can be
quantified. The different ways of describing uncertainty with statistical methods are used for the definition of
uncertainty-related data quality basic measures.
The statistical methods used for the definition of uncertainty-related data quality measures are based on
certain assumptions:
⎯ uncertainties are homogeneous for all observed values;
⎯ the observed values are not correlated;
⎯ the observed values have normal distribution.
C.3.2 One-dimensional random variable, Ζ
For a continuous measurand (i.e. the value domain of the measured quantities is the real numbers), it is
impossible to give the probability of a single value to be the true value. But it is possible to give the probability
for the true value to be within a certain interval. This interval is called the confidence interval. It is given by the
probability P of the true value being between the lower and the upper limit. This probability P is also called the
significance level.

$$P(\text{lower limit} \le \text{true value} \le \text{upper limit}) = P$$

If the standard deviation σ is known, the limits are given by the quantiles u of the normal (Gaussian)
distribution:

$$P(z_t - u \cdot \sigma \le \text{true value} \le z_t + u \cdot \sigma) = P$$
Table C.2 — Relation between the quantiles of the normal distribution and the significance level
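| Probability P | Quantile | Data quality basic measure | Name | Data quality value type |
|---|---|---|---|---|
| P = 68,3 % | $u_{68{,}3\,\%} = 1$ | $u_{68{,}3\,\%} \cdot \sigma_Z$ | LE68.3 | measure |
| P = 50 % | $u_{50\,\%} = 0{,}6745$ | $u_{50\,\%} \cdot \sigma_Z$ | LE50 | measure |
| P = 90 % | $u_{90\,\%} = 1{,}645$ | $u_{90\,\%} \cdot \sigma_Z$ | LE90 | measure |
| P = 95 % | $u_{95\,\%} = 1{,}960$ | $u_{95\,\%} \cdot \sigma_Z$ | LE95 | measure |
| P = 99 % | $u_{99\,\%} = 2{,}576$ | $u_{99\,\%} \cdot \sigma_Z$ | LE99 | measure |
| P = 99,8 % | $u_{99{,}8\,\%} = 3$ | $u_{99{,}8\,\%} \cdot \sigma_Z$ | LE99.8 | measure |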
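As a minimal sketch of how the quantiles relate to the probability P, the two-sided quantile u can be obtained from the inverse cumulative distribution function of the standard normal distribution evaluated at (1 + P)/2. The function names `normal_quantile` and `linear_error` are assumptions made for this illustration. Under the normality assumption of C.3.1, this approach reproduces the tabulated quantiles for P = 50 %, 90 %, 95 % and 99 % to the printed precision.

```python
from statistics import NormalDist


def normal_quantile(probability: float) -> float:
    """Two-sided quantile u such that P(-u <= X <= u) = probability for X ~ N(0, 1)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + probability) / 2.0)


def linear_error(probability: float, sigma_z: float) -> float:
    """Half-width u * sigma_Z of the confidence interval for a known sigma_Z."""
    return normal_quantile(probability) * sigma_z


print(round(normal_quantile(0.95), 3))  # 1.96
# LE95 for an assumed, purely illustrative standard deviation of 0.5 units
le95 = linear_error(0.95, 0.5)
```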
If the standard deviation σ is unknown, but the one-dimensional random variable Z is measured redundantly
by N independent observations, it is possible to estimate the standard deviation from the observations.
$z_{mi}$ represents the $i$-th measurement for the value. If the true value $z_t$ for Z is known, the standard deviation
can be estimated by

$$s_Z = \sqrt{\frac{1}{r} \sum_{i=1}^{N} \left(z_{mi} - z_t\right)^2}$$

with redundancy r being the number of observations, r = N. If the true value is unknown, it may be estimated
as the arithmetic mean of the observations:

$$z_t = \frac{1}{N} \sum_{i=1}^{N} z_{mi}$$

The standard deviation may then be estimated using the same formula, with r = N − 1.
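The two estimation cases can be sketched in Python as follows. The function name `estimate_std` and its signature are assumptions made for this illustration. The function returns both the estimated standard deviation and the redundancy r, because r is needed later to select the quantile of the Student's t-distribution.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_std(observations: Sequence[float],
                 true_value: Optional[float] = None) -> Tuple[float, int]:
    """Estimate s_Z from N observations z_mi and return (s_Z, r).

    If true_value (z_t) is given, the redundancy is r = N.
    Otherwise z_t is replaced by the arithmetic mean and r = N - 1.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("at least one observation is required")
    if true_value is None:
        reference = sum(observations) / n
        redundancy = n - 1
    else:
        reference = true_value
        redundancy = n
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    squared_sum = sum((z - reference) ** 2 for z in observations)
    return math.sqrt(squared_sum / redundancy), redundancy


# Purely illustrative observations
s_known, r_known = estimate_std([9.0, 10.0, 11.0], true_value=10.0)  # r = 3
s_mean, r_mean = estimate_std([9.0, 10.0, 11.0])                     # r = 2
```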
If the standard deviation is estimated by redundant measurements, the confidence interval can be derived
from the Student’s t-distribution with parameter r:

$$P(-t \cdot s_Z \le Z - z_t \le t \cdot s_Z) = P \quad \text{with} \quad (Z - z_t)/s_Z \sim t(r)$$
Table C.3 — Relation between the quantiles of the Student’s t-distribution and the significance level
for different redundancies r
| Probability P | Quantile for r = 10 | Quantile for r = 5 | Quantile for r = 4 | Quantile for r = 3 | Quantile for r = 2 | Quantile for r = 1 |
|---|---|---|---|---|---|---|
| P = 50 % | t = 1,221 | t = 1,301 | t = 1,344 | t = 1,423 | t = 1,604 | t = 2,414 |
| P = 68,3 % | t = 1,524 | t = 1,657 | t = 1,731 | t = 1,868 | t = 2,203 | t = 3,933 |
| P = 90 % | t = 2,228 | t = 2,571 | t = 2,776 | t = 3,182 | t = 4,303 | t = 12,706 |
| P = 95 % | t = 2,634 | t = 3,163 | t = 3,495 | t = 4,177 | t = 6,205 | t = 25,452 |
| P = 99 % | t = 3,581 | t = 4,773 | t = 5,598 | t = 7,453 | t = 14,089 | t = 127,321 |
| P = 99,8 % | t = 4,587 | t = 6,869 | t = 8,610 | t = 12,924 | t = 31,599 | t = 636,619 |
Table C.4 — Data quality basic measures for different probabilities P of a one-dimensional quantity,
where the standard deviation is estimated from redundant measurements
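| Probability P | Data quality basic measure | Name | Data quality value type |
|---|---|---|---|
| P = 50,0 % | $t_{50\,\%}(r) \cdot s_Z$ | LE50(r) | measure |
| P = 68,3 % | $t_{68{,}3\,\%}(r) \cdot s_Z$ | LE68.3(r) | measure |
| P = 90,0 % | $t_{90\,\%}(r) \cdot s_Z$ | LE90(r) | measure |
| P = 95,0 % | $t_{95\,\%}(r) \cdot s_Z$ | LE95(r) | measure |
| P = 99,0 % | $t_{99\,\%}(r) \cdot s_Z$ | LE99(r) | measure |
| P = 99,8 % | $t_{99{,}8\,\%}(r) \cdot s_Z$ | LE99.8(r) | measure |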
NOTE The values of t for a number of redundancies r can be obtained from Table C.3.
The data quality basic measures for the uncertainty of one-dimensional quantities are given in Tables C.2 and
C.4. They both aim to measure the uncertainty by giving the upper and lower limit of a confidence interval.
The difference is in how the standard deviation is obtained. If it is known a priori, then Table C.2 is relevant. If
the standard deviation is estimated from redundant measurements, then Table C.4 in conjunction with
Table C.3 is relevant.
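For the case where the standard deviation is estimated, the combination of Table C.4 and Table C.3 can be sketched as a table lookup followed by a multiplication. The dictionary `T_QUANTILES` and the function `linear_error_estimated` are assumptions made for this illustration. The quantile values are transcribed as printed in Table C.3, and only the redundancies listed there are supported.

```python
# Quantiles t(r), transcribed as printed in Table C.3 (keys: measure name, then r)
T_QUANTILES = {
    "LE50":   {10: 1.221, 5: 1.301, 4: 1.344, 3: 1.423, 2: 1.604, 1: 2.414},
    "LE68.3": {10: 1.524, 5: 1.657, 4: 1.731, 3: 1.868, 2: 2.203, 1: 3.933},
    "LE90":   {10: 2.228, 5: 2.571, 4: 2.776, 3: 3.182, 2: 4.303, 1: 12.706},
    "LE95":   {10: 2.634, 5: 3.163, 4: 3.495, 3: 4.177, 2: 6.205, 1: 25.452},
    "LE99":   {10: 3.581, 5: 4.773, 4: 5.598, 3: 7.453, 2: 14.089, 1: 127.321},
    "LE99.8": {10: 4.587, 5: 6.869, 4: 8.610, 3: 12.924, 2: 31.599, 1: 636.619},
}


def linear_error_estimated(measure_name: str, redundancy: int, s_z: float) -> float:
    """Return t(r) * s_Z for a measure of Table C.4, with t(r) taken from Table C.3."""
    try:
        t_value = T_QUANTILES[measure_name][redundancy]
    except KeyError:
        raise ValueError(
            f"no tabulated quantile for {measure_name!r} with r = {redundancy}"
        ) from None
    return t_value * s_z


# LE95(r) for r = 10 and an assumed, purely illustrative estimate s_Z = 2.0
le95_r10 = linear_error_estimated("LE95", 10, 2.0)
```

When the standard deviation is known a priori, the quantile u of Table C.2 takes the place of t(r) and no redundancy is involved.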
C.3.3 Two-dimensional random variable Χ and Υ
The case of the one-dimensional random variable Z can be expanded to two dimensions, where the
measurand is always observed by two values. The measurand is given by the tuple X, Y. The same
assumptions apply as in the case of the one-dimensional random variable.
The observations are $x_{mi}$ and $y_{mi}$. The equivalent of the confidence interval in one dimension is the
confidence area, which is usually described as a circle around the best estimate of the true value. The
probability for the true value to lie in this area is calculated by area integration over the two-dimensional
density function of the normal distribution. A circular area is characterized by its radius. This radius, R, is used
as a measure for the accuracy of two-dimensional random variables:

$$P(R, \sigma_X, \sigma_Y) = \frac{1}{2\pi\sigma_X\sigma_Y} \iint\limits_{(x-x_t)^2+(y-y_t)^2 \le R^2} e^{-\frac{1}{2}\left(\frac{(x-x_t)^2}{\sigma_X^2}+\frac{(y-y_t)^2}{\sigma_Y^2}\right)} \, dx \, dy$$

For some particular probabilities, the radius can be calculated depending on the standard deviations
$\sigma_x$ and $\sigma_y$.
Table C.5 — Relationship between the probability P and the corresponding radius of the circular area
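| Probability P | Data quality basic measure | Name | Data quality value type |
|---|---|---|---|
| P = 39,4 % | $\frac{1}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE39.4 | measure |
| P = 50 % | $\frac{1{,}1774}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE50 | measure |
| P = 90 % | $\frac{2{,}146}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE90 | measure |
| P = 95 % | $\frac{2{,}4477}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE95 | measure |
| P = 99,8 % | $\frac{3{,}5}{\sqrt{2}} \sqrt{\sigma_x^2 + \sigma_y^2}$ | CE99.8 | measure |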
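The numerical factors in Table C.5 can be checked for the special case of equal standard deviations, σ_X = σ_Y = σ. In that case the area integral over the circle of radius R has the closed form P = 1 − exp(−R²/(2σ²)). The following Python sketch, with the assumed function names `circular_probability` and `circular_radius`, covers only this circular special case and is not a general evaluation of the integral.

```python
import math


def circular_probability(radius: float, sigma: float) -> float:
    """Probability that the true value lies within radius R when sigma_X = sigma_Y = sigma."""
    return 1.0 - math.exp(-(radius ** 2) / (2.0 * sigma ** 2))


def circular_radius(probability: float, sigma: float) -> float:
    """Radius R of the circle containing the given probability when sigma_X = sigma_Y = sigma."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must lie in [0, 1)")
    return sigma * math.sqrt(-2.0 * math.log(1.0 - probability))


# Factors of Table C.5 applied to sigma = 1
for factor in (1.0, 1.1774, 2.146, 2.4477, 3.5):
    print(factor, round(100.0 * circular_probability(factor, 1.0), 1))
```

For σ = 1 the factors 1, 1,1774, 2,146, 2,4477 and 3,5 correspond to probabilities of approximately 39,3 %, 50 %, 90 %, 95 % and 99,8 %.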
C.3.4 Three-dimensional random variable Χ, Υ, Ζ
The case of the one-dimensional random variable Z can be expanded to three dimensions, where the
measurand is always observed by three values. The measurand is given by the tuple X, Y, Z. The same
assumptions apply as in the case of the one-dimensional random variable.
The observations are $x_{mi}$, $y_{mi}$ and $z_{mi}$. The equivalent of the confidence interval in one dimension is the
confidence volume, which is usually described as a sphere around the best estimate of the true value. The
probability for the true value to lie in this volume is calculated by volume integration over the three-dimensional
density function of the normal distribution. A spherical volume is characterized by its radius. This
radius is used as a measure for the accuracy of three-dimensional random variables.
Table C.6 — Relationship between the probability P and the corresponding radius
of the spherical volume
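| Probability P | Data quality basic measure | Name | Data quality value type |
|---|---|---|---|
| P = 50 % | $0{,}51 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | spherical error probable (SEP) | measure |
| P = 61 % | $\sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$ | mean radial spherical error (MRSE) | measure |
| P = 90 % | $0{,}833 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | 90 % spherical accuracy standard | measure |
| P = 99 % | $1{,}122 \cdot (\sigma_x + \sigma_y + \sigma_z)$ | 99 % spherical accuracy standard | measure |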
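The four radii of Table C.6 can be computed directly from the three standard deviations. The sketch below assumes the reading of the table in which SEP and the 90 % and 99 % spherical accuracy standards are a factor times the sum σ_x + σ_y + σ_z, and MRSE is the root of the sum of squares. The function name `spherical_measures` and the dictionary keys are assumptions made for this illustration.

```python
import math
from typing import Dict


def spherical_measures(sigma_x: float, sigma_y: float, sigma_z: float) -> Dict[str, float]:
    """Radii of the spherical confidence volumes of Table C.6."""
    sigma_sum = sigma_x + sigma_y + sigma_z
    return {
        "SEP": 0.51 * sigma_sum,                                          # P = 50 %
        "MRSE": math.sqrt(sigma_x ** 2 + sigma_y ** 2 + sigma_z ** 2),    # P = 61 %
        "spherical_accuracy_90": 0.833 * sigma_sum,                       # P = 90 %
        "spherical_accuracy_99": 1.122 * sigma_sum,                       # P = 99 %
    }


# Purely illustrative standard deviations
radii = spherical_measures(1.0, 1.0, 1.0)
print(round(radii["SEP"], 2))   # 1.53
print(round(radii["MRSE"], 3))  # 1.732
```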
...









