Information technology — Big data reference architecture — Part 1: Framework and application process

This document describes the framework of the big data reference architecture and the process for how a user of the document can apply it to their particular problem domain.

Technologies de l'information — Architecture de référence des mégadonnées — Partie 1: Cadre méthodologique et processus d'application

General Information

Status: Published
Publication Date: 19-Aug-2020
Current Stage: 6060 - International Standard published
Start Date: 20-Aug-2020
Completion Date: 20-Aug-2020

Technical report
ISO/IEC TR 20547-1:2020 - Information technology -- Big data reference architecture
English language
14 pages
Draft
ISO/IEC PRF TR 20547-1 - Information technology -- Big data reference architecture
English language
14 pages

Standards Content (sample)

TECHNICAL REPORT
ISO/IEC TR 20547-1
First edition
2020-08
Information technology — Big data reference architecture — Part 1: Framework and application process
Technologies de l'information — Architecture de référence des mégadonnées — Partie 1: Cadre méthodologique et processus d'application
Reference number: ISO/IEC TR 20547-1:2020(E)
ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Document overview
6 Big data standardization: motivation and objectives
7 Conceptual foundations
7.1 General
7.2 Reference architecture concepts
7.3 Reference architecture structure
8 Big data reference architecture elements
8.1 Overview
8.2 Stakeholders
8.3 Concerns
8.4 Views
8.4.1 User view
8.4.2 Functional view
9 Big data reference architecture application process
9.1 Overview
9.2 Identify stakeholders and concerns
9.3 Map stakeholders and concerns to roles and subroles
9.4 Develop detailed activity descriptions and map to concerns
9.5 Define functional components to implement activities
9.6 Cross walk activities/functional components back to concerns
Bibliography

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent declarations received (see http://patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 42, Artificial intelligence.

A list of all parts in the ISO/IEC 20547 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

The big data paradigm is a rapidly changing field with rapidly changing technologies. This dynamic situation creates two significant issues for potential implementers of the technology. First, there is a lack of standard definitions for terms including the core concept of big data. The second issue is that there is no consistent approach to describe a big data architecture and implementation. The first issue is addressed by ISO/IEC 20546. The ISO/IEC 20547 series is targeted to the second issue and provides a framework and reference architecture which organizations can apply to their problem domain to effectively and consistently describe their architecture and its implementations with respect to the roles/actors and their concerns as well as the underlying technology. This document describes the reference architecture framework and provides a process for mapping a specific problem set/use case to the architecture and evaluating that mapping.
1 Scope

This document describes the framework of the big data reference architecture and the process for how a user of the document can apply it to their particular problem domain.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC/IEEE 42010, Systems and software engineering — Architecture description
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO/IEC/IEEE 42010 and the following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
big data

extensive datasets — primarily in the characteristics of volume, variety, velocity, and/or variability — that require a scalable technology for efficient storage, manipulation, and analysis

Note 1 to entry: Big data is commonly used in many different ways, for example as the name of the scalable technology used to handle big data extensive datasets.
[SOURCE: ISO/IEC 20546:2019, 3.1.2]
3.2
reference architecture

in the field of software architecture or enterprise architecture, provides a proven template solution for an architecture for a particular domain, as well as a common vocabulary with which to discuss implementations, often with the aim of stressing commonality
[SOURCE: ISO/TR 14639-2:2014, 2.65]
3.3
framework

particular set of beliefs, or ideas referred to in order to describe a scenario or solve a problem

[SOURCE: ISO 15638-6:2014, 4.30]
3.4
security

protection against intentional subversion or forced failure. A composite of four attributes — confidentiality, integrity, availability, and accountability — plus aspects of a fifth, usability, all of which have the related issue of their assurance
[SOURCE: ISO/IEC/IEEE 15288:2015, 4.1, 31]
3.5
privacy

right of individuals to control or influence what information related to them may be collected and stored and by whom that information may be disclosed
[SOURCE: ISO/IEC TR 26927:2011, 3.34]
3.6
provenance

information on the place and time of origin, derivation or generation of a resource or a record or proof of authenticity or of past ownership
[SOURCE: ISO/IEC 11179-7:2019, 3.1.10]
3.7
SQL
database language specified by ISO/IEC 9075

Note 1 to entry: SQL is sometimes interpreted to stand for Structured Query Language but that name is not used in the ISO/IEC 9075 series.
[SOURCE: ISO/IEC 20546:2019, 3.1.36]
3.8
lifecycle

evolution of a system, product, service, project or other human-made entity from conception through retirement
[SOURCE: ISO/IEC/IEEE 15288:2015, 4.1.23]
4 Abbreviated terms
BDA big data auditor
BDAcP big data access provider
BDAnP big data analytics provider
BDAP big data application provider
BDCP big data collection provider
BDFP big data framework provider
BDIP big data infrastructure provider
BDPlaP big data platform provider
BDPreP big data preparation provider
BDProP big data processing provider
BDSD big data service developer
BDSO big data system orchestrator
BDSP big data service partner
BDRA big data reference architecture
BDVP big data visualization provider
GDPR general data protection regulation
JSON JavaScript object notation
RDF resource description framework
SQuaRE systems and software quality requirements and evaluation
XML extensible markup language
5 Document overview

This document is designed to introduce the reader to certain big data reference architecture concepts so that they can apply the other documents in the ISO/IEC 20547 series to their specific system and problem set.

Clauses 6 to 9:
— give the motivation and objectives behind big data standards;
— provide an introduction to reference architectures and their purpose;
— provide an overview of the BDRA and an explanation of its key concepts;
— provide a process on application of the BDRA to a problem domain.

This document can be leveraged in various ways when reading and applying the ISO/IEC 20547 series:

a) a user who intends to read only this document, to gain a general understanding of the BDRA and its applicability to their problem space, can concentrate on Clauses 6, 7 and 8;

b) a user who is developing a big data architecture and wishes to align it to the BDRA can follow the process in Clause 9.
6 Big data standardization: motivation and objectives

In a 2019 report, IDC forecast worldwide revenues for big data and data analytics of 189,1 billion USD, a 12 % increase over 2018, and predicted a five-year compound annual growth rate of 13,2 %, with revenues in 2022 exceeding 274,3 billion USD [15].

In addition, buyers and implementers of big data systems deal with an exploding number of technologies and options, many of which vendors wrap in buzzwords, including the undefined term "big data". For the stakeholders of big data systems to understand what they are buying and implementing, they need a clear framework that supports robust and accurate communication with potential technology and service vendors.

NOTE 1 "Big data system" means a system that leverages big data engineering and employs a big data paradigm to process big data.

NOTE 2 "Big data engineering" means advanced techniques that harness independent resources for building scalable data systems when the characteristics of the datasets require new architectures for efficient storage, manipulation, and analysis.

NOTE 3 "Big data paradigm" means distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.

While the potential value of analyzing big data is what attracts organizations to implement big data systems, these organizations need to understand the potential issues and liabilities associated with managing and controlling this data. IDC estimates that enterprises have liability or responsibility for nearly 80 % of the information in the digital universe and should be prepared to deal with issues of compliance, copyright and privacy. IDC further predicts that, by 2020, over 40 % of the information in the digital universe will require explicit protection and that the amount of this data is growing faster than the total digital universe [15]. These risks mean that organizations should be able both to identify, define and articulate the policies for data security, provenance and governance, and to implement and document the technical controls that enforce those policies, in order to protect the organization as a whole from liability for compromise or misuse of the data they control.

Finally, very few organizations dealing with big data operate solely on data organic to that organization. This means that systems that collect and analyze big data need to be able to securely and reliably interoperate and share data. In fact, the sheer volume associated with big data frequently makes it impractical to transfer the data between systems. In many cases, the analytics therefore need to be moved to the data, which requires interoperability between systems not just at the data level but also at the software and application level.

The existing big data landscape and the market requirements for big data standardization were examined, and the standardization priorities below were identified:

a) big data use cases, definitions, vocabulary and reference architectures (e.g. system, data, platforms, online/offline, etc.);
b) specifications and standardization of metadata including data provenance;
c) application models (e.g. batch, streaming, etc.);
d) query languages including non-relational queries to support diverse data types (XML, RDF, JSON, multimedia, etc.) and big data operations (e.g. matrix operations);
e) domain-specific languages;
f) semantics of eventual consistency;
g) advanced network protocols for efficient data transfer;
h) general and domain specific ontologies and taxonomies for describing data semantics including interoperation between ontologies;
i) big data security and privacy access controls;
j) remote, distributed, and federated analytics (taking the analytics to the data) including data and processing resource discovery and data mining;
k) data sharing and exchange;
l) data storage, e.g. memory storage system, distributed file system, data warehouse, etc.;
m) human consumption of the results of big data analysis (e.g. visualization);
n) energy measurement for big data;
o) interface between relational (SQL) and non-relational (NoSQL) data stores;
p) big data quality and veracity description and management [13].

ISO/IEC 20546 and the ISO/IEC 20547 series were developed with the intention to address those gaps. The ISO/IEC 20547 series specifically addresses framework and application process, big data use cases and requirements [gap a) above], reference architectures [gap a) above], security and privacy [gap i) above], and a standards roadmap. In addition, organizations with big data analytic requirements cannot wait for big data specific standards before they can implement their systems. Because big data is essentially a subset of all data, and almost every information technology standard deals with data in some respect, there are a large number of standards in place or under development today that address a number of big data issues. To address this need, ISO/IEC 20547-5 is a standards roadmap that aligns existing standards to the roles within the reference architecture to provide big data system stakeholders some guidance on how they can apply those standards to their problems today. Subclause 8.1 describes each of the other parts in this series.
7 Conceptual foundations
7.1 General

The ISO/IEC 20547 series is designed to provide a foundation to a range of stakeholders in a given system to effectively and unambiguously describe and communicate about the characteristics and attributes of a given big data system. Based on the definitions provided in ISO/IEC 20546 for big data, a big data system is a system that:

— processes extensive datasets — primarily in the characteristics of volume, variety, velocity, and/or variability — that require a scalable architecture for efficient storage, manipulation, and analysis;

— leverages advanced techniques that harness independent resources for building scalable data systems when the characteristics of the datasets require new architectures for efficient storage, manipulation, and analysis;

— employs a paradigm that distributes data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.

The broad and unconstrained nature of big data systems necessitates that the reference architecture provided in the ISO/IEC 20547 series be sufficient to represent the wide range of potential use cases implemented by big data systems.
7.2 Reference architecture concepts

In order to understand what a reference architecture covers, it is necessary to first define what a reference architecture means. Since it is an architecture, the reference architecture necessarily possesses all the characteristics of an architecture as defined by ISO/IEC/IEEE 42010 (see 3.2). The big data reference architecture should also be generalized enough to cover the variety of potential big data systems architectures.

Examined from an object-oriented viewpoint, the reference architecture would be considered the abstract class from which specific instances of architectures derive their structure and attributes.
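The object-oriented analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `ReferenceArchitecture` and `CreditMonitoringArchitecture`, the method names and the example role and component values are hypothetical and are not defined by the ISO/IEC 20547 series.

```python
from abc import ABC, abstractmethod


class ReferenceArchitecture(ABC):
    """Hypothetical abstract base: structure shared by all derived architectures."""

    # The two views named in this document.
    views = ("user view", "functional view")

    def __init__(self, system_name: str) -> None:
        self.system_name = system_name

    @abstractmethod
    def roles(self) -> list[str]:
        """Roles the concrete architecture assigns in its user view."""

    @abstractmethod
    def functional_components(self) -> list[str]:
        """Components the concrete architecture defines in its functional view."""

    def describe(self) -> dict:
        # Every derived architecture is described with the same structure.
        return {
            "system": self.system_name,
            "views": self.views,
            "roles": self.roles(),
            "functional_components": self.functional_components(),
        }


class CreditMonitoringArchitecture(ReferenceArchitecture):
    """Illustrative concrete architecture; the values are assumptions."""

    def roles(self) -> list[str]:
        return ["big data application provider", "big data auditor"]

    def functional_components(self) -> list[str]:
        return ["audit framework"]
```

Instantiating `ReferenceArchitecture` directly raises `TypeError`, which mirrors the idea that the reference architecture is a template and not itself a system architecture.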

ISO/TR 14639-2 defines a reference architecture as something that, in the field of software architecture or enterprise architecture, provides a proven template solution for an architecture for a particular domain, as well as a common vocabulary with which to discuss implementations, often with the aim of stressing commonality.

Based on this reasoning, a reference architecture is an architectural framework as defined by ISO/IEC/IEEE 42010, including the structure, rules and constraints common to all big data systems. Thus, a big data reference architecture provides a series of conventions, principles and practices for describing big data system architectures.

Reference architectures are developed to meet a wide variety of objectives, as shown in Figure 1, taken from Reference [14]. That reference goes on to describe the core purpose of a reference architecture as being forward looking and being used (referenced) as the basis for future implementations.

Figure 1 — The concept of reference architectures
7.3 Reference architecture structure

Figure 2 combines concepts and structures from ISO/IEC/IEEE 42010 to depict the outline for a reference architecture structure.

A reference architecture is defined for a domain. For this reference architecture, the domain is big data. The domain in turn defines the environment. In the case of big data, the environment is primarily defined by the core characteristics of big data: volume, velocity, variety and variability (see ISO/IEC 20546).

The stakeholders in this environment include all the common stakeholders (users, owners, architects, etc.) for any system, along with anyone having a concern related to the data and its characteristics.

The environment bounds the concerns. Since the environment is defined by the big data characteristics, the concerns are bound by those characteristics, and each concern should relate to one or more of those characteristics and to the stakeholder(s) that have that concern.
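The bounding rule above can be expressed as a simple check. The following is a minimal sketch, assuming a hypothetical `Concern` record and an `is_in_scope` helper; neither name comes from the ISO/IEC 20547 series, and the example concerns in the usage note are invented for illustration.

```python
from dataclasses import dataclass

# The four core characteristics of big data named in this subclause.
BIG_DATA_CHARACTERISTICS = frozenset(
    {"volume", "velocity", "variety", "variability"}
)


@dataclass(frozen=True)
class Concern:
    """Hypothetical record linking a concern to characteristics and stakeholders."""

    name: str
    characteristics: frozenset
    stakeholders: frozenset


def is_in_scope(concern: Concern) -> bool:
    """True if the concern relates to at least one big data characteristic
    and is held by at least one stakeholder."""
    related = concern.characteristics & BIG_DATA_CHARACTERISTICS
    return bool(related) and bool(concern.stakeholders)
```

For example, a concern such as `Concern("ingest latency", frozenset({"velocity"}), frozenset({"data consumer"}))` would pass the check, while a concern with no stakeholder or with no link to any of the four characteristics would not.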

The reference architecture is described using an architecture framework. This framework is described in ISO/IEC 20547-3 and is presented in terms of two viewpoints:
— roles and activities — user view;
— functional components — functional view.
Each of these viewpoints in turn addresses one or more concerns.

Those concerns can be addressed by one or more roles, activities and functional components within those architecture views.

EXAMPLE In a credit monitoring system, everyone with a record in that system is a stakeholder. Most have a concern that their privacy is maintained, and the security of their personal information is protected.

Within both the user and functional views of the big data reference architecture, there is a security and privacy cross-cutting aspect. This cross-cutting aspect is related to the activity "perform audit" and the functional component "audit framework", which address that concern.
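The relationship in the example above can be recorded as a small traceability table from concerns to the activities (user view) and functional components (functional view) that address them. This is a sketch only: the dictionary layout, the helper `unaddressed_concerns` and the second concern ("data retention period") are assumptions made for illustration; only "perform audit" and "audit framework" come from the text.

```python
# Hypothetical traceability table: concern -> elements that address it.
CONCERN_MAP = {
    "privacy and security of personal records": {
        "activities": ["perform audit"],               # user view
        "functional_components": ["audit framework"],  # functional view
    },
    # Invented concern with nothing mapped yet, to show a gap.
    "data retention period": {
        "activities": [],
        "functional_components": [],
    },
}


def unaddressed_concerns(concern_map: dict) -> list[str]:
    """Return, in sorted order, concerns with no activity and no component."""
    return sorted(
        name
        for name, trace in concern_map.items()
        if not trace["activities"] and not trace["functional_components"]
    )
```

Calling `unaddressed_concerns(CONCERN_MAP)` lists the concerns that still need an activity or functional component, which is one simple way to spot gaps when relating views back to stakeholder concerns.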

Figure 2 — Basic outline of a reference architecture structure based on ISO/IEC/IEEE 42010

8 Big data reference architecture elements
8.1 Overview

For the big data system environment, this document sets out the BDRA structure by giving the scope of each part of the ISO/IEC 20547 series, the logical relationships between the documents, and the application process of the BDRA. The scope of each document in the ISO/IEC 20547 series is as follows:

— ISO/IEC 20547-1: Framework and application process describes the framework of the big data reference architecture and the process for how a user of the standard can apply it to their particular problem domain;

— ISO/IEC 20547-2: Use cases and derived requirements provides examples of big data use cases with application domains and technical considerations derived from the contributed use cases;

— ISO/IEC 20547-3: Reference architecture specifies the big data reference architecture (BDRA). The reference architecture includes concepts and architectural views (user view and functional view);

— ISO/IEC 20547-4: Security and privacy specifies the security and privacy aspects applicable to the big data reference architecture (BDRA) including the big data roles, activities, and functional components, and also provides guidance on security and privacy operations for big data;


— ISO/IEC 20547-5: Standards roadmap describes big data relevant standards, both in existence and under development, along with priorities for future big data standards development based on gap analysis.

Figure 3 shows the relationships and iteration cycle of each part of the ISO/IEC 20547 series. Based on contributions from enterprises, organizations and experts from related research and academia, ISO/IEC TR 20547-2 collects use cases and derives technical considerations. ISO/IEC 20547-3 defines the reference architecture for big data by reflecting these technical considerations. ISO/IEC 20547-4, in particular, specifies the security and privacy aspects that support big data. ISO/IEC 20547-5 provides the list of applicable standards from the BDRA perspective. The ISO/IEC 20547 series represents a point-in-time view of big data systems and architectures. As big data implementations are created and evolve based on ISO/IEC 20547-1, ISO/IEC TR 20547-2, ISO/IEC 20547-3 and ISO/IEC 20547-4, they will reference and make use of the standards documented in ISO/IEC 20547-5. Those new systems can be documented in ISO/IEC TR 20547-2 as new use cases leading to new technical considerations. The technical considerations introduced by those use cases can lead to new standardization activities resulting in new standards to be documented in ISO/IEC 20547-5.
Figure 3 — Relationships between the parts of the ISO/IEC 20547 series

In order to apply this framework to a specific use case, it is necessary to understand the overall environment in which the big data system will be implemented, who the stakeholders are for that system, and what the concerns of those stakeholders are. Subclauses 8.2 to 8.4 describe each of these key components.
8.2 Stakeholders

ISO/IEC/IEEE 42010 defines a stakeholder as any individual, team, organization or classes thereof, having an interest in a system. Common stakeholders include the system owners, customers, system implementers and others. In the case of big data systems, the stakeholders also include anyone with an interest in the data being processed by the system. This includes the data owners who may provide data to the big data system, the data consumers who make decisions based on the data coming from the big data system, and also those people or organizations who can be described by the data.

Identification of the stakeholders and their concerns is the first step in developing a big data architecture.

ISO/IEC 20547-3 refers to the stakeholders of a big data system as parties within the user view.

...

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Document overview
6 Big data standardization: motivation and objectives
7 Conceptual foundations
7.1 General
7.2 Reference architecture concepts
7.3 Reference architecture structure
8 Big data reference architecture elements
8.1 Overview
8.2 Stakeholders
8.3 Concerns
8.4 Views
8.4.1 User view
8.4.2 Functional view
9 Big data reference architecture application process
9.1 Overview
9.2 Identify stakeholders and concerns
9.3 Map stakeholders and concerns to roles and subroles
9.4 Develop detailed activity descriptions and map to concerns
9.5 Define functional components to implement activities
9.6 Cross walk activities/functional components back to concerns
Bibliography

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical

Commission) form the specialized system for worldwide standardization. National bodies that

are members of ISO or IEC participate in the development of International Standards through

technical committees established by the respective organization to deal with particular fields of

technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other

international organizations, governmental and non-governmental, in liaison with ISO and IEC, also

take part in the work.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for

the different types of document should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject

of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent

rights. Details of any patent rights identified during the development of the document will be in the

Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see http://patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to the

World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/IEC JTC 1, Information technology,

Subcommittee SC 42, Artificial intelligence.
A list of all parts in the ISO/IEC 20547 series can be found on the ISO website.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

The big data paradigm is a rapidly changing field with rapidly changing technologies. This dynamic

situation creates two significant issues for potential implementers of the technology. First, there is a

lack of standard definitions for terms including the core concept of big data. The second issue is that

there is no consistent approach to describe a big data architecture and implementation. The first issue

is addressed by ISO/IEC 20546. The ISO/IEC 20547 series is targeted to the second issue and provides

a framework and reference architecture which organizations can apply to their problem domain to

effectively and consistently describe their architecture and its implementations with respect to the

roles/actors and their concerns as well as the underlying technology. This document describes the

reference architecture framework and provides a process for mapping a specific problem set/use case

to the architecture and evaluating that mapping.
TECHNICAL REPORT ISO/IEC TR 20547-1:2020(E)
Information technology — Big data reference
architecture —
Part 1:
Framework and application process
1 Scope

This document describes the framework of the big data reference architecture and the process for how

a user of the document can apply it to their particular problem domain.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC/IEEE 42010, Systems and software engineering — Architecture description
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO/IEC/IEEE 42010 and the

following apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
big data

extensive datasets — primarily in the characteristics of volume, variety, velocity, and/or variability —

that require a scalable technology for efficient storage, manipulation, and analysis

Note 1 to entry: Big data is commonly used in many different ways, for example as the name of the scalable

technology used to handle big data extensive datasets.
[SOURCE: ISO/IEC 20546:2019, 3.1.2]
3.2
reference architecture

in the field of software architecture or enterprise architecture, provides a proven template solution

for an architecture for a particular domain, as well as a common vocabulary with which to discuss

implementations, often with the aim of stressing commonality
[SOURCE: ISO/TR 14639-2:2014, 2.65]
3.3
framework

particular set of beliefs, or ideas referred to in order to describe a scenario or solve a problem

[SOURCE: ISO 15638-6:2014, 4.30]
3.4
security

protection against intentional subversion or forced failure. A composite of four attributes —

confidentiality, integrity, availability, and accountability — plus aspects of a fifth, usability, all of which

have the related issue of their assurance
[SOURCE: ISO/IEC/IEEE 15288:2015, 4.1, 31]
3.5
privacy

right of individuals to control or influence what information related to them may be collected and

stored and by whom that information may be disclosed
[SOURCE: ISO/IEC TR 26927:2011, 3.34]
3.6
provenance

information on the place and time of origin, derivation or generation of a resource or a record or proof

of authenticity or of past ownership
[SOURCE: ISO/IEC 11179-7:2019, 3.1.10]
3.7
SQL
database language specified by ISO/IEC 9075

Note 1 to entry: SQL is sometimes interpreted to stand for Structured Query Language but that name is not used

in the ISO/IEC 9075 series.
[SOURCE: ISO/IEC 20546:2019, 3.1.36]
3.8
lifecycle

evolution of a system, product, service, project or other human-made entity from conception through

retirement
[SOURCE: ISO/IEC/IEEE 15288:2015, 4.1.23]
4 Abbreviated terms
BDA big data auditor
BDAcP big data access provider
BDAnP big data analytics provider
BDAP big data application provider
BDCP big data collection provider
BDFP big data framework provider
BDIP big data infrastructure provider
BDPlaP big data platform provider
BDPreP big data preparation provider
BDProP big data processing provider
BDSD big data service developer
BDSO big data system orchestrator
BDSP big data service partner
BDRA big data reference architecture
BDVP big data visualization provider
GDPR general data protection regulation
JSON Javascript object notation
RDF resource description framework
SQuaRE systems and software quality requirements and evaluation
XML extensible markup language
5 Document overview

This document is designed to introduce the reader to certain big data reference architecture concepts

so that they can apply the other documents in the ISO/IEC 20547 series to their specific system and

problem set.
Clauses 6 to 9:
— give the motivation and objectives behind big data standards;
— provide an introduction to reference architectures and their purpose;
— provide an overview of the BDRA and an explanation of its key concepts;
— provide a process on application of the BDRA to a problem domain.

This document can be leveraged in various ways when reading and applying the ISO/IEC 20547 series:

a) if the user intends to read only this document to gain a general understanding of the BDRA and its

applicability to his/her problem space, he/she can concentrate on Clauses 5, 6, and 7;

b) if the user is developing a big data architecture and wishes to align it to the BDRA, then he/she can
follow the process in Clause 9.
6 Big data standardization: motivation and objectives

In a 2019 report, IDC forecast worldwide revenues for big data and data analytics of 189,1 billion USD,
a 12 % increase over 2018, and predicted a five-year compound annual growth rate of 13,2 %, with
revenues in 2022 exceeding 274,3 billion USD [15].
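
The growth figures quoted above can be cross-checked with a short calculation. The sketch below is illustrative only. It assumes that the 13,2 % compound annual growth rate is applied to the 2019 forecast over the three years to 2022, an interpretation the text does not state explicitly, and the function name is hypothetical.

```python
def project_revenue(base: float, annual_growth: float, years: int) -> float:
    """Compound a base figure forward at a fixed annual growth rate."""
    return base * (1.0 + annual_growth) ** years


# Figures quoted in the text, in billion USD (decimal points used in code).
revenue_2019 = 189.1
cagr = 0.132

# Assumption: three compounding periods, 2019 -> 2022.
revenue_2022 = project_revenue(revenue_2019, cagr, years=3)

# The 2019 figure is described as a 12 % increase over 2018.
implied_2018 = revenue_2019 / 1.12

print(f"Projected 2022 revenue: {revenue_2022:.1f} billion USD")
print(f"Implied 2018 revenue:   {implied_2018:.1f} billion USD")
```

Under that assumption the projection comes to roughly 274,3 billion USD, which is consistent with the 2022 figure quoted from the report.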

In addition, buyers and implementers of big data systems deal with an exploding number of technologies
and options, many of which vendors wrap in buzzwords, including the undefined term "big data". For the
stakeholders of big data systems to understand what they are buying and implementing, they need a clear
framework that supports robust and accurate communication with potential technology and service
vendors.

NOTE 1 "Big data system" means a system that leverages big data engineering and employs a big data

paradigm to process big data.

NOTE 2 "Big data engineering" means advanced techniques that harness independent resources for building

scalable data systems when the characteristics of the datasets require new architectures for efficient storage,

manipulation, and analysis.

NOTE 3 "Big data paradigm" means distribution of data systems across horizontally coupled, independent

resources to achieve the scalability needed for the efficient processing of extensive datasets.

While the potential value for analyzing big data is what attracts organizations to implementation of

big data systems, these organizations need to understand the potential issues and liabilities associated

with managing and controlling this data. IDC estimates that enterprises have liability or responsibility

for nearly 80 % of the information in the digital universe and should be prepared to deal with issues of

compliance, copyright and privacy. IDC further predicts that, by 2020, over 40 % of the information in
the digital universe will require explicit protection and that the amount of this data is growing faster
than the total digital universe [15]. These risks mean that organizations should be able both to identify,
define and articulate the policies for data security, provenance and governance, and to implement and
document the technical controls that enforce those policies, in order to protect the organization as a
whole from liability for compromise or misuse of the data it controls.

Finally, very few organizations dealing with big data operate solely on data organic to that organization.

This means that systems that collect and analyze big data need to be able to securely and reliably

interoperate and share data. In fact, the sheer volume associated with big data frequently makes it
impractical to transfer the data between systems. In many cases, the analytics need to be moved to the
data instead, which requires interoperability between systems not just at the data level but also at the
software and application level.

The existing big data landscape and the market requirements for big data standardization were examined,
and the standardization priorities below were identified:

a) big data use cases, definitions, vocabulary and reference architectures (e.g. system, data, platforms,

online/offline, etc.);
b) specifications and standardization of metadata including data provenance;
c) application models (e.g. batch, streaming, etc.);

d) query languages including non-relational queries to support diverse data types (XML, RDF, JSON,

multimedia, etc.) and big data operations (e.g. matrix operations);
e) domain-specific languages;
f) semantics of eventual consistency;
g) advanced network protocols for efficient data transfer;

h) general and domain specific ontologies and taxonomies for describing data semantics including

interoperation between ontologies;
i) big data security and privacy access controls;

j) remote, distributed, and federated analytics (taking the analytics to the data) including data and

processing resource discovery and data mining;
k) data sharing and exchange;

l) data storage, e.g. memory storage system, distributed file system, data warehouse, etc.;

m) human consumption of the results of big data analysis (e.g. visualization);
n) energy measurement for big data;
o) interface between relational (SQL) and non-relational (NoSQL) data stores;
p) big data quality and veracity description and management [13].

ISO/IEC 20546 and the ISO/IEC 20547 series were developed with the intention to address those gaps.

This document specifically addresses framework and application process, big data use cases and

requirements [gap a) above], reference architectures [gap a) above], and security and privacy [gap i)

above], and standards roadmap. In addition, organizations with big data analytic requirements

cannot wait for big data specific standards before they can implement their systems. Because big

data is essentially a subset of all data, and almost every information technology standard deals with

data in some respect, there are a large number of standards in place or under development today that

address a number of big data issues. To address this need, ISO/IEC 20547-5 is a standards roadmap

that aligns existing standards to the roles within the reference architecture to provide big data system

stakeholders some guidance on how they can apply those standards to their problems today. Subclause 8.1
describes each of the other parts in this series.
7 Conceptual foundations
7.1 General

The ISO/IEC 20547 series is designed to provide a foundation to a range of stakeholders in a given

system to effectively and unambiguously describe and communicate about the characteristics and

attributes of a given big data system. Based on the definitions provided in ISO/IEC 20546 for big data, a

big data system is a system that:

— processes extensive data sets — primarily in the characteristics of volume, variety, velocity, and/or

variability — that require a scalable architecture for efficient storage, manipulation, and analysis;

— leverages advanced techniques that harness independent resources for building scalable data

systems when the characteristics of the datasets require new architectures for efficient storage,

manipulation, and analysis;

— employs a paradigm in which data systems are distributed across horizontally coupled, independent
resources to achieve the scalability needed for the efficient processing of extensive datasets.

The broad and unconstrained nature of big data systems necessitates that the reference architecture

provided in the ISO/IEC 20547 series be sufficient to represent the wide range of potential use cases

implemented by big data systems.
7.2 Reference architecture concepts

In order to understand what a reference architecture covers, it is necessary to first define what a

reference architecture means. Since it is an architecture, the reference architecture necessarily

possesses all the characteristics of an architecture as defined by ISO/IEC/IEEE 42010 (see 3.2). The big

data reference architecture should also be generalized enough to cover the variety of potential big data

systems architectures.

Examined from an object-oriented viewpoint, the reference architecture would be considered the

abstract class from which specific instances of architectures derive their structure and attributes.
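
The object-oriented analogy can be made concrete with a short sketch. The class and method names below are hypothetical and are not taken from the ISO/IEC 20547 series. They only illustrate how an abstract template fixes the structure that each concrete architecture then fills in.

```python
from abc import ABC, abstractmethod
from typing import List


class ReferenceArchitecture(ABC):
    """Abstract template: fixes the structure shared by all derived architectures."""

    domain: str = "unspecified"

    @abstractmethod
    def stakeholders(self) -> List[str]:
        """Parties with an interest in the system."""

    @abstractmethod
    def concerns(self) -> List[str]:
        """Interests that the stakeholders hold."""

    def describe(self) -> str:
        # Shared behaviour inherited unchanged by every concrete architecture.
        return (f"{self.domain}: {len(self.stakeholders())} stakeholders, "
                f"{len(self.concerns())} concerns")


class ExampleSystemArchitecture(ReferenceArchitecture):
    """A specific architecture; the values are illustrative assumptions."""

    domain = "big data"

    def stakeholders(self) -> List[str]:
        return ["system owner", "data consumer", "data subject"]

    def concerns(self) -> List[str]:
        return ["privacy", "security"]


arch = ExampleSystemArchitecture()
print(arch.describe())
```

The abstract class cannot be instantiated on its own, which mirrors the idea that a reference architecture is a template and not a deployable system architecture.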

ISO/TR 14639-2 defines a reference architecture as something that, "in the field of software architecture
or enterprise architecture, provides a proven template solution for an architecture for a particular
domain, as well as a common vocabulary with which to discuss implementations, often with the aim of
stressing commonality".

Based on this reasoning, a reference architecture is an architectural framework as defined by

ISO/IEC/IEEE 42010, including the structure, rules and constraints common to all big data systems.

Thus, a big data reference architecture provides a series of conventions, principles and practices for

describing big data system architectures.

Reference architectures are developed to meet a wide variety of objectives, as shown in Figure 1, taken
from Reference [15], which goes on to describe the core purpose of a reference architecture as being
forward looking and serving (being referenced) as the basis for future implementations.

Figure 1 — The concept of reference architectures
7.3 Reference architecture structure

Figure 2 combines concepts and structures from ISO/IEC/IEEE 42010 to depict the outline for a

reference architecture structure.

A reference architecture is defined for a domain. For this reference architecture, the domain is big data.
The domain in turn defines the environment. In the case of big data, the environment is primarily defined
by the core characteristics of big data: volume, velocity, variety and variability (see ISO/IEC 20546).

The stakeholders in this environment include all the common stakeholders (users, owners, architects,
etc.) for any system, along with anyone having a concern related to the data and its characteristics.

The environment bounds the concerns. Since the environment is defined by the big data characteristics,
the concerns are bounded by those characteristics, and each concern should relate to one or more of those
characteristics along with the stakeholder(s) that have that concern.

The reference architecture is described using an architecture framework. This framework is described

in ISO/IEC 20547-3 and is presented in terms of two viewpoints:
— roles and activities — user view;
— functional components — functional view.
Each of these viewpoints in turn addresses one or more concerns.

Those concerns can be addressed by one or more roles, activities and functional components within

those architecture views.

EXAMPLE In a credit monitoring system, everyone with a record in that system is a stakeholder. Most have a

concern that their privacy is maintained, and the security of their personal information is protected.

Within both the user and functional views of the big data reference architecture, there is a security and

privacy cross-cutting aspect. This cross-cutting aspect is related to the activity "perform audit"
and the functional component "audit framework", which address that concern.
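
The relationships described in this subclause (stakeholders hold concerns, and elements of the user and functional views address them) can be sketched as a small data model. This is a minimal illustration that reuses the credit monitoring example. The class names, field names and the `unaddressed_concerns` helper are hypothetical, and the characteristic assigned to each concern is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Concern:
    name: str
    characteristics: Tuple[str, ...]  # big data characteristics the concern relates to


@dataclass
class Stakeholder:
    name: str
    concerns: List[Concern] = field(default_factory=list)


@dataclass
class ViewElement:
    view: str  # "user" or "functional"
    kind: str  # "role", "activity" or "functional component"
    name: str
    addresses: Set[str] = field(default_factory=set)  # names of concerns addressed


def unaddressed_concerns(stakeholders: Iterable[Stakeholder],
                         elements: Iterable[ViewElement]) -> Set[str]:
    """Return the concerns raised by stakeholders that no view element addresses."""
    raised = {c.name for s in stakeholders for c in s.concerns}
    covered = set().union(*(e.addresses for e in elements))
    return raised - covered


# Credit monitoring example; the "variety" characteristic is an illustrative assumption.
privacy = Concern("privacy", ("variety",))
security = Concern("security", ("variety",))
record_holder = Stakeholder("person with a record in the system", [privacy, security])

elements = [
    ViewElement("user", "activity", "perform audit", {"privacy", "security"}),
    ViewElement("functional", "functional component", "audit framework",
                {"privacy", "security"}),
]

print(unaddressed_concerns([record_holder], elements))
```

An empty result indicates that every concern raised by a stakeholder is addressed by at least one element in one of the views.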

Figure 2 — Basic outline of a reference architecture structure based on ISO/IEC/IEEE 42010

8 Big data reference architecture elements
8.1 Overview

For the big data system environment, this document sets out the BDRA structure by describing the scope
of each part of the ISO/IEC 20547 series, the logical relationships between the documents, and the
application process of the BDRA. The scope of each document in the ISO/IEC 20547 series is as follows:

— ISO/IEC 20547-1: Framework and application process describes the framework of the big data

reference architecture and the process for how a user of the standard can apply it to their particular

problem domain;

— ISO/IEC 20547-2: Use cases and derived requirements provides examples of big data use cases with

application domains and technical considerations derived from the contributed use cases;

— ISO/IEC 20547-3: Reference architecture specifies the big data reference architecture (BDRA). The

reference architecture includes concepts and architectural views (user view and functional view);

— ISO/IEC 20547-4: Security and privacy specifies the security and privacy aspects applicable to

the Big Data Reference Architecture (BDRA) including the big data roles, activities, and functional

components, and also provides guidance on security and privacy operations for big data;


— ISO/IEC 20547-5: Standards roadmap describes big data relevant standards, both in existence and

under development, along with priorities for future big data standards development based on gap

analysis.

Figure 3 shows the relationships and iteration cycle of each part of the ISO/IEC 20547 series. Based on
contributions from enterprises, organizations and experts from related research and academia,
ISO/IEC TR 20547-2 collects use cases and derives technical considerations. ISO/IEC 20547-3
defines the reference architecture for big data by reflecting these technical considerations. ISO/IEC 20547-4,
in particular, specifies the security and privacy aspects that support big data. ISO/IEC 20547-5 provides
the list of applicable standards from the BDRA perspective. The ISO/IEC 20547 series represents a
point-in-time view of big data systems and architectures. As big data implementations are created and

evolve based on ISO/IEC 20547-1, ISO/IEC TR 20547-2, ISO/IEC 20547-3 and ISO/IEC 20547-4, they

will reference and make use of the standards documented in ISO/IEC 20547-5. Those new systems

can be documented in ISO/IEC TR 20547-2 as new use cases leading to new technical considerations.

The technical considerations introduced by those use cases can lead to new standardization activities

resulting in new standards to be documented in ISO/IEC 20547-5.
Figure 3 — Relationships between the parts of the ISO/IEC 20547 series

In order to apply this framework to a specific use case, it is necessary to understand the overall

environment in which the big data system will be implemented, who the stakeholders are for that

system, and what the concerns of those stakeholders are. Subclauses 8.2 to 8.4 describe each of these

key components.
8.2 Stakeholders

ISO/IEC/IEEE 42010 defines a stakeholder as any individual, team, organization or classes thereof,

having an interest in a system. Common stakeholders include the system owners, customers, system

implementors and others. In the case of big data systems, the stakeholders also include anyone with an

interest in the data being processed by the system. This includes the data owners who can be providing

data to the big data system, the data consumers who are making decisions based on the data coming

from the big data system, and also those people or organizations who can be described by the data.

Identification of the stakeholders and their concerns is the first step in developing a big data architecture.

ISO/IEC 205
...
