Network Functions Virtualisation (NFV); Assurance; Report on Active Monitoring and Failure Detection

DGS/NFV-REL004

General Information

Status
Published
Publication Date
14-Apr-2016
Current Stage
12 - Completion
Due Date
29-Apr-2016
Completion Date
15-Apr-2016
Ref Project

Buy Standard

Standard
ETSI GS NFV-REL 004 V1.1.1 (2016-04) - Network Functions Virtualisation (NFV); Assurance; Report on Active Monitoring and Failure Detection
English language
61 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (Sample)

ETSI GS NFV-REL 004 V1.1.1 (2016-04)






GROUP SPECIFICATION
Network Functions Virtualisation (NFV);
Assurance;
Report on Active Monitoring and Failure Detection
Disclaimer
The present document has been produced and approved by the Network Functions Virtualisation (NFV) ETSI Industry
Specification Group (ISG) and represents the views of those members who participated in this ISG.
It does not necessarily represent the views of the entire ETSI membership.

---------------------- Page: 1 ----------------------
2 ETSI GS NFV-REL 004 V1.1.1 (2016-04)



Reference
DGS/NFV-REL004
Keywords
assurance, NFV, testing

ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© European Telecommunications Standards Institute 2016.
All rights reserved.

TM TM TM
DECT , PLUGTESTS , UMTS and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
TM
3GPP and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
ETSI

---------------------- Page: 2 ----------------------
3 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
Contents
Intellectual Property Rights . 5
Foreword . 5
Modal verbs terminology . 5
1 Scope . 6
2 References . 6
2.1 Normative references . 6
2.2 Informative references . 6
3 Definitions and abbreviations . 8
3.1 Definitions . 8
3.2 Abbreviations . 9
4 Active Monitoring in traditional networks . 9
5 Impact of NFV on active monitoring . 10
6 Proposed Active Monitoring Framework for NFV . 12
6.0 Introduction . 12
6.1 Roles and responsibilities for a virtual test agent . 12
6.2 Roles and responsibilities for a Test Controller . 13
6.3 Roles and Responsibilities for Test Results Analysis Module . 14
6.4 Workflow Definition . 14
7 Alternate Active Monitoring Architecture Considerations . 16
7.0 Introduction . 16
7.1 Alternate workflow definition . 17
8 Fault Notification Quality Indicators . 17
8.1 Purpose . 17
8.2 Canonical Failure Notification Model . 17
8.3 Quantitative Failure Notification Indicators . 19
8.4 Failure Notification Quality Indicators in NFV. 19
9 Methods of Measurement . 20
9.1 Introduction . 20
9.2 Service Activation . 20
9.3 Fault Isolation and Troubleshooting . 22
9.4 Failure detection . 24
9.5 Framework for End to End in Situ Monitoring . 25
9.6 Capacity Planning. 27
9.6.0 Introduction. 27
9.6.1 Capacity validation . 27
9.6.2 Capacity planning forecast . 29
9.6.3 Optimize service endpoint location . 29
9.7 Performance monitoring . 29
9.7.1 SLA Monitoring for E2E services . 29
9.7.2 Overload Detection . 31
9.8 Use Case: Active Monitoring of Service Chains . 32
10 Evaluating NFV Resiliency . 34
10.0 Introduction . 34
10.1 Network Resiliency Principles . 34
10.2 NFV Resiliency Evaluation using active fault injection . 35
10.2.0 Introduction. 35
10.2.1 Fault Injection framework for evaluating NFV resiliency . 35
10.2.2 Multilevel Challenge/Fault modelling . 36
10.2.3 NFVI faults & failures . 37
10.3 Traffic Tolerance . 38
10.4 Failure Impact . 38
ETSI

---------------------- Page: 3 ----------------------
4 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
11 Security Considerations . 39
12 Deployment Scenarios . 40
13 Recommendations . 40
Annex A (informative): Active Monitoring Framework Specifics . 42
A.1 Why Active Monitoring . 42
A.2 Test VNF . 42
A.2.1 Test VNF Descriptor . 42
A.2.2 Test VNF Record (VNFR) . 43
A.2.3 Test Instruction Set . 44
A.2.4 KeepAlive messages. 44
A.3 Test Measurement Methods . 45
A.3.1 Fault Localization . 45
A.3.2 NFVI Metrics for fault co-relation . 47
A.4 Synchronization protocol definition for Test Controller HA . 47
Annex B (informative): Test Workload Distributions . 48
B.1 Service QoS & QoE measurements methods . 48
B.2 User Traffic Workloads and Distributions . 49
B.2.1 Workload Mix . 49
B.2.2 Workload Distributions . 49
Annex C (informative): Example Measurement Methods and Metrics . 51
C.1 Example SLAs . 51
C.2 Application Test Methodologies for QoE measurements. 51
C.2.1 NetFLIX™ Adaptive Streaming Load Generator with Quality Detection . 51
C.2.2 HTTP Adaptive Streaming Video Start Time and Underflow Scale Test . 53
C.2.3 CloudMark Virtualised Performance Benchmark . 54
C.2.4 Example Test Methodology for Evaluating NFV Resiliency . 57
Annex D (informative): Authors & contributors . 59
Annex F (informative): Bibliography . 60
History . 61



ETSI

---------------------- Page: 4 ----------------------
5 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This Group Specification (GS) has been produced by ETSI Industry Specification Group (ISG) Network Functions
Virtualisation (NFV).
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.

ETSI

---------------------- Page: 5 ----------------------
6 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
1 Scope
The present document develops a report detailing methods for active monitoring of VNFs, NFVI and E2E network
services and detection of failures. It addresses the following two aspects of active monitoring:
1) Periodic testing of VNFs and service chains in a live environment to ensure proper functionality and
performance adherence to SLAs.
2) Failure prevention and detection - Active monitoring methods for failure prevention (proactive) or timely
detection and recovery from failures. Failures include loss or degradation of network connectivity, loss or
degradation of session capacity, loss of services, VM failures, VM stalls, etc.
The present document proposes that the monitoring agents be on boarded into the NFV environment, just like other
VNFs.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
http://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
Not applicable.
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] IETF RFC 5357: "A two-way active measurement protocol".
[i.2] Recommendation ITU-T Y.1564: "Ethernet Service Activation Test Methodologies".
[i.3] IETF RFC 2544: "Benchmarking Methodology for Network Interconnect Devices".
[i.4] IETF RFC 2681: "A Round-trip Delay Metric for IPPM".
[i.5] ETSI GS NFV-SEC 003: "Network Functions Virtualisation (NFV); NFV Security; Security and
Trust Guidance".
[i.6] IETF RFC 7594: "A Framework for Large-Scale Measurement of Broadband Performance
(LMAP)".
[i.7] IETF RFC 7536: "Large-Scale Broadband Measurement Use Cases".
[i.8] IETF draft-ietf-lmap-information-model-06: "Information Model for Large-Scale Measurement
Platforms (LMAP)".
ETSI

---------------------- Page: 6 ----------------------
7 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
[i.9] Recommendation ITU-T Y.1731: "Internet protocol aspects - Quality of service and network
Performance".
[i.10] ISO/IEC/IEEE 24765:2010: "Systems and software engineering - Vocabulary".
[i.11] IETF RFC 6349: "Framework for TCP Throughput Testing".
[i.12] ETSI GS NFV 003 (V1.1.1): "Network Functions Virtualisation (NFV); Terminology for Main
Concepts in NFV".
[i.13] ETSI GS NFV-MAN 001: "Network Functions Virtualisation (NFV); Management and
Orchestration".
[i.14] ETSI GS NFV-REL 001 (V1.0.0): "Network Functions Virtualisation (NFV); Resiliency
Requirements".
[i.15] Saurabh Kumar Garg et al.: "SLA-based virtual machine management for heterogeneous
workloads in a cloud datacenter", Journal of Network and Computer Applications, Vol. 45,
October 2014, pp. 108-120.
[i.16] Eric Bauer, and Randee Adams: "Service Quality of Cloud-Based Applications, Wiley-IEEE
Press, February 2014.
[i.17] TM Forum Cloud SLA Application Note Version 1.2 - GB963.
[i.18] TM Forum TR 178: "E2E Cloud SLA Management".
[i.19] Raimund Schatz, Tobias Hoßfeld, Lucjan Janowski, and Sebastian Egger: "From Packets to
People: Quality of Experience as a New Measurement Challenge", in 'Data Traffic Monitoring and
Analysis' (E. Biersack, C. Callegari, and M. Matijasevic, Eds.), Springer Lecture Notes in
Computer Science, Vol. 7754, 2013.
[i.20] OPNFV Doctor project stable draft.
NOTE: Available at https://wiki.opnfv.org/display/doctor/Doctor+Home.
[i.21] Michael R. Lyu (Ed.): "Handbook of Software Reliability Engineering", IEEE Computer Society
Press & McGraw-Hill, 1996.
[i.22] SNAPSHOT Draft: "NFV Quality Management Framework", April 23, 2015.
NOTE: The NFV white paper is posted on the NFV team portal on the QuEST Forum member web site/Executive
Board/NFV Strategic Initiative/Files & Documents.
[i.23] D. Cotroneo, L. De Simone, A. Ken Iannillo, A. Lanzaro, and R. Natella: "Dependability
Evaluation and Benchmarking of Network Function Virtualization Infrastructures", IEEE
Conference on Network Softwarization, London, UK, April 2015.
[i.24] CSMIC defined measures.
NOTE: Available at http://csmic.org.
[i.25] "NIST Cloud Computing Cloud Services Description", Rev. 2.3d9.
th
[i.26] R. Ghosh, F. Longo, V.K. Naik, and K.S. Trivedi: "Quantifying Resiliency of IaaS Cloud", 29
IEEE International Symposium on Reliable Distributed Systems, New Delhi, Punjab, India,
October-November 2010.
[i.27] J.P.G. Sterbenz, E.K. Çetinkaya, M.A. Hameed, A. Jabbar, S. Qian, J.P. Rohrer: "Evaluation of
network resilience, survivability, and disruption tolerance: analysis, topology generation,
simulation, and experimentation", Telecommunication Systems, Vol. 52, Issue 2, February 2013,
pp. 705-736.
[i.28] ETSI GS NFV-REL 002 (V1.0.0): "Network Functions Virtualisation (NFV); Reliability; Report
on Scalable Architectures for Reliability Management".
ETSI

---------------------- Page: 7 ----------------------
8 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
[i.29] ETSI GS NFV-REL 003: "Network Functions Virtualisation (NFV); Reliability; Report on Models
and Features for E2E Reliability".
[i.30] ETSI GS NFV-SEC 008: "Security Management and Monitoring for NFV".
[i.31] ETSI GS NFV-REL 005 (V1.1.1): "Network Functions Virtualisation (NFV); Accountability;
Report on Quality Accountability Framework".
[i.32] IETF draft-browne-sfc-nsh-timestamp-00: "Network Service Header Timestamping".
NOTE: Available at https://tools.ietf.org/html/draft-browne-sfc-nsh-timestamp-00.
[i.33] IETF draft-irtf-nfvrg-resource-management-service-chain-02: "Resource Management in Service
Chaining".
NOTE: Available at https://tools.ietf.org/html/draft-irtf-nfvrg-resource-management-service-chain-02.
[i.34] Mark Sylor: "Testing the Cloud," EXFO White Paper 023, 2012.
3 Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the terms and definitions given in ETSI GS NFV-REL 001 [i.14], ETSI
GS NFV 003 [i.12] and the following apply:
failure: termination of the ability of a product to perform a required function or its inability to perform within
previously specified limits or an event in which a system or system component does not perform a required function
within specified limits
NOTE: As defined in ISO/IEC/IEEE 24765:2010 [i.10].
FaultLoad: set of faults to inject in the NFVI for resiliency evaluation
NOTE: As defined in [i.23].
frame loss ratio: ratio, expressed as a percentage, of the number of service frames not delivered divided by the total
number of service frames during a time interval T, where the number of service frames not delivered is the difference
between the number of service frames arriving at the ingress ETH flow point and the number of service frames
delivered at the egress ETH flow point in a point-to-point ETH connection
NOTE: As defined in Recommendation ITU-T Y.1731 [i.9].
frame delay: round-trip delay for a frame, where frame delay is defined as the time elapsed since the start of
transmission of the first bit of the frame by a source node until the reception of the last bit of the loop backed frame by
the same source node, when the loopback is performed at the frame's destination node
NOTE: As defined in Recommendation ITU-T Y.1731 [i.9].
frame delay variation: measure of the variations in the frame delay between a pair of service frames, where the service
frames belong to the same CoS instance on a point-to-point ETH connection
NOTE: As defined in Recommendation ITU-T Y.1731 [i.9].
Test Controller: management module responsible for management of the test agents/probes
NOTE 1: Provides test instructions to the test probes.
NOTE 2: Co-ordinates the test scheduling when multiple tests with large number of test probes are executed.
NOTE 3: Retrieves results from the results analysis engine to provide actionable information to the network
operator via NFVO. In this case result reporting to OSS/BSS via NFVO has been used as a deployment
option to keep a single interface for communication between Test Controller and MANO. This keeps the
changes required to interfaces of the MANO components to minimum and minimizes the effort for Active
monitoring System integration with NFV framework.
ETSI

---------------------- Page: 8 ----------------------
9 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
Test Results Analysis Module (TRAM): integral part of the active monitoring framework that collects or receives test
results from the VTAs, NFVI resource statistics and alarms from VIM and analyses test results and presents it to Test
Controller, NFVO or other management entities in an actionable format
throughput: maximum rate at which no frame is dropped. This is typically measured under test conditions
NOTE: As defined in IETF RFC 2544 [i.3].
Virtual Test Agent (VTA): VNF for active monitoring probe capable of sending and analysing control plane and data
plane testing
3.2 Abbreviations
For the purposes of the present document, the abbreviations given in ETSI GS NFV 003 [i.12] and the following apply:
BSS Business Support Systems
CBS Constant Bit Rate
CCDF Complementary Cumulative Distribution Function
CiGoodput Cloud infrastructure Goodput
CiQoE Cloud infrastructure Quality of Experience
CiR Cloud infrastructure Reliability
CoS Class of Service
DPI Deep Packet Inspection
DUT Device Under Test
EBS Excess Burst Size
EIR Excess Information Rate
IETF Internet Engineering Task Force
IPPM IP Performance Metrics
LMAP Large scale Measurement of Broadband Performance
NFF No Fault Found
OSS Operations Support Systems
PoP Point of Presence
PPB Parts Per Billion
PTP Precision Time Protocol
NTP Network Time Protocol
NSR Network Service Record
QoE Quality of Experience
QoS Quality of Service
SLA Service Level Agreement
SPC Statistical Process Control
TCO Total Cost of Ownership
TRAM Test Results Analysis Module
VLR Virtual Link Record
VNFR Virtual Network Function Record
VTA Virtual Test Agent
4 Active Monitoring in traditional networks
In general the 3 stages of service lifecycle are addressed in the present document:
1) Service activation - whereby a service or VNF is deployed and verified that the service is running as expected.
2) Service monitoring - where the resource usage by a service is monitored and management components are
alerted upon KPI violation.
3) Service debug - where troubleshooting probes and tools to ascertain the root cause of a service failure are used.
Live testing typically involves end-to-end testing of services versus single node testing where the testing can be
performed at the pre-activation, or post-activation, of services. Three key components of a test system in live networks
are:
1) Test Controller;
ETSI

---------------------- Page: 9 ----------------------
10 ETSI GS NFV-REL 004 V1.1.1 (2016-04)
2) results analysis module; and
3) test agent.
In non-NFV network deployments, the testing agents are typically deployed in the long-term as long as the testing or
monitoring PoP does not change. Test Controller and results analysis module can be part of the OSS/BSS system or can
be a standalone application in the same administration domain as the OSS/BSS system. Figure 1 illustrates a generic
active monitoring deployment scenario.

Figure 1: Live network testing in non-NFV networks
Network monitoring methods can be categorized into active, passive or hybrid modes.
• Active may operate in two modes:
- Test mode involves sending test traffic (based on an OAM protocol such as Recommendation
ITU-T Y.1731 [i.9] or alternative) into the network to validate the services and applications performance,
SLAs and to perform fault isolation.
- Subscriber mode involves marking subscriber traffic user plane headers in a way such that QoE for
subscribers may be derived accurately as flows traverse the network.
• Passive mode testing involves observing the user traffic, providing an analysis based on this untampered traffic
and raising alarms if pre-set thresholds are crossed.
• Hybrid mode approach, as the name suggests, uses the information obtained from both
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.