Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5

Véhicules routiers — Application de la maintenance prédictive au matériel à l'aide de l'ISO 26262-5

General Information

Status: Not Published
Current Stage: 5020 - FDIS ballot initiated: 2 months. Proof sent to secretariat
Start Date: 24-May-2023
Completion Date: 24-May-2023

Draft: REDLINE ISO/DTR 9839 - Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5. Released: 10.05.2023. English language, 18 pages.
Draft: ISO/DTR 9839 - Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5. Released: 10.05.2023. English language, 18 pages.

Standards Content (Sample)

ISO/DTR 9839
2023-08
ISO/TC 22/SC 32
Secretariat: JISC
Date: 2023-05-10

Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5

Véhicules routiers — Application de la maintenance prédictive au matériel à l'aide de l'ISO 26262-5

FDIS stage
© ISO 2023 – All rights reserved
© ISO 2023

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

ISO/DTR 9839:2023(E)
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Literature survey of degrading faults
5.1 General
5.2 Degrading faults in industry standards
5.2.1 JEDEC JEP122H [4]
5.3 Degrading faults in technical publications
5.3.1 Advanced CMOS Reliability Update: Sub 20 nm FinFET Assessment [5]
5.3.2 Circuit-Based Reliability Consideration in FinFET Technology [6]
5.3.3 Intermittent Faults and Effects on Reliability of Integrated Circuits [7]
6 Literature survey on predictive maintenance
6.1 General
6.2 Predictive maintenance in industry standards
6.2.1 IEC 61508 [9]
6.2.2 IEEE 1856 [3]
6.3 Predictive maintenance in technical publications
6.3.1 A Survey of Online Failure Prediction Methods [10]
6.3.2 An Odometer for CPUs [11]
6.3.3 Circuit Failure Prediction for Robust System Design in Scaled CMOS [12]
6.3.4 A Circuit Failure Prediction Mechanism (DART) for High Field Reliability [13]
6.3.5 Predicting Remediations for Hardware Failures in Large-Scale Datacenters [14]
6.3.6 Improving Analog Functional Safety Using Data-Driven Anomaly Detection [15]
7 Degrading faults and the ISO 26262 series
7.1 Understanding the lifecycle of degrading faults
7.2 Classification of degrading faults
7.3 Quantifying degrading fault base failure rate
7.3.1 Industry standards and models
7.3.2 Field data
7.3.3 Expert judgement
8 Applying predictive maintenance
8.1 Diagnostic coverage (DC) evaluation for predictive mechanisms
8.2 Considering random hardware metrics
8.2.1 Impacting the SPFM and LFM
8.2.2 Application as a dedicated measure
8.3 Considering RUL prediction
Annex A (informative) An approach to handling degrading faults
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

ISO draws attention to the possibility that the implementation of this document may involve the use of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a) patent(s) which may be required to implement this document. However, implementers are cautioned that this may not represent the latest information, which may be obtained from the patent database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received.

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 22, Road vehicles, Subcommittee SC 32, Electrical and electronic components and general system aspects.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.

Introduction

Hardware elements wear out or degrade with time and usage. The presence of certain faults can cause the rate of degradation to increase. If the rate of degradation exceeds critical thresholds, then a hardware element can fail during its normal expected lifespan. Addressing fault behaviours which change over time is difficult. Functional safety standards such as the ISO 26262 series have traditionally addressed degrading faults with avoidance measures and simplified assumptions of static behaviours.

Understanding of degrading faults is improving over time. Many industries are taking proactive steps to control degrading faults using predictive maintenance. Predictive maintenance can detect degrading faults and predict remaining useful life. Safety mechanisms based on predictive maintenance are not explicitly discussed in the ISO 26262:2018 series.

This document provides a survey of the current state of the art for degrading faults and predictive maintenance techniques. Approaches are presented to consider degrading faults and predictive maintenance techniques in an ISO 26262 safety argument. Much of the content is focused on semiconductors, but the concepts can be applied to other hardware elements.

Road vehicles — Application of predictive maintenance to
hardware with ISO 26262-5
1 Scope

This document is intended to be applied to the usage of predictive maintenance methods for the detection of degrading faults in safety related E/E hardware elements. It applies to hardware elements developed for compliance with the ISO 26262 series [1] in which degrading faults are shown to be relevant due to, for instance, the technology used.

Specific technical implementations of predictive maintenance solutions are not in scope of this document.

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 26262-1:2018, Road vehicles — Functional safety — Part 1: Vocabulary

ISO 26262-5:2018, Road vehicles — Functional safety — Part 5: Product development at the hardware level

3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 26262-1:2018 and the following apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/

3.1
degrading fault

fault whose characteristics are not constant and degrade over time, that can result in an error or failure when stimulated after degradation exceeds a critical threshold

Note 1 to entry: Permanent and intermittent faults can first manifest as degrading faults. Transient faults do not manifest as degrading faults.

Note 2 to entry: Degrading faults do not create errors or failures until degradation exceeds critical thresholds. The capability to generate an error or failure is related to the current state of degradation.

Note 3 to entry: Degrading faults exhibit abnormal conditions which can cause an error or failure over time. Normal degradation does not exhibit abnormal conditions which are necessary to be classified as a fault. Normal degradation can result in a loss of functionality after expected lifespan has elapsed but cannot be considered a fault as it is not abnormal.

3.2
degrading fault detection time interval
(DFDTI)

timespan from the occurrence of a degrading fault (3.1) to its detection

3.3
degrading fault handling time interval
(DFHTI)

sum of the degrading fault detection time interval (3.2) and the degrading fault reaction time interval (3.4)

Note 1 to entry: The degrading fault handling time interval is a property of a predictive maintenance (3.5) related safety mechanism.

Note 2 to entry: The degrading fault handling time interval is considered in addition to the fault handling time interval. See Figure 4.

Note 3 to entry: The timespan from occurrence of a degrading fault (3.1) until it has the capability to generate an error or failure is the maximum degrading fault handling time interval that can be specified for a predictive maintenance related safety mechanism to support the functional safety concept.

Note 4 to entry: A degrading fault (3.1) is covered in a timely manner by the corresponding safety mechanism if there is detection and reaction within the degrading fault handling time interval.

3.4
degrading fault reaction time interval
(DFRTI)

timespan from the detection of a degrading fault (3.1) to reaching a safe state or reaching emergency operation
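
The timing relationships in 3.2 to 3.4 can be illustrated with a short calculation. The following is a minimal sketch only, not part of any referenced standard: the class name `DegradingFaultTiming`, its field names, the parameter `time_to_error_capability`, and the example values in hours are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradingFaultTiming:
    """Hypothetical container for the intervals in 3.2 to 3.4 (one consistent time unit)."""

    dfdti: float  # degrading fault detection time interval (3.2)
    dfrti: float  # degrading fault reaction time interval (3.4)

    @property
    def dfhti(self) -> float:
        # 3.3: the handling interval is the sum of the detection and reaction intervals.
        return self.dfdti + self.dfrti

    def is_timely(self, time_to_error_capability: float) -> bool:
        # Notes 3 and 4 to entry 3.3: the fault is covered in a timely manner when
        # detection and reaction complete before the degrading fault becomes able
        # to generate an error or failure.
        return self.dfhti <= time_to_error_capability


# Illustrative values in hours (assumed, not taken from any standard).
timing = DegradingFaultTiming(dfdti=200.0, dfrti=50.0)
print(timing.dfhti)
print(timing.is_timely(time_to_error_capability=1000.0))
```

In this sketch the comparison uses a single worst-case figure for the time at which the fault becomes capable of generating an error. A real safety argument would need to justify that figure for the degradation mechanism in question.
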
3.5
predictive maintenance

techniques that are used to detect degrading faults (3.1), predict remaining useful life (3.6), and react appropriately

Note 1 to entry: Approaches include the use of data driven methods such as machine learning applied locally or on a remote system. Guidance for developing safety related ML systems can be found in ISO/IEC TR 5469 and ISO PAS 8800 [2].

Note 2 to entry: Prediction of remaining useful life (3.6) can be used to replace a faulty element before it can cause an error or failure.

3.6
remaining useful life
(RUL)

length of time from the present time to the estimated time that the item or element is expected to no longer perform its intended function within desired specifications

Note 1 to entry: RUL can be estimated using predictive maintenance (3.5) or with other approaches.

Note 2 to entry: RUL can be estimated for expected degradation or degradation in the presence of a fault.

[SOURCE: IEEE 1856-2017 [3], modified for compliance to ISO directives]
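
As an illustration of the RUL definition in 3.6, one simple estimation approach is to extrapolate a monitored degradation parameter toward a critical threshold. The sketch below is an assumption made for illustration and is not a method prescribed by IEEE 1856 or the ISO 26262 series: the function name `estimate_rul`, the linear trend, and the convention that degradation increases the monitored value are all hypothetical. Real degradation mechanisms are often non-linear.

```python
from typing import Optional, Sequence


def estimate_rul(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Fit a least-squares line to (time, value) samples and extrapolate to threshold.

    Returns the time remaining after the last sample, or None when the fitted
    trend does not move toward the threshold (assumed to lie above the values).
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / sxx
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None  # no upward trend, so no crossing time can be extrapolated
    t_cross = (threshold - intercept) / slope
    # Clamp at zero: the fitted line may already be past the threshold.
    return max(0.0, t_cross - times[-1])


# Illustrative samples: a parameter drifting upward over operating hours (assumed data).
print(estimate_rul([0, 100, 200, 300], [1.0, 2.0, 3.0, 4.0], threshold=10.0))
```

A straight-line fit is used here only because it is the simplest trend model. The estimate is sensitive to measurement noise and to the choice of threshold.
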
4 Abbreviated terms

ADAS Advanced Driver Assistance System
ADS Automated Driving System
AI Artificial Intelligence
BEoL Back End of Line (sometimes BEOL)
BFR Base Failure Rate
BIST Built-In Self-Test
BLM Barrier Layer Material
CHC Channel Hot Carrier
COTS Commercial Off The Shelf
DC Diagnostic Coverage
DFDTI Degrading Fault Detection Time Interval
DFHTI Degrading Fault Handling Time Interval
DFRTI Degrading Fault Reaction Time Interval
DRAM Dynamic Random Access Memory
EM Electromigration
ESD Electrostatic Discharge
FEoL Front End of Line (sometimes FEOL)
FET Field Effect Transistor
FDTI Fault Detection Time Interval
FHTI Fault Handling Time Interval
FTTI Fault Tolerant Time Interval
HCI Hot Carrier Injection
ILD Inter-Layer Dielectric
LFM Latent Fault Metric
ML Machine Learning
MoL Middle of Line (sometimes MOL)
MEoL Middle End of Line (sometimes MEOL)
MPFDTI Multiple Point Fault Detection Time Interval
NBTI Negative Bias Temperature Instability
NVM Non-Volatile Memory
PCM Phase Change Memory
PHM Prognostics and Health Management
QMS Quality Management System
RUL Remaining Useful Life
SBD Soft Breakdown
SHE Self-Heating Effect
SILC Stress-Induced Leakage Current
SM Stress Migration
SoC System on Chip
SPFM Single Point Fault Metric
TDDB Time Dependent Dielectric Breakdown
TDJD Time Dependent Junction Degradation
TID Total Ionizing Dose

5 Literature survey of degrading faults
5.1 General

This document reviews many technical documents to summarize the current state-of-the-art understanding of degrading faults in industry standards and technical publications.

NOTE: Terminology in the referenced publications and standards is not always aligned to the terms and definitions of the ISO 26262 series. When referencing publications and standards, the terminology of the referenced work is used.

5.2 Degrading faults in industry standards
5.2.1 JEDEC JEP122H [4]

The JEDEC Solid State Technology Association is a semiconductor industry trade association and standardization body. JEDEC has over 300 companies as members and publishes electronics standards on a wide variety of topics.

The JEDEC JEP122H standard is the latest revision of JEDEC’s standard for “Failure Mechanisms and Models for Semiconductor Devices,” last updated in 2016. The standard describes eighteen different failure mechanisms, classifying them as being related to the die front end of line (FEoL), die back end of line (BEoL), or packaging. Models are provided for estimating the rates of degradation per failure mode. The information provided in JEP122H is validated by a team of reliability experts from the SEMATECH/ISMI Reliability Council and supported by extensive references to technical publications.

The die FEoL failure mechanisms described by JEP122H include:

— time dependent dielectric breakdown (TDDB) due to gate oxide breakdown;
— hot carrier injection (HCI);
— negative bias temperature instability (NBTI);
— surface inversion due to mobile ions;
— floating gate non-volatile memory (NVM) data retention;
— localized charge trapping NVM data retention;
— phase change memory (PCM) NVM data retention.

The die BEoL failure mechanisms described by JEP122H include:

— TDDB due to ILD/low-k/mobile Cu ions;
— aluminium electromigration (EM);
— copper EM;
— aluminium and copper corrosion;
— aluminium stress migration (SM);
— copper SM.

The packaging failure mechanisms described in JEP122H include:

— fatigue failures due to temperature cycling and thermal shock;
— interfacial failures due to temperature cycling and thermal shock;
— intermetallic and oxidation failure due to high temperature;
— tin whiskers;
— ion mobility kinetics due to component cleanliness.

5.3 Degrading faults in technical publications
5.3.1 Advanced CMOS Reliability Update: Sub 20 nm FinFET Assessment [5]

Reference [5] was published by Sandia National Laboratories, a research organization of the United States Department of Energy, in 2020. The purpose of the report is to document the most critical failure modes impacting advanced semiconductor technologies using FinFET technology. FinFET based semiconductors are used for most current generation SoCs (system on chip devices), dGPUs (discrete graphics processing units), and DRAMs (dynamic random-access memories) which are used in infotainment, ADAS (advanced driver assistance systems), and ADS (automated driving system) applications. While the use of FinFET transistors enables smaller process geometries (e.g. <20 nm feature size) and faster processing, it also changes the failure mode susceptibility characteristics compared to more traditional planar transistor technologies found in 28 nm and larger process technologies.

The report provides details for the following failure modes:

— die related failure modes:
    — bias temperature instability (BTI);
    — dielectric integrity;
    — HCI;
    — BEoL, EM, and stress voiding;
    — middle end of line (MEoL) concerns (also known as middle of line, or MoL);
— packaging and package-die interaction;
— integrated die design and process reliability – electrostatic discharge (ESD);
— radiation effects:
    — total ionizing dose (TID);
    — displacement damage;
    — COTS electronics and radiation effects.

The die and packaging related failure modes discussed can generally be argued to manifest as random degrading faults before becoming intermittent or permanent faults. The ESD and radiation effects can generally be argued to be systematic or transient in nature.

Also of interest is the section on reliability degradation and its impact on circuit/system performance. This section focuses on “soft” logic failures which manifest before “hard” physical failures of the semiconductor devices. As most of the degradation mechanisms discussed result in parameter degradation, it is suggested that statistical methods can be used to predict circuit failures.

5.3.2 Circuit-Based Reliability Consideration in FinFET Technology [6]

Reference [6] was authored in 2017 by four experts from Taiwan Semiconductor Manufacturing Company (TSMC). It describes the primary reliability failure modes of concern for FinFET based process technologies and presents a model for estimating reliability. Comparisons are made between the performance of 28 nm planar technologies and that of 16 nm and 7 nm FinFET technologies.

The authors highlight several new reliability concerns including bias temperature instability (BTI), stress-induced leakage currents (SILCs), self-heating effects (SHEs), and time dependent junction degradation (TDJD). Models are proposed to estimate the reliability impacts of these mechanisms, based on a combination of simulation and reliability testing.

5.3.3 Intermittent Faults and Effects on Reliability of Integrated Circuits [7]

Reference [7] was authored in 2008 by a reliability expert from AMD and studies intermittent faults. An experiment was conducted using more than 250 servers (from pre-2008) to provide over 300 server years of operational data. Identified memory single bit errors were analysed for root cause, and the findings documented. The rates of occurrence of the errors introduced by these faults can vary from one design to another and one technology to another.

Failure modes discussed in this paper include ultra-thin oxide breakdown, soft breakdown (SBD), EM voids, barrier layer material (BLM) cracks, and crosstalk as sources of intermittent faults. It is noted that these intermittent faults can be detected by monitoring the Vmin (voltage minimum) thresholds necessary for correct operation. Mitigations are discussed in terms of systematic avoidance, screening at manufacturing test, and online fault detection in application including failure prediction.
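
The idea of monitoring Vmin for drift can be sketched as a simple threshold check. This is a hypothetical illustration and not the method used in Reference [7], whose implementation is not described here: the function name `vmin_drift_exceeded`, the persistence requirement of several consecutive readings, the assumption that degradation raises Vmin, and the example voltages are all assumptions.

```python
from typing import Sequence


def vmin_drift_exceeded(
    baseline_vmin: float,
    measured_vmin: Sequence[float],
    max_drift: float,
    consecutive: int = 3,
) -> bool:
    """Return True when the most recent `consecutive` Vmin readings all exceed
    the baseline by more than `max_drift` (volts).

    Requiring several consecutive readings is an assumed way to reduce false
    alarms from a single noisy measurement.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    recent = list(measured_vmin)[-consecutive:]
    if len(recent) < consecutive:
        return False  # not enough history to decide
    return all(v - baseline_vmin > max_drift for v in recent)


# Illustrative readings in volts (assumed data): Vmin rising from a 0.70 V baseline.
readings = [0.70, 0.71, 0.76, 0.77, 0.78]
print(vmin_drift_exceeded(0.70, readings, max_drift=0.05))
```

A check of this kind only flags that the operating margin is shrinking. Deciding how to react, and how much time remains, is a separate question.
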
6 Literature survey on predictive maintenance
6.1 General

This document reviews many technical documents to summarize the current state-of-the-art understanding of predictive maintenance in industry standards and technical publications. Additional application domain specific standards are in development (e.g. IEC 63270 [8]).

NOTE: Terminology in the referenced publications and standards is not always aligned to the terms and definitions of the ISO 26262 series. When referencing publications and standards, the terminology of the referenced work is used.

6.2 Predictive maintenance in industry standards
6.2.1 IEC 61508 [9]

The IEC 61508 standard is a basic safety publication for functional safety which was the original basis for the ISO 26262 series. The 2010 edition of the standard includes guidance on the use of fault forecasting, maintenance, and supervisory actions supported by artificial intelligence (AI) systems.

6.2.2 IEEE 1856 [3]

The IEEE 1856 standard, “IEEE Standard Framework for Prognostics and Health Management of Electronic Systems,” provides an industry-independent approach to the use of predictive maintenance and similar techniques. This standard is intended to be applied at many different levels of design abstraction and is not specific to semiconductor technologies. This standard applies the terms “prognostics” and “Prognostics and Health Management (PHM)” interchangeably with predictive methods. The standard is primarily focused on estimating the remaining useful life (RUL) after a fault is detected, rather than the method by which the fault is detected.

IEEE 1856 provides a lifecycle model for PHM as illustrated in Figure 1:

— the product is initially deployed without faults;
— an off-nominal behaviour (fault) is detected;
— a failure occurs.
...

FINAL DRAFT
TECHNICAL REPORT
ISO/DTR 9839

ISO/TC 22/SC 32
Secretariat: JISC
Voting begins on: 2023-05-24
Voting terminates on: 2023-07-19

Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5

Véhicules routiers — Application de la maintenance prédictive au matériel à l'aide de l'ISO 26262-5

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.

IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.

Reference number ISO/DTR 9839:2023(E)
© ISO 2023
ISO/DTR 9839:2023(E)
FINAL
TECHNICAL ISO/DTR
DRAFT
REPORT 9839
ISO/TC 22/SC 32
Road vehicles — Application of
Secretariat: JISC
predictive maintenance to hardware
Voting begins on:
with ISO 26262-5
Voting terminates on:
Véhicules routiers — Application de la maintenance prédictive au
matériel à l'aide de l'ISO 26262-5
COPYRIGHT PROTECTED DOCUMENT
© ISO 2023

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may

be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on

the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below

or ISO’s member body in the country of the requester.
RECIPIENTS OF THIS DRAFT ARE INVITED TO
ISO copyright office
SUBMIT, WITH THEIR COMMENTS, NOTIFICATION
OF ANY RELEVANT PATENT RIGHTS OF WHICH
CP 401 • Ch. de Blandonnet 8
THEY ARE AWARE AND TO PROVIDE SUPPOR TING
CH-1214 Vernier, Geneva
DOCUMENTATION.
Phone: +41 22 749 01 11
IN ADDITION TO THEIR EVALUATION AS
Reference number
Email: copyright@iso.org
BEING ACCEPTABLE FOR INDUSTRIAL, TECHNO­
ISO/DTR 9839:2023(E)
Website: www.iso.org
LOGICAL, COMMERCIAL AND USER PURPOSES,
DRAFT INTERNATIONAL STANDARDS MAY ON
Published in Switzerland
OCCASION HAVE TO BE CONSIDERED IN THE
LIGHT OF THEIR POTENTIAL TO BECOME STAN­
DARDS TO WHICH REFERENCE MAY BE MADE IN
© ISO 2023 – All rights reserved
NATIONAL REGULATIONS. © ISO 2023
---------------------- Page: 2 ----------------------
ISO/DTR 9839:2023(E)
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Literature survey of degrading faults
5.1 General
5.2 Degrading faults in industry standards
5.2.1 JEDEC JEP122H [4]
5.3 Degrading faults in technical publications
5.3.1 Advanced CMOS Reliability Update: Sub 20 nm FinFET Assessment [5]
5.3.2 Circuit-Based Reliability Consideration in FinFET Technology [6]
5.3.3 Intermittent Faults and Effects on Reliability of Integrated Circuits [7]
6 Literature survey on predictive maintenance
6.1 General
6.2 Predictive maintenance in industry standards
6.2.1 IEC 61508 [9]
6.2.2 IEEE 1856 [3]
6.3 Predictive maintenance in technical publications
6.3.1 A Survey of Online Failure Prediction Methods [10]
6.3.2 An Odometer for CPUs [11]
6.3.3 Circuit Failure Prediction for Robust System Design in Scaled CMOS [12]
6.3.4 A Circuit Failure Prediction Mechanism (DART) for High Field Reliability [13]
6.3.5 Predicting Remediations for Hardware Failures in Large-Scale Datacenters [14]
6.3.6 Improving Analog Functional Safety Using Data-Driven Anomaly Detection [15]
7 Degrading faults and the ISO 26262 series
7.1 Understanding the lifecycle of degrading faults
7.2 Classification of degrading faults
7.3 Quantifying degrading fault base failure rate
7.3.1 Industry standards and models
7.3.2 Field data
7.3.3 Expert judgement
8 Applying predictive maintenance
8.1 Diagnostic coverage (DC) evaluation for predictive mechanisms
8.2 Considering random hardware metrics
8.2.1 Impacting the SPFM and LFM
8.2.2 Application as a dedicated measure
8.3 Considering RUL prediction
Annex A (informative) An approach to handling degrading faults
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO document should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

ISO draws attention to the possibility that the implementation of this document may involve the use

of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed

patent rights in respect thereof. As of the date of publication of this document, ISO had not received

notice of (a) patent(s) which may be required to implement this document. However, implementers are

cautioned that this may not represent the latest information, which may be obtained from the patent

database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all

such patent rights.

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and

expressions related to conformity assessment, as well as information about ISO's adherence to

the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see

www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 22, Road vehicles, Subcommittee SC 32,

Electrical and electronic components and general system aspects.

Any feedback or questions on this document should be directed to the user’s national standards body. A

complete listing of these bodies can be found at www.iso.org/members.html.
Introduction

Hardware elements wear out or degrade with time and usage. The presence of certain faults can

cause the rate of degradation to increase. If the rate of degradation exceeds critical thresholds, then

a hardware element can fail during its normal expected lifespan. Addressing fault behaviours which

change over time is difficult. Functional safety standards such as the ISO 26262 series have traditionally

addressed degrading faults with avoidance measures and simplified assumptions of static behaviours.

Understanding of degrading faults is improving over time. Many industries are taking proactive steps

to control degrading faults using predictive maintenance. Predictive maintenance can detect degrading

faults and predict remaining useful life. Safety mechanisms based on predictive maintenance are not

explicitly discussed in the ISO 26262 series.

This document provides a survey of the current state of the art for degrading faults and predictive

maintenance techniques. Approaches are presented to consider degrading faults and predictive

maintenance techniques in an ISO 26262 safety argument. Much of the content is focused on

semiconductors, but the concepts can be applied to other hardware elements.
TECHNICAL REPORT ISO/DTR 9839:2023(E)
Road vehicles — Application of predictive maintenance to
hardware with ISO 26262-5
1 Scope

This document is intended to be applied to the usage of predictive maintenance methods for the

detection of degrading faults in safety related E/E hardware elements. It applies to hardware elements

developed for compliance with the ISO 26262 series [1] in which degrading faults are shown to be

relevant due to, for instance, the technology used.

Specific technical implementations of predictive maintenance solutions are not in scope of this

document.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content

constitutes requirements of this document. For dated references, only the edition cited applies. For

undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 26262-1, Road vehicles — Functional safety — Part 1: Vocabulary
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO 26262-1 and the following

apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
degrading fault

fault whose characteristics are not constant and degrade over time, that can result in an error or failure

when stimulated after degradation exceeds a critical threshold

Note 1 to entry: Permanent and intermittent faults can first manifest as degrading faults. Transient faults do not

manifest as degrading faults.

Note 2 to entry: Degrading faults do not create errors or failures until degradation exceeds critical thresholds.

The capability to generate an error or failure is related to the current state of degradation.

Note 3 to entry: Degrading faults exhibit abnormal conditions which can cause an error or failure over time.

Normal degradation does not exhibit abnormal conditions which are necessary to be classified as a fault. Normal

degradation can result in a loss of functionality after expected lifespan has elapsed but cannot be considered a

fault as it is not abnormal.
3.2
degrading fault detection time interval
DFDTI
timespan from the occurrence of a degrading fault (3.1) to its detection
3.3
degrading fault handling time interval
DFHTI

sum of the degrading fault detection time interval (3.2) and the degrading fault reaction time interval

(3.4).

Note 1 to entry: The degrading fault handling time interval is a property of a predictive maintenance (3.5) related

safety mechanism.

Note 2 to entry: The degrading fault handling time interval is considered in addition to the fault handling time

interval. See Figure 4.

Note 3 to entry: The timespan from occurrence of a degrading fault (3.1) until it has the capability to generate

an error or failure is the maximum degrading fault handling time interval that can be specified for a predictive

maintenance related safety mechanism to support the functional safety concept.

Note 4 to entry: A degrading fault (3.1) is covered in a timely manner by the corresponding safety mechanism if

there is detection and reaction within the degrading fault handling time interval.

3.4
degrading fault reaction time interval
DFRTI

timespan from the detection of a degrading fault (3.1) to reaching a safe state or reaching emergency

operation
3.5
predictive maintenance

techniques that are used to detect degrading faults (3.1), predict remaining useful life (3.6), and react

appropriately

Note 1 to entry: Approaches include the use of data driven methods such as machine learning applied locally or

on a remote system. Guidance for developing safety related ML systems can be found in ISO/IEC TR 5469 [2].

Note 2 to entry: Prediction of remaining useful life (3.6) can be used to replace a faulty element before it can cause

an error or failure.
3.6
remaining useful life
RUL

length of time from the present time to the estimated time that the item or element is expected to no

longer perform its intended function within desired specifications

Note 1 to entry: RUL can be estimated using predictive maintenance (3.5) or with other approaches.

Note 2 to entry: RUL can be estimated for expected degradation or degradation in the presence of a fault.

[SOURCE: IEEE 1856-2017 [3], modified for compliance to ISO directives]
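As an informal illustration of the timing terms in 3.2 to 3.4, the following Python sketch computes the DFDTI, DFRTI and DFHTI from four timestamps and checks the timeliness condition described in Note 3 and Note 4 to entry 3.3. The class name, field names, the use of hours as the unit and the numeric values are assumptions made for this example only. They are not taken from the ISO 26262 series or from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradingFaultTiming:
    """Timestamps in hours since start of operation (assumed unit)."""

    fault_occurrence: float    # degrading fault first present
    fault_detection: float     # predictive mechanism detects the fault
    safe_state_reached: float  # safe state or emergency operation reached
    failure_capable: float     # degradation exceeds the critical threshold

    @property
    def dfdti(self) -> float:
        # 3.2: occurrence of the degrading fault to its detection
        return self.fault_detection - self.fault_occurrence

    @property
    def dfrti(self) -> float:
        # 3.4: detection to safe state or emergency operation
        return self.safe_state_reached - self.fault_detection

    @property
    def dfhti(self) -> float:
        # 3.3: sum of detection and reaction time intervals
        return self.dfdti + self.dfrti

    @property
    def max_dfhti(self) -> float:
        # 3.3, Note 3: occurrence until the fault can generate an error or failure
        return self.failure_capable - self.fault_occurrence

    def covered_in_time(self) -> bool:
        # 3.3, Note 4: detection and reaction complete within the allowed interval
        return self.dfhti <= self.max_dfhti


# Hypothetical values chosen only to exercise the arithmetic.
timing = DegradingFaultTiming(
    fault_occurrence=1000.0,
    fault_detection=1200.0,
    safe_state_reached=1250.0,
    failure_capable=1800.0,
)
print(timing.dfdti, timing.dfrti, timing.dfhti, timing.covered_in_time())
```

In this example the mechanism needs 250 h to detect and react, while the fault needs 800 h to become capable of generating an error or failure, so the fault is covered in a timely manner.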
4 Abbreviated terms
ADAS Advanced Driver Assistance System
ADS Automated Driving System
AI Artificial Intelligence
BEoL Back End of Line (sometimes BEOL)
BFR Base Failure Rate
BLM Barrier Layer Material
CHC Channel Hot Carrier
COTS Commercial Off The Shelf
DC Diagnostic Coverage
DFDTI Degrading Fault Detection Time Interval
DFHTI Degrading Fault Handling Time Interval
DFRTI Degrading Fault Reaction Time Interval
DRAM Dynamic Random Access Memory
EM Electromigration
ESD Electrostatic Discharge
FEoL Front End of Line (sometimes FEOL)
FET Field Effect Transistor
FDTI Fault Detection Time Interval
FHTI Fault Handling Time Interval
FTTI Fault Tolerant Time Interval
HCI Hot Carrier Injection
ILD Inter-Layer Dielectric
LFM Latent Fault Metric
ML Machine Learning
MoL Middle of Line (sometimes MOL)
MEoL Middle End of Line (sometimes MEOL)
MPFDTI Multiple Point Fault Detection Time Interval
NBTI Negative Bias Temperature Instability
NVM Non-Volatile Memory
PCM Phase Change Memory
PHM Prognostics and Health Management
RUL Remaining Useful Life
SBD Soft Breakdown
SHE Self-Heating Effect
SILC Stress-Induced Leakage Current
SM Stress Migration
SoC System on Chip
SPFM Single Point Fault Metric
TDDB Time Dependent Dielectric Breakdown
TDJD Time Dependent Junction Degradation
TID Total Ionizing Dose
5 Literature survey of degrading faults
5.1 General

This document reviews many technical documents to summarize the current state of the art

understanding of degrading faults in industry standards and technical publications.

NOTE Terminology in the referenced publications and standards is not always aligned to terms and

definitions of the ISO 26262 series. When referencing publications and standards, the terminology of the

referenced work is used.
5.2 Degrading faults in industry standards
5.2.1 JEDEC JEP122H [4]

The JEDEC Solid State Technology Association is a semiconductor industry trade association and

standardization body. JEDEC has over 300 companies as members and publishes electronics standards

on a wide variety of topics.

JEDEC JEP122H is the latest revision of JEDEC’s standard for “Failure Mechanisms and Models for

Semiconductor Devices,” last updated in 2016. The standard describes eighteen different failure

mechanisms, classifying them as being related to the die front end of line (FEoL), die back end of line

(BEoL), or packaging. Models are provided for estimating the rates of degradation per failure mode. The

information provided in JEP122H is validated by a team of reliability experts from the SEMATECH/ISMI

Reliability Council and supported by extensive references to technical publications.

The die FEoL failure mechanisms described by JEP122H include:
— time dependent dielectric breakdown (TDDB) due to gate oxide breakdown;
— hot carrier injection (HCI);
— negative bias temperature instability (NBTI);
— surface inversion due to mobile ions;
— floating gate non-volatile memory (NVM) data retention;
— localized charge trapping NVM data retention;
— phase change memory (PCM) NVM data retention.
The die BEoL failure mechanisms described by JEP122H include:
— TDDB due to ILD/low-k/mobile Cu ions;
— aluminium electromigration (EM);
— copper EM;
— aluminium and copper corrosion;
— aluminium stress migration (SM);
— copper SM.
The packaging failure mechanisms described in JEP122H include:
— fatigue failures due to temperature cycling and thermal shock;
— interfacial failures due to temperature cycling and thermal shock;
— intermetallic and oxidation failure due to high temperature;
— tin whiskers;
— ion mobility kinetics due to component cleanliness.
5.3 Degrading faults in technical publications
5.3.1 Advanced CMOS Reliability Update: Sub 20 nm FinFET Assessment [5]

Reference [5] was published by Sandia National Laboratories, a research organization of the United

States Department of Energy, in 2020. The purpose of the report is to document the most critical

failure modes impacting advanced semiconductor technologies using FinFET technology. FinFET

based semiconductors are used for most current generation SoCs (system on chip devices), dGPUs

(discrete graphics processing units), and DRAMs (dynamic random-access memories) which are used

in infotainment, ADAS (advanced driver assistance systems), and ADS (automated driving system)

applications. While the use of FinFET transistors enables smaller process geometries (e.g. <20 nm

feature size) and faster processing, it also changes the failure mode susceptibility characteristics

compared to more traditional planar transistor technologies found in 28 nm and larger process

technologies.
The report provides details for the following failure modes:
— die related failure modes:
— bias temperature instability (BTI);
— dielectric integrity;
— HCI;
— BEoL, EM and stress voiding;
— middle end of line (MEoL) concerns (also known as middle of line, or MoL);
— packaging and package-die interaction;
— integrated die design and process reliability – electrostatic discharge (ESD);
— radiation effects:
— total ionizing dose (TID);
— displacement damage;
— COTS electronics and radiation effects.

The die and packaging related failure modes discussed can generally be argued to manifest as random

degrading faults before becoming intermittent or permanent faults. The ESD and radiation effects can

generally be argued to be systematic or transient in nature.

Also of interest is the section on reliability degradation and its impact on circuit/system performance.

This section focuses on “soft” logic failures which manifest before “hard” physical failures of the

semiconductor devices. As most of the degradation mechanisms discussed result in parameter

degradation, it is suggested that statistical methods can be used to predict circuit failures.

5.3.2 Circuit-Based Reliability Consideration in FinFET Technology [6]

Reference [6] was authored in 2017 by four experts from Taiwan Semiconductor Manufacturing Company

(TSMC). It describes the primary reliability failure modes of concern for FinFET based process

technologies and presents a model for estimating reliability. Comparisons are made between the

performance of 28 nm planar technologies versus 16 nm and 7 nm FinFET technologies.

The authors highlight several new reliability concerns including bias temperature instability (BTI),

stress-induced leakage currents (SILCs), self-heating effects (SHEs), and time dependent junction

degradation (TDJD). Models are proposed to estimate the reliability impacts of these mechanisms,

based on a combination of simulation and reliability testing.
5.3.3 Intermittent Faults and Effects on Reliability of Integrated Circuits [7]

Reference [7] was authored in 2008 by a reliability expert from AMD and studies intermittent faults. An

experiment was conducted using more than 250 servers (dating from before 2008) to provide over 300 server

years of operational data. Identified memory single bit errors were analysed for root cause, and the

findings documented. The rates of occurrence of the errors introduced by these faults can vary from

one design to another and one technology to another.

Failure modes discussed in this paper include ultra-thin oxide breakdown, soft breakdown (SBD), EM

voids, barrier layer material (BLM) cracks, and crosstalk as sources of intermittent faults. It is noted

that these intermittent faults can be detected by monitoring the Vmin (voltage minimum) thresholds

necessary for correct operation. Mitigations are discussed in terms of systematic avoidance, screening

at manufacturing test, and online fault detection in application including failure prediction.
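The idea of detecting intermittent faults by monitoring Vmin can be pictured with a very small Python sketch. It is not the method of Reference [7]. It assumes, purely for illustration, that Vmin drifts upward as the device degrades, that Vmin is sampled in millivolts, and that a fixed margin over a baseline mean is an acceptable alarm criterion. The function name, the margin and all sample values are hypothetical.

```python
from statistics import mean


def vmin_drift_alarm(baseline_mv, recent_mv, margin_mv):
    """Return True when the mean of recent Vmin samples exceeds the
    baseline mean by more than margin_mv (assumed alarm criterion)."""
    return mean(recent_mv) - mean(baseline_mv) > margin_mv


# Hypothetical Vmin measurements in millivolts.
baseline = [700.0, 702.0, 698.0]   # recorded when the part was new
recent = [720.0, 725.0, 730.0]     # recorded during later operation

print(vmin_drift_alarm(baseline, recent, margin_mv=20.0))
```

A real monitor would also need to account for temperature, workload and measurement noise, which this sketch ignores.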

6 Literature survey on predictive maintenance
6.1 General

This document reviews many technical documents to summarize the current state of the art

understanding of predictive maintenance in industry standards and technical publications. Additional

application domain specific standards are in development (e.g. IEC 63270 [8]).

NOTE Terminology in the referenced publications and standards is not always aligned to the terms and

definitions of the ISO 26262 series. When referencing publications and standards, the terminology of the

referenced work is used.
6.2 Predictive maintenance in industry standards
6.2.1 IEC 61508 [9]

IEC 61508 is a basic safety publication for functional safety which was the original basis for the

ISO 26262 series. The 2010 edition of the standard includes guidance on the use of fault forecasting,

maintenance, and supervisory actions supported by artificial intelligence (AI) systems.

6.2.2 IEEE 1856 [3]

IEEE 1856 provides an industry-independent approach to the use of predictive maintenance and similar

techniques. This standard is intended to be applied at many different levels of design abstraction

and is not specific to semiconductor technologies. This standard applies the terms “prognostics” and

“Prognostics and Health Management (PHM)” interchangeably with predictive methods. The standard

is primarily focused on estimating the remaining useful life (RUL) after a fault is detected, rather than

the method by which the fault is detected.
IEEE 1856 provides a lifecycle model for PHM as illustrated in Figure 1:
— the product is initially deployed without faults;
— an off-nominal behaviour (fault) is detected;
— a failure occurs.
Figure 1 — IEEE 1856-2017 lifecycle model for prognostics

The IEEE 1856 model introduces three metrics which, used together, make it possible to compare the

effectiveness of different PHM approaches:

— the response time for the predictive algorithm, defined as the time between first fault detection and

first correct prediction of RUL;

— the prognostic distance, defined as the time between the correct prediction and the occurrence of a

failure;

— the prognostic system accuracy, defined as the difference between the predicted failure time and

the actual failure time.

NOTE Prognostic system accuracy can be positive (failure occurs before the predicted time) or negative

(failure occurs after the predicted time).
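The three metrics listed above reduce to simple differences between event times. The Python sketch below shows one possible reading. The sign of the accuracy (predicted failure time minus actual failure time) is inferred from the NOTE above rather than quoted from IEEE 1856, and the class name, field names, hour units and numeric values are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrognosticEvents:
    """Event times in hours since deployment (assumed unit)."""

    fault_detected: float            # first detection of off-nominal behaviour
    first_correct_prediction: float  # first correct RUL prediction
    predicted_failure: float         # failure time given by that prediction
    actual_failure: float            # time at which the failure occurred


def response_time(e: PrognosticEvents) -> float:
    # Time between first fault detection and first correct RUL prediction.
    return e.first_correct_prediction - e.fault_detected


def prognostic_distance(e: PrognosticEvents) -> float:
    # Time between the correct prediction and the occurrence of the failure.
    return e.actual_failure - e.first_correct_prediction


def prognostic_accuracy(e: PrognosticEvents) -> float:
    # Positive when the failure occurs before the predicted failure time.
    return e.predicted_failure - e.actual_failure


# Hypothetical event times chosen only to exercise the arithmetic.
events = PrognosticEvents(
    fault_detected=100.0,
    first_correct_prediction=130.0,
    predicted_failure=520.0,
    actual_failure=500.0,
)
print(response_time(events), prognostic_distance(events), prognostic_accuracy(events))
```

With these values the response time is 30 h, the prognostic distance is 370 h, and the accuracy is +20 h, meaning the failure occurred 20 h earlier than predicted.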

In IEEE 1856-2017, Annex A, the standard provides additional guidance. The content on levels of PHM

implementation closely matches the ISO 26262 series approach of performing analysis on multiple

levels of design hierarchy: device, component, assembly, sub-system, system, and system of systems.

6.3 Predictive maintenance in technical publications
6.3.1 A Survey of Online Failure Prediction Methods [10]

Reference [10] is a literature survey compiled by three researchers from Humboldt University in Berlin

in 2010. It is intended to provide a picture of the state of the art in online failure prediction methods as

of 2010. Some of the key information included in this document is:

— a lifecycle approach based on the progression of faults to errors to failures which is largely compatible

with the ISO 26262 series;

— a definition of nine metrics to evaluate predictive methods, with focus on precision and recall;

...
