SIST EN 62429:2008
Reliability growth - Stress testing for early failures in unique complex systems
This International Standard gives guidance for reliability growth during final testing or acceptance testing of unique complex systems. It gives guidance on accelerated test conditions and criteria for stopping these tests. "Unique" means that no information exists on similar systems, and the small number of produced systems means that information deduced from the test has limited use for future production. This standard concerns reliability growth of repairable complex systems consisting of hardware with embedded software. It can be used for describing the procedure for acceptance testing, "running-in", and to ensure that reliability of a delivered system is not compromised by coding errors, workmanship errors or manufacturing errors. It covers only the early failure period of the system life cycle, not the constant failure period or the wear-out failure period. It can also be used when a company wants to optimize the duration of internal production testing during manufacturing of prototypes, single systems or small series. It is applicable mainly to large hardware/software systems, but does not cover large networks, for example telecommunications and power networks, since new parts of such systems cannot usually be isolated during the testing. It does not cover software tested alone, but the methods can be used during testing of large embedded software programs in operational hardware, when simulated operating loads are used. It addresses growth testing before or at delivery of a finished system. The testing can therefore take place at the manufacturer's or at the end user's premises. If the user of a system performs reliability growth by a policy of updating hardware and software with improved versions, this standard can be used to guide the growth process. This standard covers a wide field of applications, but is not applicable to health or safety aspects of systems. This standard does not apply to systems that are covered by IEC 62279.
Zuverlässigkeitswachstum - Beanspruchungsprüfung auf Frühausfälle in einzelnen komplexen Systemen
Croissance de fiabilité - Essais de contraintes pour révéler les défaillances précoces d'un système complexe et unique
La présente Norme internationale donne des recommandations applicables à la croissance de fiabilité au cours des essais finaux ou des essais d'acceptation d'un système complexe et unique. Elle donne des indications relatives aux conditions d'essais accélérés et des critères pour l'arrêt de ces essais.
Rast zanesljivosti - Obremenjevalno preskušanje za odkrivanje zgodnjih odpovedi v edinstvenih kompleksnih sistemih (IEC 62429:2007)
Standards Content (Sample)
SLOVENIAN STANDARD SIST EN 62429:2008
1 June 2008
Rast zanesljivosti - Obremenjevalno preskušanje za odkrivanje zgodnjih odpovedi v edinstvenih kompleksnih sistemih (IEC 62429:2007)
Reliability growth - Stress testing for early failures in unique complex systems
Croissance de fiabilité - Essais de contraintes pour révéler les défaillances précoces d'un système complexe et unique
Zuverlässigkeitswachstum - Beanspruchungsprüfung auf Frühausfälle in einzelnen komplexen Systemen
This Slovenian standard is identical to: EN 62429:2008
ICS: 03.120.01 Quality in general; 21.020 Characteristics and design of machines, apparatus, equipment
Language: en
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
EUROPEAN STANDARD
NORME EUROPÉENNE
EUROPÄISCHE NORM
EN 62429
April 2008
CENELEC
European Committee for Electrotechnical Standardization
Comité Européen de Normalisation Electrotechnique
Europäisches Komitee für Elektrotechnische Normung
Central Secretariat: rue de Stassart 35, B - 1050 Brussels
© 2008 CENELEC - All rights of exploitation in any form and by any means reserved worldwide for CENELEC members.
Ref. No. EN 62429:2008 E
ICS 03.120.01; 03.120.99
English version
Reliability growth - Stress testing for early failures in unique complex systems (IEC 62429:2007)
Croissance de fiabilité - Essais de contraintes pour révéler les défaillances précoces d'un système complexe et unique (CEI 62429:2007)
Zuverlässigkeitswachstum - Beanspruchungsprüfung auf Frühausfälle in einzelnen komplexen Systemen (IEC 62429:2007)
This European Standard was approved by CENELEC on 2008-03-01. CENELEC members are bound to comply with the CEN/CENELEC Internal Regulations which stipulate the conditions for giving this European Standard the status of a national standard without any alteration.
Up-to-date lists and bibliographical references concerning such national standards may be obtained on application to the Central Secretariat or to any CENELEC member.
This European Standard exists in three official versions (English, French, German). A version in any other language made by translation under the responsibility of a CENELEC member into its own language and notified to the Central Secretariat has the same status as the official versions.
CENELEC members are the national electrotechnical committees of Austria, Belgium, Bulgaria, Cyprus, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and the United Kingdom.
Foreword
The text of document 56/1232/FDIS, future edition 1 of IEC 62429, prepared by IEC TC 56, Dependability, was submitted to the IEC-CENELEC parallel vote and was approved by CENELEC as EN 62429 on 2008-03-01.
The following dates were fixed:
– latest date by which the EN has to be implemented at national level by publication of an identical national standard or by endorsement (dop): 2008-12-01
– latest date by which the national standards conflicting with the EN have to be withdrawn (dow): 2011-03-01
Annex ZA has been added by CENELEC.
__________
Endorsement notice
The text of the International Standard IEC 62429:2007 was approved by CENELEC as a European Standard without any modification.
In the official version, for Bibliography, the following notes have to be added for the standards indicated:
IEC 60300-1 NOTE Harmonized as EN 60300-1:2003 (not modified).
IEC 60300-2 NOTE Harmonized as EN 60300-2:2004 (not modified).
IEC 60300-3-1 NOTE Harmonized as EN 60300-3-1:2004 (not modified).
IEC 60706-5 NOTE Harmonized as EN 60706-5:2007 (not modified).
IEC 60812 NOTE Harmonized as EN 60812:2006 (not modified).
IEC 61014 NOTE Harmonized as EN 61014:2003 (not modified).
IEC 61025 NOTE Harmonized as EN 61025:2007 (not modified).
IEC 61078 NOTE Harmonized as EN 61078:2006 (not modified).
IEC 61160 NOTE Harmonized as EN 61160:2005 (not modified).
ISO 9000 NOTE Harmonized as EN ISO 9000:2005 (not modified).
__________
Annex ZA (normative)
Normative references to international publications with their corresponding European publications
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
NOTE When an international publication has been modified by common modifications, indicated by (mod), the relevant EN/HD applies.
Publication | Year | Title | EN/HD | Year
IEC 60050-191 | 1990 | International Electrotechnical Vocabulary (IEV) - Chapter 191: Dependability and quality of service | - | -
IEC 60300-3-5 | - 1) | Dependability management - Part 3-5: Application guide - Reliability test conditions and statistical test principles | - | -
IEC 60605-2 | - 1) | Equipment reliability testing - Part 2: Design of test cycles | - | -
IEC 61163-1 | 2006 | Reliability stress screening - Part 1: Repairable assemblies manufactured in lots | EN 61163-1 | 2006
IEC 61163-2 | - 1) | Reliability stress screening - Part 2: Electronic components | - | -
IEC 61164 | - 1) | Reliability growth - Statistical test and estimation methods | EN 61164 | 2004 2)
IEC 61710 | - 1) | Power law model - Goodness-of-fit tests and estimation methods | - | -
1) Undated reference.
2) Valid edition at date of issue.
IEC 62429
Edition 1.0 2007-11
INTERNATIONAL STANDARD
NORME INTERNATIONALE
Reliability growth – Stress testing for early failures in unique complex systems
Croissance de fiabilité – Essais de contraintes pour révéler les défaillances précoces d'un système complexe et unique
INTERNATIONAL ELECTROTECHNICAL COMMISSION
COMMISSION ELECTROTECHNIQUE INTERNATIONALE
ICS 03.120.01; 03.120.99
PRICE CODE / CODE PRIX: W
ISBN 2-8318-9427-1
CONTENTS
FOREWORD
1 Scope
2 Normative references
3 Terms, definitions, abbreviations and symbols
3.1 Terms and definitions
3.2 Acronyms
3.3 Symbols
4 General
5 Planning and performing a reliability growth test
5.1 Step 1 – Should a reliability growth test be used?
5.2 Step 2 – Failure definitions and data collection
5.3 Step 3 – Stress levels
5.3.1 General
5.3.2 Increased operating load
5.3.3 Increased environmental stress
5.4 Step 4 – Failure analysis and classification of failures
5.4.1 General
5.4.2 Relevant failures
5.4.3 Non-relevant failures
5.5 Step 5 – Stop criteria
5.5.1 General
5.5.2 Method 1 – Fixed testing programs
5.5.3 Method 2 – Graphical analysis
5.5.4 Method 3 – Success ratio test
5.5.5 Method 4 – Estimation of reliability
5.5.6 Method 5 – Comparison with acceptable instantaneous failure intensity
5.5.7 Method 6 – Estimation of remaining latent faults
5.5.8 Method 7 – Reliability indicator testing
5.6 Step 6 – Verification of repairs and reliability growth
5.7 Step 7 – Reporting and feedback
Annex A (informative) Practical example of method 3 – Success ratio test
Annex B (informative) Practical example of method 5 – Comparison with acceptable instantaneous failure intensity
Annex C (informative) Practical example of method 6 – Estimation of remaining latent faults
Bibliography
Figure 1 – The bathtub curve
Figure 2 – Evaluating whether the cumulative failure curve has levelled out
Figure 3 – Method 2
Figure B.1 – A reliability growth plot of the data from Table B.1
Table 1 – Probability that a system with failure probability of 0,001 will pass N successive tests
Table 2 – Probability that a system with failure probability of 0,000 001 will pass N successive tests
Table 3 – Correct and incorrect decisions using reliability indicators
Table B.1 – Reliability growth and stopping times for the practical example
Table C.1 – Determining when to stop the test
INTERNATIONAL ELECTROTECHNICAL COMMISSION
____________
RELIABILITY GROWTH – STRESS TESTING FOR EARLY FAILURES IN UNIQUE COMPLEX SYSTEMS
FOREWORD
1) The International Electrotechnical Commission (IEC) is a worldwide organization for standardization comprising all national electrotechnical committees (IEC National Committees). The object of IEC is to promote international co-operation on all questions concerning standardization in the electrical and electronic fields. To this end and in addition to other activities, IEC publishes International Standards, Technical Specifications, Technical Reports, Publicly Available Specifications (PAS) and Guides (hereafter referred to as "IEC Publication(s)"). Their preparation is entrusted to technical committees; any IEC National Committee interested in the subject dealt with may participate in this preparatory work. International, governmental and non-governmental organizations liaising with the IEC also participate in this preparation. IEC collaborates closely with the International Organization for Standardization (ISO) in accordance with conditions determined by agreement between the two organizations.
2) The formal decisions or agreements of IEC on technical matters express, as nearly as possible, an international consensus of opinion on the relevant subjects since each technical committee has representation from all interested IEC National Committees.
3) IEC Publications have the form of recommendations for international use and are accepted by IEC National Committees in that sense. While all reasonable efforts are made to ensure that the technical content of IEC Publications is accurate, IEC cannot be held responsible for the way in which they are used or for any misinterpretation by any end user.
4) In order to promote international uniformity, IEC National Committees undertake to apply IEC Publications transparently to the maximum extent possible in their national and regional publications. Any divergence between any IEC Publication and the corresponding national or regional publication shall be clearly indicated in the latter.
5) IEC provides no marking procedure to indicate its approval and cannot be rendered responsible for any equipment declared to be in conformity with an IEC Publication.
6) All users should ensure that they have the latest edition of this publication.
7) No liability shall attach to IEC or its directors, employees, servants or agents including individual experts and members of its technical committees and IEC National Committees for any personal injury, property damage or other damage of any nature whatsoever, whether direct or indirect, or for costs (including legal fees) and expenses arising out of the publication, use of, or reliance upon, this IEC Publication or any other IEC Publications.
8) Attention is drawn to the Normative references cited in this publication. Use of the referenced publications is indispensable for the correct application of this publication.
9) Attention is drawn to the possibility that some of the elements of this IEC Publication may be the subject of patent rights. IEC shall not be held responsible for identifying any or all such patent rights.
International Standard IEC 62429 has been prepared by IEC technical committee 56: Dependability.
The text of this standard is based on the following documents:
FDIS | Report on voting
56/1232/FDIS | 56/1249/RVD
Full information on the voting for the approval of this standard can be found in the report on voting indicated in the above table.
This publication has been drafted in accordance with the ISO/IEC Directives, Part 2.
The committee has decided that the contents of this publication will remain unchanged until the maintenance result date indicated on the IEC web site under "http://webstore.iec.ch" in the data related to the specific publication. At this date, the publication will be
• reconfirmed,
• withdrawn,
• replaced by a revised edition, or
• amended.
RELIABILITY GROWTH – STRESS TESTING FOR EARLY FAILURES IN UNIQUE COMPLEX SYSTEMS
1 Scope
This International Standard gives guidance for reliability growth during final testing or acceptance testing of unique complex systems. It gives guidance on accelerated test conditions and criteria for stopping these tests. "Unique" means that no information exists on similar systems, and the small number of produced systems means that information deduced from the test has limited use for future production.
This standard concerns reliability growth of repairable complex systems consisting of hardware with embedded software. It can be used for describing the procedure for acceptance testing, "running-in", and to ensure that reliability of a delivered system is not compromised by coding errors, workmanship errors or manufacturing errors. It covers only the early failure period of the system life cycle, not the constant failure period or the wear-out failure period. It can also be used when a company wants to optimize the duration of internal production testing during manufacturing of prototypes, single systems or small series.
It is applicable mainly to large hardware/software systems, but does not cover large networks, for example telecommunications and power networks, since new parts of such systems cannot usually be isolated during the testing. It does not cover software tested alone, but the methods can be used during testing of large embedded software programs in operational hardware, when simulated operating loads are used.
It addresses growth testing before or at delivery of a finished system. The testing can therefore take place at the manufacturer's or at the end user's premises.
If the user of a system performs reliability growth by a policy of updating hardware and software with improved versions, this standard can be used to guide the growth process.
This standard covers a wide field of applications, but is not applicable to health or safety aspects of systems. This standard does not apply to systems that are covered by IEC 62279 [39].
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
IEC 60050-191:1990, International Electrotechnical Vocabulary – Chapter 191: Dependability and quality of service
IEC 60300-3-5, Dependability management – Part 3-5: Application guide – Reliability test conditions and statistical test principles
IEC 60605-2, Equipment reliability testing – Part 2: Design of test cycles
IEC 61163-1:2006, Reliability stress screening – Part 1: Repairable assemblies manufactured in lots
IEC 61163-2, Reliability stress screening – Part 2: Electronic components
IEC 61164, Reliability growth – Statistical test and estimation methods
IEC 61710, Power law model – Goodness-of-fit tests and estimation methods
3 Terms, definitions, abbreviations and symbols
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in IEC 60050-191, as well as the following, apply.
3.1.1 time compression
reducing test time by testing with higher use time than in the field
NOTE An example is testing for 24 h a day a system that is used for 8 h a day in the field.
3.1.2 accelerated test
test in which the applied stress level is chosen to exceed that stated in the reference conditions in order to shorten the time duration required to observe the stress response of the item, or to magnify the response in a given time duration
NOTE To be valid, an accelerated test should not alter the basic fault modes and failure mechanisms, or their relative prevalence.
[IEV 191-14-07]
3.1.3 (time) acceleration factor
ratio between the time durations necessary to obtain the same stated number of failures or degradations in two equal size samples, under two different sets of stress conditions involving the same failure mechanisms and fault modes and their relative prevalence
NOTE One of the two sets of stress conditions should be a reference set.
[IEV 191-14-10]
3.1.4 execution time
time to perform a stated number of transactions
3.1.5 fault
state of an item characterized by inability to perform a required function, excluding the inability during preventive maintenance or other planned actions, or due to lack of external resources
NOTE 1 A fault is often the result of a failure of the item itself, but may exist without prior failure.
[IEV 191-05-01]
NOTE 2 In English, the term "fault" is also used in the field of electric power systems with the meaning as given in IEV 604-02-01 [42] 1); then, the corresponding term in French is "défaut".
NOTE 3 In this standard, the term "latent fault" is used to emphasize that the fault has not yet caused a failure.
NOTE 4 Software alone is deterministic. But this standard considers software embedded in hardware where the software can have latent faults relating to the hardware and the environment, e.g. insufficient protection against double keying, no checksum in communication, or no sanity check of input data or output data.
3.1.6 bug
popular name for a software latent fault
3.1.7 reliability indicator
non-functional parameter that points to a probable failure in a short time
3.1.8 success ratio test
test repeated a number of times, all of which have to be passed without failure
3.1.9 system
set of interrelated or interacting elements
[ISO 9000:2005, 3.2.1] [41]
NOTE 1 In the context of dependability, a system will have
– a defined purpose expressed in terms of intended functions,
– stated conditions of operation/use, and
– defined boundaries.
NOTE 2 The structure of a system may be hierarchical [IEC 60300-1, 3.6] [43].
NOTE 3 For some systems, such as information technology products, data is an important part of the system elements. [Future IEC 60300-3-15, modified] [44].
3.1.10 transaction
set of input parameters and preconditions selected from operating loads for the system
3.1.11 root cause analysis
activity to identify the cause of a fault or failure, so it can be removed by design or process changes
3.1.12 error
discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition
NOTE 1 An error can be caused by a faulty item, e.g. a computing error made by faulty computer equipment.
NOTE 2 The French term "erreur" may also designate a mistake (see IEV 191-05-25).
[IEV 191-05-24]
__________
1) References in square brackets refer to the bibliography.
3.1.13 mistake; human error
human action that produces an unintended result
[IEV 191-05-25]
3.1.14 failure
termination of the ability of an item to perform a required function
NOTE 1 After failure the item has a fault.
NOTE 2 "Failure" is an event, as distinguished from "fault", which is a state.
NOTE 3 This concept as defined does not apply to items consisting of software only.
[IEV 191-04-01]
NOTE 4 Software alone is deterministic. But this standard considers software embedded in hardware where the software can have latent faults relating to the hardware and the environment, e.g. insufficient protection against double keying, no checksum in communication, or no validity check of input data or output data.
3.1.15 failure intensity; instantaneous failure intensity
z(t)
limit, if this exists, of the ratio of the mean number of failures of a repaired item in a time interval (t, t + Δt), and the length of this interval, Δt, when the length of the time interval tends to zero
NOTE 1 The instantaneous failure intensity is expressed by the formula
$$z(t) = \lim_{\Delta t \to 0^{+}} \frac{E\left[N(t + \Delta t) - N(t)\right]}{\Delta t}$$
[IEV 191-12-04]
NOTE 2 To avoid confusion this standard will use "instantaneous failure intensity" since a system is repaired when it fails, and a latent fault is repaired (removed) when precipitated as a failure.
3.2 Abbreviations
CPU: Central processor unit
EMC: Electromagnetic compatibility
ESD: Electrostatic discharge
FMEA: Failure mode and effect analysis
MTBF: Mean operating time between failures
RAM: Random access memory
3.3 Symbols
C: total number of transactions
D(t): the number of faults detected by time t
F_u: unacceptable number of failed transactions out of C transactions
i: fault number
M: probability that a system with an unacceptable reliability passes N tests without a failure
m: number of latent faults in the system
N: number of transactions to be performed without failure
p: unacceptable probability of failure per transaction
RCM r(T_t): risk criterion metric for remaining latent faults at total test time T_t
r_c: the estimated number of remaining latent faults in the system
r(T_t): remaining (undetected) latent faults predicted at accumulated test time T_t
s: number of test time intervals used in the Schneidewind model to estimate the model parameters
t: actual test time
t_status: test time at status
T_D(t): the accumulated test time by which D(t) faults were detected
T_i: the accumulated test time when fault i was detected
T_min: the minimum test time that shall be accumulated by the system for 0 failures
T_t: accumulated test time measured in time units of the Schneidewind model
z: the acceptable instantaneous failure intensity
z_i: the instantaneous failure intensity of fault i
θ_i: cumulative mean operating time between failures (MTBF) when fault i was detected
NOTE The term "cumulative MTBF" is used to be in line with other reliability growth models described in the literature. It is instructive in displaying a growth in reliability due to defect root cause elimination. The cumulative MTBF (θ_i) for each fault i is determined as θ_i = T_i / i.
α: empirical constant in the Schneidewind model – failure intensity at test time = 0
β: empirical constant in the Schneidewind model – proportionality constant for failure intensity over time – Unit: (time)^-1
δ: the probability of no failure occurring by T_min for a given acceptable instantaneous failure intensity
4 General
This standard is one of a series of standards under the application guide IEC 61014 [34]. This standard applies to large hardware-software systems when tested using a simulated operating load. Therefore, it is not known during the test if a failure is caused by hardware, software, operating load, or a combination of these. A failure may be caused by a hardware failure, e.g. a random access memory (RAM) failure, a change of timing causing data collision, or an electromagnetic disturbance that changes the transmitted data. The failure may also be caused by a software latent fault or by illegal data. How the failed item is repaired or the software is changed is, for this standard, only relevant to the extent that it influences the test decisions, e.g. through the assumptions of the statistical model.
Nearly all modern systems contain embedded software. The software is typically tested on development hardware using transactions derived from the system specifications. Often the software is finished late so that the time for testing the software in the actual hardware is limited. It is usually not acceptable that the customer is the first to operate the software in the real hardware. Therefore, there is a need for a standard to guide testing and reliability growth of hardware with the embedded software.
With hardware, it is assumed that early failures are caused by a latent fault in the hardware. Depending on the stress type and stress level, these latent faults can be precipitated into permanent or intermittent failures after some time. An example could be a crack in a component. Under dry operating conditions without vibration or shocks, the latent fault may remain a latent fault. But under moist operating conditions, moisture and contaminants may penetrate the crack and cause corrosion, ending in a permanent fault. Similarly, vibration or shock can cause crack propagation that may cause a permanent fault after some time.
Software alone is deterministic. This means that a latent fault in the software (commonly called a software bug) will not result in a failure until the part of the code containing the latent fault is activated. The moment when this occurs depends on the operating conditions (e.g. input parameters and the internal states of the program, e.g. memory content). Therefore, there is a similarity between hardware latent faults and software latent faults. The software latent fault, once activated, may cause a permanent fault but will often only cause an intermittent failure. Logical failures are systematic (i.e. they can be reproduced at will once the trigger for the associated fault is known). Since the trigger for any latent fault is encountered at random in the operating environment of the system, logical failures are observed as a stochastic process. Therefore, the usual measures of reliability can be applied (probability of time to next failure, failure intensity, etc.). Reliability growth will normally occur as latent faults are removed. In this standard the term "latent fault" will therefore be used to cover weaknesses in hardware as well as bugs in software [10].
A failure caused by a combination of hardware and software could be, for example, that a hardware latent fault causes insufficient cooling of a component. The temperature rise changes the time delays in the circuit, causing data collision that results in a software failure. Another combination could be that a hardware design error causes insufficient shielding of signal wires. The increased level of electromagnetic noise corrupts the data in the signal wires causing a software failure, given that the software does not have an error correction feature, and the operating environment has a high electromagnetic noise level.
This standard covers repairable systems that are produced in a very small number of copies, so that experience from tests of previous similar systems is limited or non-existent. It can be used when a manufacturer wants to optimize the duration of internal acceptance testing and running-in. It addresses growth testing before or at delivery of a finished system. The testing can therefore take place at the manufacturer's or at t
...