ASTM E3159-21
Standard Guide for General Reliability
ABSTRACT
This guide covers fundamental concepts, applications, and mathematical relationships associated with reliability as used in industrial areas and as applied to simple components, processes, and systems or complex final products. This guide summarizes selected concepts, terminology, formulas, and methods associated with reliability and its application to products and processes.
SIGNIFICANCE AND USE
4.1 The theory of reliability is used for estimating and demonstrating the probability of survival at specific times or for specific usage cycles for simple components, devices, assemblies, processes, and systems. As reliability is one key dimension of quality, it may be more generally used as a measure of quality over time or over a usage or demand sequence.
4.1.1 Many industries require performance metrics and requirements that are reliability-centered. Reliability assessments may be needed for the determination of maintenance requirements, for spare parts allocation, for life cycle cost analysis and for warranty purposes. This guide summarizes selected concepts, terminology, formulas, and methods associated with reliability and its application to products and processes. Many mathematical relationships and methods are found in the annexes. For general statistical terms not found in Section 3, Terminology E456 and ISO 3534-1 can be used for definitional purposes and ISO Guide 73 for general terminology regarding risk analysis.
4.2 The term “system” implies a configuration of interacting components, sub-assemblies, materials, and possibly processes all acting together to make the system work as a whole. Parts of the system may be linked in combinations of series and parallel configuration and redundancy used in some parts to improve reliability. Additional conditions of complex engineering may have to be considered.
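As an illustration of how series and parallel links combine, the following is a minimal sketch that assumes components fail independently and that each has a known reliability at the time of interest. The function names and the example values are hypothetical and are not taken from the guide.

```python
from math import prod

def series_reliability(reliabilities):
    """Series link: every component must work, so reliabilities multiply."""
    return prod(reliabilities)

def parallel_reliability(reliabilities):
    """Parallel (redundant) link: the block fails only if all components fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical system: two redundant pumps (0.90 each) feeding one controller (0.99).
pump_pair = parallel_reliability([0.90, 0.90])
system = series_reliability([pump_pair, 0.99])
print(f"pump pair: {pump_pair:.4f}, system: {system:.4f}")
```

Under these assumptions the redundant pair is more reliable than either pump alone, while the series link with the controller pulls the system value below that of its weakest block.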
4.3 Process reliability concerns the assessment of any type of well-defined process. This can include manufacturing processes, business processes, and dispatch/demand type processes. Assessment typically measures the extent to which the process can continually perform its intended function without “upset” as well as process robustness.
4.4 A number of reliability metrics are in use. For example, mean time to failure (MTTF) is a common measure of average life or average time to the first time a unit fails. For this reason it is said to apply t...
SCOPE
1.1 This guide covers fundamental concepts, applications, and mathematical relationships associated with reliability as used in industrial areas and as applied to simple components, processes, and systems or complex final products.
1.2 The system of units for this guide is not specified. Quantities in the guide are presented only as illustrations of the method or of a calculation. Any examples used are not binding on any particular product or industry.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
General Information
- Status: Published
- Publication Date: 30-Apr-2021
- Technical Committee: E11 - Quality and Statistics
- Drafting Committee: E11.40 - Reliability
Overview
ASTM E3159-21: Standard Guide for General Reliability provides essential guidance on the core concepts, terminology, methods, and mathematical models for assessing and applying reliability in a wide variety of contexts. Developed by ASTM, this international standard supports industries in understanding and managing the reliability of components, processes, systems, and final products, whether simple or highly complex. The guide is non-prescriptive and applicable across sectors where reliability and maintenance are critical to performance, safety, and cost-effectiveness.
Key Topics
Fundamental Reliability Concepts
The guide introduces basic reliability theory, illustrating how reliability is one of the key elements of quality over time or usage cycles. It describes the importance of probabilities associated with survival and failure for components, devices, assemblies, processes, and entire systems.
Terminology and Definitions
Common reliability terms are standardized, such as:
- Reliability: The probability a system or component performs its intended function for a specified period under stated conditions.
- Mean Time To Failure (MTTF): Average operational time before failure, chiefly for non-repairable items.
- Mean Time Between Failures (MTBF): Average operational time between repairs for repairable systems.
- Failure Mode: Ways a component, process, or system can fail.
- Hazard Rate: The instantaneous failure rate at a specific time.
- B life percentiles (e.g., B10, B50): The life by which a defined percentage of units are expected to fail.
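To make the hazard rate and B life terms above concrete, here is a minimal sketch. It uses the relationship h(t) = f(t)/(1 - F(t)) given in the standard's terminology section and, as an assumption for illustration only, an exponential life model with a hypothetical MTTF of 1000 hours. The function names are not from the guide.

```python
import math

def hazard(pdf, cdf, t):
    """Hazard rate h(t) = f(t) / (1 - F(t)) for any life distribution."""
    return pdf(t) / (1.0 - cdf(t))

def exp_pdf(t, theta):
    """Exponential pdf with mean life (MTTF) theta."""
    return math.exp(-t / theta) / theta

def exp_cdf(t, theta):
    """Exponential cdf: probability of failure at or before t."""
    return 1.0 - math.exp(-t / theta)

def exp_b_life(p, theta):
    """B_p life for the exponential model: the t at which F(t) = p/100."""
    return -theta * math.log(1.0 - p / 100.0)

theta = 1000.0  # hypothetical MTTF in hours
h_250 = hazard(lambda t: exp_pdf(t, theta), lambda t: exp_cdf(t, theta), 250.0)
b10 = exp_b_life(10, theta)
b50 = exp_b_life(50, theta)
print(f"h(250) = {h_250:.6f} per hour, B10 = {b10:.1f} h, B50 = {b50:.1f} h")
```

For the exponential model the hazard rate is constant at 1/theta regardless of age, which corresponds to the "random" failure class; other distributions passed to `hazard` would give a rate that rises or falls with time.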
System Reliability Structures
The standard addresses reliability in systems made of interacting parts, discussing series, parallel, and redundant configurations that impact overall reliability performance.
Failure Modes and Metrics
Describes infant mortality, random, and wear-out failures, as well as metrics such as reliability functions, failure rates, service life, duty cycles, and bench testing strategies.
Reliability Assessment Methods
Covers statistical tools and test planning approaches including:
- Binomial and exponential models for reliability estimation
- Demonstration testing and confidence intervals
- Planning accelerated and life-limited tests
- Probability plotting and data analysis
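One common binomial approach to demonstration testing is the zero-failure (success-run) plan, sketched below. It is offered as an illustration and is not claimed to be the guide's own procedure. It assumes independent pass/fail trials: if n units are tested with no failures, a reliability R is demonstrated at confidence C when R^n <= 1 - C. The function names are hypothetical.

```python
import math

def zero_failure_sample_size(reliability, confidence):
    """Smallest n such that n failure-free trials demonstrate the reliability
    at the given confidence, from reliability**n <= 1 - confidence."""
    n = math.log(1.0 - confidence) / math.log(reliability)
    return math.ceil(n)

def demonstrated_reliability(n, confidence):
    """Lower confidence bound on reliability after n trials with zero failures."""
    return (1.0 - confidence) ** (1.0 / n)

n_90_90 = zero_failure_sample_size(0.90, 0.90)
n_95_95 = zero_failure_sample_size(0.95, 0.95)
print(f"90 % reliability at 90 % confidence: {n_90_90} units, zero failures")
print(f"95 % reliability at 95 % confidence: {n_95_95} units, zero failures")
print(f"bound after {n_90_90} successes: {demonstrated_reliability(n_90_90, 0.90):.4f}")
```

The sample size grows quickly as the required reliability approaches 1, which is one reason accelerated or parametric (for example, exponential) test plans are often considered instead.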
Applications
ASTM E3159-21 can be applied across a range of industries and functions where reliability is a performance or regulatory requirement, including:
Product Development and Design
Inform reliability allocation, component selection, and risk mitigation using methods such as Failure Mode and Effects Analysis (FMEA).
Manufacturing and Quality Management
Integrate reliability analysis for maintenance scheduling, spare parts planning, and warranty management.
Process Control and Operations
Assess and ensure robust operation in business, manufacturing, and field service processes by defining and measuring process reliability.
Compliance and Risk Management
Use standardized terminology and methods in meeting client or regulatory reliability requirements and in performing life cycle assessments.
Reliability Testing and Data Analysis
Employ recommended models and test planning procedures to demonstrate compliance with reliability specifications under controlled conditions.
Related Standards
- ASTM E456: Terminology Relating to Quality and Statistics
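- ASTM E2334: Practice for Setting an Upper Confidence Bound for a Fraction or Number of Non-Conforming items, or a Rate of Occurrence for Non-Conformities, Using Attribute Data, When There is a Zero Response in the Sample
- ASTM E2555: Practice for Factors and Procedures for Applying the MIL-STD-105 Plans in Life and Reliability Inspection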
- ASTM E2696: Practice for Life and Reliability Testing Based on the Exponential Distribution
- ISO 3534-1: Statistics - Vocabulary and Symbols, Probability and General Statistical Terms
- ISO Guide 73: Risk Management Vocabulary
Practical Value
Adopting ASTM E3159-21 enables organizations to:
- Standardize reliability terminology and assessment methods
- Improve product and process durability and customer satisfaction
- Support maintenance optimization and cost control
- Enhance regulatory compliance and decision-making based on quantifiable reliability data
Utilizing this guide gives businesses a common language and framework for managing, reporting, and improving reliability across the entire product or process life cycle.
Frequently Asked Questions
ASTM E3159-21 is a guide published by ASTM International. Its full title is "Standard Guide for General Reliability".
ASTM E3159-21 is classified under the following ICS (International Classification for Standards) categories: 21.020 - Characteristics and design of machines, apparatus, equipment. The ICS classification helps identify the subject area and facilitates finding related standards.
ASTM E3159-21 has the following relationships with other standards: it has inter-standard links to ASTM E456-13a(2022)e1, ASTM E2555-07(2018), ASTM E456-13A(2017)e1, ASTM E456-13A(2017)e3, ASTM E456-13a, ASTM E456-13ae2, ASTM E456-13ae1, ASTM E456-13ae3, ASTM E456-13, ASTM E2334-09(2013), ASTM E2696-09(2013), ASTM E2334-09(2013)e2, ASTM E2334-09(2013)e1, ASTM E456-12, ASTM E456-12e1. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
Designation: E3159 − 21
An American National Standard
Standard Guide for General Reliability
This standard is issued under the fixed designation E3159; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This guide covers fundamental concepts, applications, and mathematical relationships associated with reliability as used in industrial areas and as applied to simple components, processes, and systems or complex final products.
1.2 The system of units for this guide is not specified. Quantities in the guide are presented only as illustrations of the method or of a calculation. Any examples used are not binding on any particular product or industry.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E456 Terminology Relating to Quality and Statistics
E2334 Practice for Setting an Upper Confidence Bound for a Fraction or Number of Non-Conforming items, or a Rate of Occurrence for Non-Conformities, Using Attribute Data, When There is a Zero Response in the Sample
E2555 Practice for Factors and Procedures for Applying the MIL-STD-105 Plans in Life and Reliability Inspection
E2696 Practice for Life and Reliability Testing Based on the Exponential Distribution
2.2 ISO Standards:
ISO 3534-1 Statistics–Vocabulary and Symbols, Part 1: Probability and General Statistical Terms
ISO Guide 73 Risk Management Vocabulary
This guide is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.40 on Reliability. Current edition approved May 1, 2021. Published July 2021. Originally approved in 2018. Last previous edition approved in 2018 as E3159 – 18. DOI: 10.1520/E3159-21.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions:
3.1.1 Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456. Other general statistical terms and terms related to risk are defined in ISO 3534-1 and ISO Guide 73.
3.1.2 B_p life, n—for continuous variables, the life at which there is a probability, p, (expressed as a percentage) of failure at or less than this value.
3.1.2.1 Discussion—Example: The B_10 life is a value of life, t, such that cumulative distribution function, F(t) = 0.1 or 10 %.
3.1.3 failure mode, n—the way in which a device, process or system has failed.
3.1.3.1 Discussion—Under some set of conditions, any device, process or system may be vulnerable to several failure modes. For example, a tire may fail in the course of time due to a puncture by a sharp object, from the tire simply wearing out, or from a tire manufacturing anomaly. Each of these describe different failure modes. These three failure modes are said to be competing with respect to the failure event.
3.1.4 hazard rate, n—differential fraction of items failing at time t among those surviving up to time t, symbolized by h(t). E2555
3.1.4.1 Discussion—h(t) is also referred to as the instantaneous failure rate at time t and called a hazard function. It is related to the probability density (pdf) and cumulative distribution function (cdf) by h(t) = f(t)/(1 - F(t)), where f(t) is the pdf and F(t) the cdf.
3.1.5 mean time between failures (MTBF), n—the average time to failure for a repairable item.
3.1.5.1 Discussion—A repairable system is one that can be repaired and returned to service following a failure. When an item is repaired, it may not necessarily be returned to service in as good as new condition. There may be a reduction in life in a repaired item making the item not as robust as a new item.
Any failure-repair sequence may continue for several cycles, further reducing longevity of service following each repair time. Often the more times the item is repaired, the smaller will be the expected remaining life until the next repair. However, some repairable systems (for example, electronic) may just have some components replaced from time to time rendering the unit as good as new. In those cases, MTBF is the same thing as MTTF.
3.1.6 mean time to failure (MTTF), θ, n—in life testing, the average length of life of items in a lot. E2696
3.1.7 reliability, n—the probability that a component, device, product, process or system will function or fulfill a function after a specified duration of time or usage under specified conditions.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 non-repairable system, n—a system that is intended for a single use and discarded/replaced following its first failure.
3.2.2 repairable system, n—a system that is intended to be used through multiple failure-repair cycles.
4. Significance and Use
4.1 The theory of reliability is used for estimating and demonstrating the probability of survival at specific times or for specific usage cycles for simple components, devices, assemblies, processes, and systems. As reliability is one key dimension of quality, it may be more generally used as a measure of quality over time or over a usage or demand sequence.
4.1.1 Many industries require performance metrics and requirements that are reliability-centered. Reliability assessments may be needed for the determination of maintenance requirements, for spare parts allocation, for life cycle cost analysis and for warranty purposes. This guide summarizes selected concepts, terminology, formulas, and methods associated with reliability and its application to products and processes. Many mathematical relationships and methods are found in the annexes. For general statistical terms not found in Section 3, Terminology E456 and ISO 3534-1 can be used for definitional purposes and ISO Guide 73 for general terminology regarding risk analysis.
4.2 The term “system” implies a configuration of interacting components, sub-assemblies, materials, and possibly processes all acting together to make the system work as a whole. Parts of the system may be linked in combinations of series and parallel configuration and redundancy used in some parts to improve reliability. Additional conditions of complex engineering may have to be considered.
4.3 Process reliability concerns the assessment of any type of well-defined process. This can include manufacturing processes, business processes, and dispatch/demand type processes. Assessment typically measures the extent to which the process can continually perform its intended function without “upset” as well as process robustness.
4.4 A number of reliability metrics are in use. For example, mean time to failure (MTTF) is a common measure of average life or average time to the first time a unit fails. For this reason it is said to apply to non-repairable systems. Other life percentiles (or quantiles) are in use such as for example a B_p life or that life at which there is p % expected failure. Thus, the B_50 or median life is the life at which 50 % of items would be expected to fail as well as survive; the B_0.1 life is the life at which a 1 in 1000 failure probability (0.1 % failure) and a 99.9 % reliability would be expected.
4.4.1 Failure rate and average failure rate are also common metrics in reliability. With failure rates, it is important to understand that a rate may be changing with time and this may be increasing, decreasing or some combination of these over the life of a product or service. The failure rate may also be constant.
4.5 Bench testing of a device is used to obtain early reliability assessment or to demonstrate a specific reliability requirement or a related metric. There are a number of key methodologies that are used for this purpose. Demonstration testing may be dependent on the assumption of a distribution of failure time or may be carried out using nonparametric methods.
4.6 When a system is repaired following failure and placed back into service, we refer to the object as a repairable system. A key metric for this is the mean time between failures (MTBF); and this is not to be confused with MTTF. When a system is repaired, it may not be the case that its expected remaining life is as good as a new one. There may be a reduction in expected life following a repair and this may continue with continuing repair cycles. The MTBF metric applies to all such sequences of repair and restoration cycles over a service life period. This includes the first time to failure, the 2nd time, the 3rd time, etc.
5. Life Concepts
5.1 Before reliability can be assessed, the measure of life must be selected. Table 1 shows a sample of units that are commonly used as a measure of life.
TABLE 1 Common Measures of Product Life
Life Unit          Example
operating time     hrs., minutes, days
cycles of usage    flights, dispatches
calendar time      days since new; shelf life
demand cycles      unit is demanded occasionally
5.1.1 Variations of these units can be found as for example the difference between an aircraft total engine operating time (EOT) and its time/hours in flight or engine flight hours (EFH). Cycles are dependent on ordinary time in that any cycle may last for any length of time. In another case, continued life may be driven more by calendar time.
5.1.2 A dispatch of a product or service can be used to compute the product’s dispatch reliability (for example, relative frequency of failure free dispatches without a change to a schedule). Demand cycle is different than ordinary cycles in that a product may only be demanded infrequently but must be in serviceable condition when called on (for example, fire extinguishers, ambulance vehicles). Calendar time may be applicable in situations where a product is exposed to field or
environmental conditions during most of its life and would be subject to chemical, thermal or other actions causing performance degradation over time. Shelf life is applied to many chemical and biological products and is a prime example of the more general “service life” concept.
5.1.3 A product’s service life is a duration of life (in the appropriate life units) over which the manufacturer believes the product is serviceable or useful. The term useful life is also used synonymously. In some industries, the concept of “mission” is used interchangeably. A duty cycle is often used to describe the fraction of the time and under what conditions that a product is called on for its intended use relative to some arbitrary time period. For example jet engines operate at a lower level of stress during the cruise portion of any flight, whereas during the takeoff and landing portion of the flight, the duty requirements are greater. In a duty cycle profile, a product may be exposed to a distribution of stress during a typical usage cycle.
5.1.4 In many types of products, components or subsystems, the unit may be subject to life limiting. The unit must be replaced with a new one immediately upon reaching the life limit, if not failed. Such units have increasing failure rates with age and the life limit is judiciously selected at a point prior to reaching the unacceptable failure rate. Life limiting is different than service life in that the former applies to non-repairable items (for example, one use only, then dispose and replace). A concept related to life limiting is a replacement or preventative maintenance interval. Replacement intervals are commonly found in electro-mechanical applications such as in machine hardware, automotive or aerospace applications.
5.2 Maintenance Schedules or Intervals—The continued useful life of many types of products is dependent on appropriate maintenance. Such maintenance is often specified contractually or as part of a warranty stipulation. Inappropriate product use or operation of the product outside of an intended usage range may retard or negate the desirable effect of a maintenance interval.
5.3 Failure Modes and Failure Rate—In using reliability calculations consideration should be given to the type of failure mode that is expected for the specific product and its intended application. Three broad classes of failure modes are in common use. Table 2 describes these. See (1) and (2) for further detail around the failure mode concept.
The boldface numbers in parentheses refer to the list of references at the end of this standard.
TABLE 2 General Failure Mode Classes
Class               Description
Infant Mortality    Early failures due to special causes that typically only apply to some units in a population.
5.3.1 The term “infant mortality,” borrowed from the biological science, is now common in engineering. Each of these three classes may contain numerous more specific failure modes, depending on the type of product considered. Associated with each of the three broad classes of failure mode are the three types of failure rate.
5.3.2 A failure rate (also called force of mortality) is a measure of the rate of failure of currently surviving units at a specific time. For infant mortality cases, the failure rate decreases with time. The explanation is that the presence of special causes will cause failure, typically early in the life cycle; the longer a unit survives, the less likely it is infected with the said special cause and hence the failure rate decreases as the unit ages.
5.3.3 Infant mortality is the reason for conducting a “burn-in” application where products are exposed to usage prior to field introduction in order to identify potential early failures prior to field use by a customer. For example, this practice is common for personal computer (PC) manufacturers who want to ensure their machines do not have special cause type defects and will function immediately upon a customer’s use.
5.3.4 A random failure mode is one that may occur at any time over a service life period but generally may be a rare event. The frequency of such failures is not age-dependent and is only a function of duration time or size of the observation region (that is, how long the unit is observed for). Random failures occur at a constant failure rate throughout a service life. Examples include errors of operation; installation and maintenance mistakes; foreign object damage (including hard objects, liquids, or biological interferences); other contamination and damage due to extreme environmental conditions including extreme or excessive conditions of product use. Also, several rare events that collectively can cause a failure (particularly in large systems) may manifest themselves as a random type failure mode. Numerous other random causes may be found.
5.3.5 A wear-out failure mode is generally caused by gradual performance degradation with usage or time, ultimately resulting in failure. In electro-mechanical applications, causes of this type of failure may be driven by chemical, thermal, mechanical or electrical stress until some endurance limit is reached causing the failure. Cases of rare catastrophic shocks are more likely random events than gradual degradation. In all types of products and services understanding the type of failure mode and the potential cause is important in design considerations and in installing improvements that would make the product more robust.
5.3.6 In terms of the broader product development cycle the three classes of failure modes are depicted using the bath-tub curve shown conceptually in Fig. 1. Early in the development cycle, a new product may exhibit certain failure modes, for any number of reasons that are classified as infant mortality. As the causes of these early failures are removed or corrected and the product develops and is improved, it moves into a period of random failure with a constant failure rate. This random period is sometimes used as a basis for warranty development. With increasing usage, products fall into the wearout period and performance degradation. In this period, there is an increasing
failure rate.
Random Failures due to random causes that can happen at
any time.
5.3.7 The depiction in Fig. 1 is not to be construed as
Wearout Failures due to wear or degradation action such as
applying to every type of product, nor its shape the standard
chemical, thermal, mechanical, or electrical.
form. Some products may only see a random-wearout life
cycle. In other cases, the failure rate may rise to a maximum and then gradually decline.

FIG. 1 The “Bathtub Curve”

5.3.8 In working with large systems where there may be several failure modes related to the failure of different components of the system, each one causing the system to fail, the failure rates for the several failure modes can be added to give the composite system failure rate.

5.3.9 When a failure rate is variable (a function of time), as for example with infant mortality cases or the wearout portion of life, the average failure rate over an interval can be calculated. If the life distribution is a known form, such as a Weibull model, the instantaneous failure rate curve as well as the average failure rate over an interval can be calculated in closed form (in some cases). For infant mortality cases, the average failure rate will be greater than the instantaneous rate; for the wear-out case, the average failure rate will be less than the instantaneous rate.

5.4 Reliability Metrics and Functions—A number of metrics and requirements for reliability are in use. Table 3 lists commonly applied reliability metrics.

TABLE 3 Reliability Metrics/Requirements
Metric                  Description
R(t)                    Reliability at time t.
MTTF and MTBF           MTTF or mean time to failure is the mean of the failure time distribution (1st failure). MTBF or mean time between failures is the mean time between failures for repairable systems (1st, 2nd, 3rd failure, etc.).
$B_p$ Life              The pth percentile of the life distribution; for example $B_{0.1}$ or $B_{10}$ life are the 0.1th and 10th percentiles of the life distribution.
Failure Rate            Generally applicable for random failure modes in the units of “events” or defects per unit where “unit” means some observational region such as time, space, area, volume, etc.
Dispatch Reliability    The probability that a unit will be available and in good operating condition when demanded.

5.4.1 If t is a variable time, the metric R(t) is the reliability function at time t and related to the assumed distribution such as a Weibull or a lognormal distribution. If t is a discrete variable, such as demand cycles, R(t) may be based on the binomial or Poisson models where the failure probability on any cycle is constant throughout life. Where a degradation of life occurs with increasing demand cycles, a discrete version of the Weibull can be used.

5.4.2 For variable data, where a life distribution such as a Weibull is used, the mean of the failure time distribution is commonly called the mean time to failure or MTTF (3), and understood as the mean of the first failure times. Generally it is used with non-repairable systems or single use components. For repairable systems, where an object has a recurrence of life following each repair, the mean of the recurrence life cycles is called the mean time between failures or MTBF. When the failure mode under study is of the random type, MTTF and MTBF are theoretically the same thing.

5.4.3 For variable data, the $B_p$ life (0 < p < 100) of the failure time distribution is that life at which there is a reliability (survival probability) of (100 – p) % or a failure probability of p %. Care should be exercised when the MTTF is used as an indicator of reliability since, for random type failure modes, there is an approximate 63.2 % chance of a failure prior to the mean time. For example, if a particular component has been shown to have an MTTF of 10 000 hours for a certain random failure mode, then under these conditions the reliability at 1000 hours is approximately 90 % (see Annex A1). Suppose further that the customer demands the reliability at t = 1000 hours to be 99 %; what would the MTTF have to be to achieve the 99 % reliability? For a reliability of 99 % at t = 1000 hours, the MTTF would have to be approximately 99 500 hours. This illustrates that the MTTF by itself should not be taken as a reliability benchmark without calculating the reliability at some critical time t. In addition, the use of maintenance or replacement intervals can affect both MTTF and MTBF (see Annex A1).

5.4.4 Other reliability functions may be defined and many of the important related functions are discussed in the annexes to this standard. References (4-9) contain additional general information about these functions for various types of statistical distributions as well as plentiful information on reliability more generally.

5.5 Reliability Data—Field performance data are the principal indicator of reliability. How often failures occur, at what times, their severity and for what reasons are the key reliability intelligence. In development activity including improvement efforts, bench testing is the main indicator of reliability, but it may be difficult to emulate all possible field conditions.

5.5.1 When a unit is either tested on a “bench” or observed in the field, there are two conditions that can occur: (1) the unit has failed at a specific time t; and (2) the unit is still in good running condition at time t. If a unit has failed in some way at a specific time, that is called a complete failure case. If the unit is still working properly, it is called a right suspension or a “run-out” if the actual failure time, now unknown, is in the future (in statistics, this is referred to as censored on the right). In some cases it is only known that a unit has failed and not its specific time. The unit’s condition may have been discovered at a time after it had failed – called a left suspension. In another case it may only be known that a unit has failed sometime between two times – called an interval suspension. Interval type data is common in certain types of component bench testing.
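
The four observation types described in 5.5.1 can be encoded as a pair of time bounds on the unknown failure time. The following Python sketch is illustrative only: the class name `Observation`, its fields, and the `kind` method are hypothetical names introduced here and are not part of this guide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Bounds on one unit's failure time (hypothetical helper, not from the guide).

    lower: latest time at which the unit was known to be working
    upper: earliest time at which the unit was known to have failed
           (math.inf if the unit has not yet been seen to fail)
    """
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower < 0 or self.lower > self.upper:
            raise ValueError("bounds must satisfy 0 <= lower <= upper")
        if self.lower == self.upper:
            return "complete failure"       # failed at a known time t
        if math.isinf(self.upper):
            return "right suspension"       # still running at time `lower` (run-out)
        if self.lower == 0:
            return "left suspension"        # found failed at `upper`; failed earlier
        return "interval suspension"        # failed between `lower` and `upper`


observations = [
    Observation(120.0, 120.0),      # failed at 120 h
    Observation(500.0, math.inf),   # still running at 500 h
    Observation(0.0, 80.0),         # found already failed at an 80 h inspection
    Observation(200.0, 300.0),      # failed between two inspections
]
kinds = [obs.kind() for obs in observations]
print(kinds)
```

Storing every observation as an interval is one possible design choice; it lets a single likelihood routine treat all four cases uniformly, but other encodings (for example a time plus a status flag) are equally common.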
6. Reliability Estimation and Calculation Methods

6.1 Simple Binomial and Exponential Reliability Functions:

6.1.1 There are cases where a product or service is demanded periodically and could possibly fail (for any number of reasons) at the time it is demanded but not dependent on its previous usage. The binomial distribution may be used to express the reliability in its simplest form and this is seen to be related to the more familiar exponential reliability function. For the binomial, the failure probability p is assumed to be constant throughout life, and failure on any demand assumed independent of the past. Under these conditions, the reliability following n successful uses of a unit is (see E2334):

$$R(n) = (1 - p)^{n} \tag{1}$$

6.1.2 Taking the natural log of both sides gives:

$$\ln\{R(n)\} = n\{\ln(1 - p)\} \tag{2}$$

6.1.3 When p is small (say p < 0.01), –ln(1 – p) will be approximately equal to p. Upon simplification this gives:

$$R(n) = e^{-np} \tag{3}$$

6.1.4 Let a be the average time between demand cycles. Then after n demand cycles, the total time that has passed is approximated by t = na, making n = t/a. Substituting this for n gives:

$$R(t) = e^{-tp/a} \approx e^{-\lambda t} \tag{4}$$

6.1.5 In the last expression the quantity p/a is in the units of failures per unit time and this is the rate constant, λ, here assumed constant throughout life. Under homogeneous conditions of continued usage the above expression can be used to find the conservative upper confidence bound for the rate λ, or the required time t that would validate an assumed or desired rate. For confidence C the relationship starts with:

$$e^{-\lambda t} \ge 1 - C \tag{5}$$

Solving for λ:

$$\lambda \le \frac{-\ln(1 - C)}{t} \tag{6}$$

6.1.6 Eq 6 is the upper confidence bound on the rate parameter, λ, when zero events have been observed in an interval t. Cases where r failures have been observed in the interval or where a test is aborted following the rth failure are discussed in Annex A6.

6.2 Reliability Demonstration Testing:

6.2.1 Test planning concerns how many units to run, for how long, and at what operational parameters (temperature, moisture, etc.) in order to demonstrate that a reliability requirement has been met at some confidence level. This can be done using parametric or nonparametric methods. In addition, there can be attribute type test plans and variable type test plans. In cases of variable data where a specific life distribution is used, such as a Weibull or lognormal, the value of a parameter of that distribution is sometimes assumed when some engineering/scientific knowledge justifies this assumption.

6.2.2 There are one or more specified output requirements to be demonstrated by using a plan. Generally, but not always, a confidence requirement is used, such as 90 % or 95 % confidence, that would apply to the final stated result. When confidence is not specified, it may be assumed to be approximately 50 %, as for example when a point estimate of a reliability metric is used. There are no general industry wide standards as far as confidence is concerned. A confidence level of 95 % or 90 % is very commonly used, but some industries or applications may require a different value.

6.2.3 Confidence may also depend on the specific application, as for example when safety is a concern. Users should seek out industrial benchmarks in their specific areas.

6.2.4 A test plan generally consists of a sample size, the test “duration” and a life requirement. There may also be a requirement for a maximum number of failures allowed by the plan. Life requirement implies how long a specific device should last and with what reliability at the stated life. The life requirement may be stated as a service life with an associated reliability at the end of life. Service life means the useful functioning life at the end of which the device is repaired, overhauled, or disposed of. The units of service life can be a variable time such as hours or cycles of operation, or can be a demand variable for the device, as for example a safety device that has to work when it is called for.

6.2.5 The life requirement can also be a mean life, a median life ($B_{50}$), a failure rate or a general life point at which there is a stated reliability. For the latter case, an example requirement might be to demonstrate 99 % reliability at a service life of 2000 hours. Always, when such a requirement is specified, it must be accompanied by a set of specified test conditions. These conditions are designed to emulate field conditions or to be somewhat more severe than typical field conditions. In other cases, a duty cycle is determined that specifies a distribution of field stress that a device would be expected to see in practice. The test plan would incorporate the duty cycle in some way.

6.2.6 In many cases of testing, it is not economical to implement a given test plan under ordinary usage time or cycles. In such cases, an accelerated test is used. In an accelerated test, one or more variables are adjusted to an equivalent longer duration of actual device use. The simplest such case is where a severity multiplier factor is determined at which one test hour or cycle is equivalent to some number, k, of actual or typical hours or cycles. This is entirely device and application dependent and in many cases a more complicated relationship between the accelerated variable and the actual time is needed. There are many such accelerated models used in practice. Reference (10) contains many of these methods.

6.2.7 In cases where a distribution assumption is made, a further assumption may be to assume a value for a parameter of that distribution. For example, if a Weibull model is chosen and valid, users might assume the value for the shape parameter (also called the Weibull slope or β). For the lognormal distribution, the scale parameter, σ, is sometimes assumed. If the normal distribution is used, the standard deviation might be assumed. In each of these cases, when the associated parameter is assumed, it is possible to design a test plan for demonstrating any quantile of the distribution, with any degree of confidence, given the assumed parameter.
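
As a worked illustration, Eq 6 can be applied to the example requirement in 6.2.5 (99 % reliability at a service life of 2000 hours). The Python sketch below assumes the constant rate (exponential) model of 6.1, a zero-failure test, and a simple severity multiplier k as described in 6.2.6. The function names, the 90 % confidence level, and the value k = 10 are assumptions chosen for illustration and do not come from this guide.

```python
import math


def rate_upper_bound(t: float, confidence: float) -> float:
    """Eq 6: upper confidence bound on the rate when zero events occur in time t."""
    return -math.log(1.0 - confidence) / t


def required_rate(reliability: float, life: float) -> float:
    """Rate for which exp(-rate * life) equals the required reliability (Eq 4)."""
    return -math.log(reliability) / life


def zero_failure_test_time(reliability: float, life: float, confidence: float) -> float:
    """Failure-free time t at which the Eq 6 bound equals the required rate."""
    return -math.log(1.0 - confidence) / required_rate(reliability, life)


# Example requirement from 6.2.5; the 90 % confidence level is an assumed value.
R_req, life_hours, C = 0.99, 2000.0, 0.90
t_needed = zero_failure_test_time(R_req, life_hours, C)

# Assumed severity multiplier (6.2.6): one test hour counts as k hours of use.
k = 10.0
accelerated_hours = t_needed / k

print(f"required rate: {required_rate(R_req, life_hours):.3e} per hour")
print(f"failure-free time needed: {t_needed:,.0f} hours")
print(f"with multiplier k = {k:g}: {accelerated_hours:,.0f} test hours")
```

Under these assumptions the failure-free time comes to roughly 458 000 hours, which suggests why accelerated testing or an assumed distribution parameter is often needed in practice.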
6.2.8 Demonstrating a reliability requirement may be more difficult (costly) if all of the parameters of the assumed distribution are unknown. A distribution parameter, such as discussed above, is sometimes assumed because there might be engineering or scientific knowledge from prior performance, from industry experience, or from material properties that supports such an assumption.

6.2.9 Another assumption that is sometimes useful is the scatter factor of a distribution. The scatter factor, f, is the ratio of the $B_{50}$ life to the $B_{0.1}$ life for the assumed distribution. For the Weibull and lognormal distributions, the scatter factor is functionally related to the Weibull shape parameter, β, and the lognormal scale parameter, σ, respectively (see Annex A2 and Annex A4). In many cases of materials testing, engineers may know the approximate scatter factor under the general conditions of a specific test.

6.3 For pure attribute pass/fail testing, a “zero failure” test plan is a common theme. The following basic equation, based on the binomial distribution, relates sample size, n, confidence, C, and reliability, R (11).

$$R \ge \sqrt[n]{1 - C} \tag{7}$$

6.3.1 From Eq 7 when any two values are known or assumed, the third may be solved for.

6.3.2 A second common case for pass/fail type data is when a single failure in the sample is allowed. In that case the relation among n, C and R, based on a binomial model, is:

$$nR^{n-1} - (n - 1)R^{n} \ge 1 - C \tag{8}$$

6.3.3 Eq 8 may be solved numerically for any variable when the remaining two are known or assumed. The general case where r > 1 failures are allowed in n is discussed in Annex A7.

6.4 Demonstration or “substantiation” testing is used to show that a $B_p$ life is at least some specified value with some specified confidence. A distribution for the failure time is assumed and often that distribution is Weibull, lognormal or extreme value. The Weibull model may be imposed on Eq 7 resulting in the following equations relating reliability at mission time $t_m$ to confidence, C, sample size, n, test time, t, and assumed Weibull shape parameter β (12):

$$R(t_m) \ge (1 - C)^{\frac{1}{n}\left(\frac{t_m}{t}\right)^{\beta}} \tag{9}$$

$$n = \left(\frac{\ln(1 - C)}{\ln(1 - p/100)}\right)\left(\frac{B_p}{t}\right)^{\beta} \tag{10}$$

$$B_p \ge t\,n^{1/\beta}\left(\frac{\ln(1 - p/100)}{\ln(1 - C)}\right)^{1/\beta} \tag{11}$$

6.4.1 In Eq 9 the demonstrated reliability at mission time $t_m$ is related to confidence, C, sample size, n, test time, t, and assumed Weibull shape, β. In that case n units are tested to time t and all survive. In Eq 10 the sample size is related to the $B_p$ requirement, the test time, t, confidence, C, and assumed Weibull shape β. In Eq 11 the $B_p$ life is related to the sample size, n, confidence, C, test time, t, and Weibull shape, β. In each case there is a test time, t, and n units are tested to that time without failure. More general Weibull plans, where r > 0 failures are allowed, are briefly discussed in Annex A7.

6.4.2 For the lognormal model with location parameter µ and scale parameter σ, similar equations can be developed. In what follows the lognormal scale parameter is known or assumed.

Compute:

$$q = 1 - \sqrt[n]{1 - C} \tag{12}$$

For test time t, the lower bound at confidence C is:

$$\mu \ge \mu_0 = \ln(t) - \sigma F^{-1}(q) \tag{13}$$

The function $F^{-1}(q)$ is the inverse standard normal distribution function evaluated at q (see Eq 12). Refer to this lower confidence bound on the lognormal location parameter as $\mu_0$. Then the lower bound on mission reliability at mission time $t_m$ is:

$$R(t_m) \ge 1 - F\left(\frac{\ln(t_m) - \mu_0}{\sigma}\right) \tag{14}$$

Let $Z_{p/100}$ be the standard normal quantile value at cumulative probability p/100. Then the lower bound on the $B_p$ life is:

$$B_p \ge e^{\left(Z_{p/100} - F^{-1}(q)\right)\sigma + \ln(t)} \tag{15}$$

NOTE 1—In Eq 14, when $t_m$ = t, the mission reliability at time t reduces to 1 – q, which is the non-parametric result.

The sample size required to state a lower confidence bound on a lognormal quantile $B_p$ is:

$$n = \frac{\ln(1 - C)}{\ln\left\{F\left(\dfrac{\ln(B_p/t) - Z_{p/100}\,\sigma}{\sigma}\right)\right\}} \tag{16}$$

6.4.3 All of the above formulas are zero failure plans and there are many additional variations on this topic. It is also possible to derive similar plans that allow a maximum number of failures, r > 0. Some detail is discussed in Annex A7. Further variations of the Weibull model can be found in (12-15).

6.5 In certain types of testing, it may be possible to test several units at a time. For example, this method is used in the bearing industry and is called “sudden death” or “first of n” testing (16). The method is particularly useful when the failure mode has a Weibull distribution. In that case the first failure time or the minimum failure time in n units tested also has a Weibull distribution with the same shape parameter (β) as the parent Weibull of the individual failure times. If η is the Weibull scale parameter of the individual failure times, then $\eta/n^{1/\beta}$ will be the scale parameter of the first of n distribution. This is the so called reproductive property of the Weibull distribution. The first of n methodology is efficient in that failures will occur more rapidly when multiple units are tested at the same time.

6.5.1 For example, if k = 4 sets of n = 6 units are tested until the first failure among the 6 units occurs in each of the 4 sets, one then has 4 failures from which to estimate the scale parameter, $\eta/n^{1/\beta}$, of the 1st of 6 distribution. From that estimate, the individual Weibull scale parameter may be estimated.

6.6 Reliability considerations are best addressed in the design phase of product development and where failure rate requirements are available so that engineers can factor these into a design. Reliability allocation methods attempt to distribute product strength in various ways so that the entire system just meets the requirement. There are many ways to do this
depending on the system and its requirements, and in each scenario cost is typically a factor (there will generally be a reliability/cost tradeoff). Some popular methods include the following.

6.6.1 Choosing materials or components, or both, that have superior reliability or material performance properties.

6.6.2 Allocation using various combinations of series and parallel networks and redundancy.

6.6.3 Derating a device or a system means specifying the operational conditions below actual capability.

6.6.4 The use of Failure Mode and Effects Analysis (FMEA) to identify failure modes, their frequency and severity (and possible latent type failure modes) during development activity.

6.7 The probability plotting technique is most appropriate for field reliability data of the variable type. In a probability plot, failure times are plotted with consideration given to the number and type of suspensions that are among the data set and the type of distribution that is assumed to apply to the data. What results is an estimate and plot of the assumed cumulative distribution function versus time (cdf versus t). The plot is typically scaled so that the assumed distribution plots as a straight line as a function of time. In certain cases the slope of the resulting line has meaning with respect to the assumed distribution. For example, in a Weibull plot, the slope measures the shape parameter (β). Most software packages will create probability plots of various kinds and also return a statistic that measures goodness of fit for the model being used.

6.7.1 In a probability plot, such as Fig. 2, the estimate of the failure probability at each failure point needs to be determined. These estimates are called plotting positions, and there are several methods in use for this. Plotting positions also depend on how suspensions are distributed among the failure data points. Two (of many) commonly used plotting positions are the mean rank and median rank methods. The simplest way to think of a mean rank is the case where there are no suspensions. If the sample size is n, then the plotting position (estimate of the cdf at that data value) associated with the ith order statistic is i/(n + 1). This is theoretically the expected fraction falling below the ith order statistic in any sample of size n, for any distribution. To calculate median ranks, a beta distribution is used and exact formulas are not available in closed form. A convenient approximation is often used. Again, associated with the ith order statistic, the median rank plotting position is approximately (i – 0.3)/(n + 0.4). Once again, these formulas apply to cases with no suspensions. Where suspensions are distributed throughout the data, these plotting positions must be adjusted. Software packages that provide probability plots will do these calculations automatically.

6.7.2 The most commonly used probability plot is the normal probability plot where the normal distribution is being used. This plot is appropriate for all types of data that can be assumed normal. For reliability type data, it is typically the Weibull, lognormal or extreme value distributions that are used. The calculation method used in a probability plot can also vary. In Fig. 2, the Weibull distribution is being used, and a maximum likelihood estimation method is used to estimate the two parameters of the Weibull model. Median rank plotting positions are also being used.

6.7.3 In constructing a probability plot it is also possible to create confidence bands around the model (straight line portion). There are several ways to do this including parametric and nonparametric methods, and Monte Carlo simulation. Many software packages offer at least several options to create confidence bands on probability plots. A common method uses the estimated standard errors of the parameter estimates and takes advantage of the asymptotic normality property of the maximum likelihood parameter estimates (MLE). The standard error estimates are typically supplied in the form of a matrix – the Fisher Information matrix – that provides estimates of the variances and covariance of the MLEs. These values are then used in standard formulas to create confidence bands for any desired level of confidence (8). Most software packages will provide confidence bands automatically. For more detail on the Fisher Information matrix, see Annex A6.

FIG. 2 Weibull Probability Plot

7. Systems Reliability

7.1 A system is a set of interconnected and possibly interacting components or subsystems, or both, that functions as a whole. Systems can take many configurations, but design typically considers two fundamental types. Any two parts of a system are said to be connected either in a series or a parallel configuration. A series configuration is similar to a chain containing some number of links. The chain (system) fails if at least one of the links fails – all links are required to hold any specific load or the chain will fail. An active parallel system will continue to perform in an unfailed state if at least one of the several components works. In systems design this is referred to as an active redundant system – all parts see service but only one is required to maintain system life. This design is commonly used in safety applications.

7.1.1 A standby redundant system is the case where several units are connected in parallel but only one is actually seeing any service, the redundant units being in a dormant state until the one unit fails. This may be further complicated by imperfect switching from one unit to the next. The simplest systems are either series or parallel configurations. Systems gain in complexity as combinations of series and parallel subsystems are connected in various ways.
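
The difference between active redundancy (7.1) and standby redundancy (7.1.1) can be illustrated with a small simulation. The sketch below is a minimal Python example under assumptions that are not stated in this guide: two identical units with exponentially distributed lives, perfect switching, and no failures while dormant. The function name and its parameters are hypothetical.

```python
import random


def simulate_mean_life(mttf: float, n_units: int, standby: bool,
                       trials: int = 100_000, seed: int = 1) -> float:
    """Estimate mean system life for n identical exponential units in parallel.

    Active redundancy: all units run from time zero, so the system lasts
    until the last unit fails (maximum of the unit lives).
    Standby redundancy: units run one at a time with perfect switching, so
    the system life is the sum of the unit lives.
    """
    rng = random.Random(seed)
    rate = 1.0 / mttf
    total = 0.0
    for _ in range(trials):
        lives = [rng.expovariate(rate) for _ in range(n_units)]
        total += sum(lives) if standby else max(lives)
    return total / trials


unit_mttf = 100.0
active = simulate_mean_life(unit_mttf, n_units=2, standby=False)
standby = simulate_mean_life(unit_mttf, n_units=2, standby=True)

# Closed forms for this idealized two-unit case:
# 1.5 * MTTF for active redundancy and 2 * MTTF for standby redundancy.
print(f"active redundancy : {active:.1f} h (theory {1.5 * unit_mttf:.1f})")
print(f"standby redundancy: {standby:.1f} h (theory {2.0 * unit_mttf:.1f})")
```

Imperfect switching or dormant failures would reduce the standby advantage; modeling them would require additional assumptions beyond this sketch.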
7.1.2 In most cases of analyzing simple system reliability, the individual component reliabilities are assumed independent of one another. The failure probability may also be used in computations since R = 1 – F.

7.2 A series system containing for example three independent components, with reliabilities $R_a$, $R_b$ and $R_c$, has a system reliability given by Eq 17:

$$R_s = R_a R_b R_c \tag{17}$$

Or, alternatively, in terms of failure probability:

$$R_s = (1 - F_a)(1 - F_b)(1 - F_c) \tag{18}$$

7.2.1 Eq 17 may also be used in various ways to solve for the component reliability that would give specified system reliability. For example, suppose that the components are similar and have equal reliability. Thus $R_a = R_b = R_c$. If the desired system reliability is $R_s$ = 0.999, what is the minimum reliability of the component:

$$\sqrt[3]{0.999} = 0.99967 \tag{19}$$

7.3 A parallel system containing for example three independent components, with reliabilities $R_a$, $R_b$ and $R_c$, has a system reliability given by Eq 20:

$$R_s = 1 - (1 - R_a)(1 - R_b)(1 - R_c) \tag{20}$$

7.3.1 In this configuration, the system works if at least one of the three components works. The reliability calculation is one minus the product of the three failure probabilities – each failure given by 1 – R. Fig. 3 shows a simple configuration graphic for: (a) a series system, and (b) a parallel system.

7.3.2 In the simple examples above, the reliability of each component is considered fixed throughout life. In some cases a constant rate is used instead of probability and these may differ for each component in the system. Standard formulas are available for figuring composite failure rates and mean life in series and parallel configurations. For a series configuration containing n independent exponentially distributed units in series, each with a differing rate $\lambda_i$, the system rate is:

...

7.3.4 The actual distribution function of failure time, although simple to derive algebraically, is not a simple result. The easiest and most direct way to calculate the system MTTF and associated distribution for this case is to use Monte Carlo simulation. This also holds if the rates differ for each component. Fig. 4 shows an example of the system life resulting from four components in active redundancy. Two of these components have an MTTF of 100 hours and two have an MTTF of 200 hours. The components are independent and exponentially distributed. A Monte Carlo simulation was used with 500 000 cases.

7.3.5 Typically, random type failure modes are assumed for reliability planning during the design phase of a system.

7.3.6 In some basic systems analysis, where n components have a fixed reliability and each component is active, the requirement might be for at least k out of n components to work. For example, the simple case of parallel redundancy above is a one out of three system. When all components have an equal reliability, R, the cumulative binomial distribution with p = R and n components can be used for this purpose.

7.3.7 More generally, complex systems are built up from various configurations of series and parallel subsystems, with perhaps some standby or active redundancy. Fig. 5 shows a basic version of complexity.

7.3.8 Assume that all six components have the same reliability, R. Assume further that the subsystem with components 1–4 (System A) as well as the subsystem with components 5–6 (System B) are in active redundancy. Also note that Subsystems A and B are in series. Overall system reliability is calculated by first figuring the individual subsystem reliability (Systems A and B separately) then combining these results to calculate the series of A and B. The calculation would be:

$$R_s = \left(1 - (1 - R)^{4}\right)\left(1 - (1 - R)^{2}\right) \tag{23}$$

7.3.9 If the component reliability is R = 0.9, the system reliability would be $R_s$ = 0.9899. There are several important methods that are used for analyzing complex systems. Among the most import
...
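
The series, parallel, and k-out-of-n relationships in 7.2, 7.3, and 7.3.6 can be checked numerically. The following Python sketch assumes independent components, as in 7.1.2. The function names are hypothetical, and the arrangement of the six components is taken only from the description in 7.3.8.

```python
import math
from typing import Sequence


def series(reliabilities: Sequence[float]) -> float:
    """Eq 17: every component must work."""
    return math.prod(reliabilities)


def parallel(reliabilities: Sequence[float]) -> float:
    """Eq 20: at least one component must work (active redundancy)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def k_out_of_n(k: int, n: int, r: float) -> float:
    """7.3.6: probability that at least k of n equal components work (binomial)."""
    return sum(math.comb(n, j) * r**j * (1.0 - r)**(n - j) for j in range(k, n + 1))


# Eq 19: component reliability needed for a three-component series system.
component_needed = 0.999 ** (1.0 / 3.0)

# Eq 23: subsystem A (4 in parallel) in series with subsystem B (2 in parallel).
R = 0.9
system = series([parallel([R] * 4), parallel([R] * 2)])

print(f"Eq 19 component reliability: {component_needed:.5f}")
print(f"Eq 23 system reliability:    {system:.4f}")
```

With k = 1 the k-out-of-n function reduces to the parallel case, and with k = n it reduces to the series case, which gives a quick consistency check on the three functions.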
This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E3159 − 21 An American National Standard
Standard Guide for
General Reliability
This standard is issued under the fixed designation E3159; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This guide covers fundamental concepts, applications, and mathematical relationships associated with reliability as used in
industrial areas and as applied to simple components, processes, and systems or complex final products.
1.2 The system of units for this guide is not specified. Quantities in the guide are presented only as illustrations of the method
or of a calculation. Any examples used are not binding on any particular product or industry.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of
regulatory limitations prior to use.
1.4 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E456 Terminology Relating to Quality and Statistics
E2334 Practice for Setting an Upper Confidence Bound for a Fraction or Number of Non-Conforming items, or a Rate of
Occurrence for Non-Conformities, Using Attribute Data, When There is a Zero Response in the Sample
E2555 Practice for Factors and Procedures for Applying the MIL-STD-105 Plans in Life and Reliability Inspection
E2696 Practice for Life and Reliability Testing Based on the Exponential Distribution
2.2 ISO Standards:
ISO 3534-1 Statistics–Vocabulary and Symbols, Part 1: Probability and General Statistical Terms
ISO Guide 73 Risk Management Vocabulary
3. Terminology
3.1 Definitions:
3.1.1 Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456. Other general statistical
terms and terms related to risk are defined in ISO 3534-1 and ISO Guide 73.
This guide is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.40 on Reliability.
Current edition approved May 1, 2021. Published July 2021. Originally approved in 2018. Last previous edition approved in 2018 as E3159 – 18.
DOI: 10.1520/E3159-21.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
E3159 − 21
3.1.2 $B_p$ life, n—for continuous variables, the life at which there is a probability, p (expressed as a percentage), of failure at or
less than this value.
3.1.2.1 Discussion—
Example: The $B_{10}$ life is a value of life, t, such that the cumulative distribution function, F(t) = 0.1 or 10 %.
3.1.3 failure mode, n—the way in which a device, process or system has failed.
3.1.3.1 Discussion—
Under some set of conditions, any device, process or system may be vulnerable to several failure modes. For example, a tire may
fail in the course of time due to a puncture by a sharp object, from the tire simply wearing out, or from a tire manufacturing
anomaly. Each of these describes a different failure mode. These three failure modes are said to be competing with respect to the
failure event.
3.1.4 hazard rate, n—differential fraction of items failing at time t among those surviving up to time t, symbolized by h(t). E2555
3.1.4.1 Discussion—
h(t) is also referred to as the instantaneous failure rate at time t and called a hazard function. It is related to the probability density
function (pdf) and cumulative distribution function (cdf) by h(t) = f(t)/(1 – F(t)), where f(t) is the pdf and F(t) the cdf.
3.1.5 mean time between failures (MTBF), n—the average time to failure for a repairable item.
3.1.5.1 Discussion—
A repairable system is one that can be repaired and returned to service following a failure. When an item is repaired, it may not
necessarily be returned to service in as good as new condition. There may be a reduction in life in a repaired item making the item
not as robust as a new item. Any failure-repair sequence may continue for several cycles, further reducing longevity of service
following each repair time. Often the more times the item is repaired, the smaller will be the expected remaining life until the next
repair. However, some repairable systems (for example, electronic) may just have some components replaced from time to time
rendering the unit as good as new. In those cases, MTBF is the same thing as MTTF.
3.1.6 mean time to failure (MTTF), θ, n—in life testing, the average length of life of items in a lot. E2696
3.1.7 reliability, n—the probability that a component, device, product, process or system will function or fulfill a function after
a specified duration of time or usage under specified conditions.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 non-repairable system, n—a system that is intended for a single use and discarded/replaced following its first failure.
3.2.2 repairable system, n—a system that is intended to be used through multiple failure-repair cycles.
4. Significance and Use
4.1 The theory of reliability is used for estimating and demonstrating the probability of survival at specific times or for specific
usage cycles for simple components, devices, assemblies, processes, and systems. As reliability is one key dimension of quality,
it may be more generally used as a measure of quality over time or over a usage or demand sequence.
4.1.1 Many industries require performance metrics and requirements that are reliability-centered. Reliability assessments may be
needed for the determination of maintenance requirements, for spare parts allocation, for life cycle cost analysis and for warranty
purposes. This guide summarizes selected concepts, terminology, formulas, and methods associated with reliability and its
application to products and processes. Many mathematical relationships and methods are found in the annexes. For general
statistical terms not found in Section 3, Terminology E456 and ISO 3534-1 can be used for definitional purposes and ISO Guide
73 for general terminology regarding risk analysis.
4.2 The term “system” implies a configuration of interacting components, sub-assemblies, materials, and possibly processes all
acting together to make the system work as a whole. Parts of the system may be linked in combinations of series and parallel
configuration and redundancy used in some parts to improve reliability. Additional conditions of complex engineering may have
to be considered.
4.3 Process reliability concerns the assessment of any type of well-defined process. This can include manufacturing processes,
business processes, and dispatch/demand type processes. Assessment typically measures the extent to which the process can
continually perform its intended function without “upset” as well as process robustness.
4.4 A number of reliability metrics are in use. For example, mean time to failure (MTTF) is a common measure of average life
or average time to the first time a unit fails. For this reason it is said to apply to non-repairable systems. Other life percentiles (or
quantiles) are also in use, for example a $B_p$ life, the life at which p % of units are expected to have failed. Thus, the $B_{50}$ or median life
is the life by which 50 % of items would be expected to fail and 50 % to survive; the $B_{0.1}$ life is the life at which a 1 in 1000
failure probability (0.1 % failure) and a 99.9 % reliability would be expected.
4.4.1 Failure rate and average failure rate are also common metrics in reliability. With failure rates, it is important to understand
that a rate may be changing with time and this may be increasing, decreasing or some combination of these over the life of a
product or service. The failure rate may also be constant.
4.5 Bench testing of a device is used to obtain early reliability assessment or to demonstrate a specific reliability requirement or
a related metric. There are a number of key methodologies that are used for this purpose. Demonstration testing may be dependent
on the assumption of a distribution of failure time or may be carried out using nonparametric methods.
4.6 When a system is repaired following failure and placed back into service, we refer to the object as a repairable system. A key
metric for this is the mean time between failures (MTBF), which is not to be confused with MTTF. When a system is repaired,
it may not be the case that its expected remaining life is as good as a new one. There may be a reduction in expected life following
a repair and this may continue with continuing repair cycles. The MTBF metric applies to all such sequences of repair and
restoration cycles over a service life period. This includes the first time to failure, the 2nd time, the 3rd time, etc.
5. Life Concepts
5.1 Before reliability can be assessed, the measure of life must be selected. Table 1 shows a sample of units that are commonly
used as a measure of life.
5.1.1 Variations of these units can be found as for example the difference between an aircraft total engine operating time (EOT)
and its time/hours in flight or engine flight hours (EFH). Cycles are dependent on ordinary time in that any cycle may last for any
length of time. In another case, continued life may be driven more by calendar time.
5.1.2 A dispatch of a product or service can be used to compute the product’s dispatch reliability (for example, relative frequency
of failure free dispatches without a change to a schedule). Demand cycle is different than ordinary cycles in that a product may
only be demanded infrequently but must be in serviceable condition when called on (for example, fire extinguishers, ambulance
vehicles). Calendar time may be applicable in situations where a product is exposed to field or environmental conditions during
most of its life and would be subject to chemical, thermal or other actions causing performance degradation over time. Shelf life
is applied to many chemical and biological products and is a prime example of the more general “service life” concept.
5.1.3 A product’s service life is a duration of life (in the appropriate life units) over which the manufacturer believes the product
is serviceable or useful. The term useful life is also used synonymously. In some industries, the concept of “mission” is used
interchangeably. A duty cycle is often used to describe the fraction of the time and under what conditions that a product is called
on for its intended use relative to some arbitrary time period. For example, jet engines operate at a lower level of stress during the
cruise portion of any flight, whereas during the takeoff and landing portion of the flight, the duty requirements are greater. In a
duty cycle profile, a product may be exposed to a distribution of stress during a typical usage cycle.
TABLE 1 Common Measures of Product Life
Life Unit          Example
operating time     hours, minutes, days
cycles of usage    flights, dispatches
calendar time      days since new; shelf life
demand cycles      unit is demanded occasionally
5.1.4 In many types of products, components or subsystems, the unit may be subject to life limiting. The unit must be replaced
with a new one immediately upon reaching the life limit, if not failed. Such units have increasing failure rates with age and the
life limit is judiciously selected at a point prior to reaching the unacceptable failure rate. Life limiting is different than service life
in that the former applies to non-repairable items (for example, one use only, then dispose and replace). A concept related to life
limiting is a replacement or preventative maintenance interval. Replacement intervals are commonly found in electro-mechanical
applications such as in machine hardware, automotive or aerospace applications.
5.2 Maintenance Schedules or Intervals—The continued useful life of many types of products is dependent on appropriate
maintenance. Such maintenance is often specified contractually or as part of a warranty stipulation. Inappropriate product use or
operation of the product outside of an intended usage range may retard or negate the desirable effect of a maintenance interval.
5.3 Failure Modes and Failure Rate—In using reliability calculations, consideration should be given to the type of failure mode
that is expected for the specific product and its intended application. Three broad classes of failure modes are in common use.
Table 2 describes these. See (1) and (2) for further detail on the failure mode concept.
5.3.1 The term “infant mortality,” borrowed from the biological sciences, is now common in engineering. Each of these three
classes may contain numerous more specific failure modes – depending on the type of product considered. Associated with each
of the three broad classes of failure mode are the three types of failure rate.
5.3.2 A failure rate (also called force of mortality) is a measure of the rate of failure of currently surviving units at a specific time.
For infant mortality cases, the failure rate decreases with time. The explanation is that the presence of special causes will cause
failure, typically early in the life cycle; the longer a unit survives, the less likely it is to be affected by such a special cause, and hence
the failure rate decreases as the unit ages.
5.3.3 Infant mortality is the reason for conducting a “burn-in” application where products are exposed to usage prior to field
introduction in order to identify potential early failures prior to field use by a customer. For example, this practice is common for
personal computer (PC) manufacturers who want to ensure their machines do not have special cause type defects and will function
immediately upon a customer’s use.
5.3.4 A random failure mode is one that may occur at any time over a service life period but generally may be a rare event. The
frequency of such failures is not age-dependent and is only a function of duration time or size of the observation region (that is,
how long the unit is observed for). Random failures occur at a constant failure rate throughout a service life. Examples include
errors of operation; installation and maintenance mistakes; foreign object damage (including hard objects, liquids, or biological
interferences); other contamination and damage due to extreme environmental conditions including extreme or excessive
conditions of product use. Also, several rare events that collectively can cause a failure (particularly in large systems) may manifest
themselves as a random type failure mode. Numerous other random causes may be found.
5.3.5 A wear-out failure mode is generally caused by gradual performance degradation with usage or time, ultimately resulting
in failure. In electro-mechanical applications, causes of this type of failure may be driven by chemical, thermal, mechanical or
electrical stress until some endurance limit is reached causing the failure. Cases of rare catastrophic shocks are more likely random
events than gradual degradation. In all types of products and services understanding the type of failure mode and the potential cause
is important in design considerations and in installing improvements that would make the product more robust.
TABLE 2 General Failure Mode Classes
Class               Description
Infant Mortality    Early failures due to special causes that typically only apply to some units in a population.
Random              Failures due to random causes that can happen at any time.
Wearout             Failures due to wear or degradation action such as chemical, thermal, mechanical, or electrical.
The boldface numbers in parentheses refer to the list of references at the end of this standard.
5.3.6 In terms of the broader product development cycle the three classes of failure modes are depicted using the bath-tub curve
shown conceptually in Fig. 1. Early in the development cycle, a new product may exhibit certain failure modes, for any number
of reasons that are classified as infant mortality. As the causes of these early failures are removed or corrected and the product
develops and is improved, it moves into a period of random failure with a constant failure rate. This random period is sometimes
used as a basis for warranty development. With increasing usage, products fall into the wearout period and performance
degradation. In this period, there is an increasing failure rate.
5.3.7 The depiction in Fig. 1 is not to be construed as applying to every type of product, nor its shape the standard form. Some
products may only see a random-wearout life cycle. In other cases, the failure rate may rise to a maximum and then gradually
decline.
5.3.8 In working with large systems where there may be several failure modes related to the failure of different components of
the system, each one causing the system to fail, the failure rates for the several failure modes can be added to give the composite
system failure rate.
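The addition rule in 5.3.8 can be sketched as follows. This is only an illustration under stated assumptions: the failure modes are taken to be independent, each with a constant failure rate, and the mode names and rate values are hypothetical, not data from this guide.

```python
import math

# Hypothetical constant failure rates (failures per hour) for three
# independent failure modes of one system; values are illustrative only.
mode_rates = {"bearing": 2.0e-5, "seal": 5.0e-6, "controller": 1.0e-5}

def system_failure_rate(rates):
    """Composite failure rate: the sum of the individual failure-mode rates."""
    return sum(rates.values())

def system_reliability(rates, t):
    """Reliability at time t, assuming constant rates (exponential model)."""
    return math.exp(-system_failure_rate(rates) * t)

lam = system_failure_rate(mode_rates)
print(f"composite rate = {lam:.2e} per hour")
print(f"R(1000 h) = {system_reliability(mode_rates, 1000.0):.4f}")
```

Under these assumptions the system reliability equals the product of the reliabilities computed for each failure mode separately, which is why the rates add.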
5.3.9 When a failure rate is variable (a function of time), as for example with infant mortality cases or the wearout portion of life,
the average failure rate over an interval can be calculated. If the life distribution is a known form, such as a Weibull model, the
instantaneous failure rate curve as well as the average failure rate over an interval can be calculated in closed form (in some cases).
For infant mortality cases, the average failure rate will be greater than the instantaneous rate; for the wear-out case, the average
failure rate will be less than the instantaneous rate.
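A minimal sketch of the comparison in 5.3.9, assuming a two-parameter Weibull life distribution with shape β and scale η and taking the average failure rate over an interval as the change in cumulative hazard divided by the interval length. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def weibull_average_failure_rate(t1, t2, beta, eta):
    """Average failure rate over (t1, t2]: [H(t2) - H(t1)] / (t2 - t1),
    where H(t) = (t/eta)**beta is the cumulative hazard."""
    return ((t2 / eta) ** beta - (t1 / eta) ** beta) / (t2 - t1)

eta = 1000.0   # hypothetical scale parameter (hours)
t = 500.0      # end of the interval (0, t]
for beta, label in [(0.5, "infant mortality"), (1.0, "random"), (3.0, "wearout")]:
    avg = weibull_average_failure_rate(0.0, t, beta, eta)
    inst = weibull_hazard(t, beta, eta)
    print(f"{label:16s} beta={beta}: average={avg:.3e}, instantaneous={inst:.3e}")
```

With the interval starting at zero, the average exceeds the instantaneous rate at t when β < 1, the two coincide when β = 1, and the average is smaller when β > 1.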
5.4 Reliability Metrics and Functions—A number of metrics and requirements for reliability are in use. Table 3 lists commonly
applied reliability metrics.
5.4.1 If t is a variable time, the metric R(t) is the reliability function at time t and related to the assumed distribution such as a
Weibull or a lognormal distribution. If t is a discrete variable, such as demand cycles, R(t) may be based on the binomial or Poisson
models where the failure probability on any cycle is constant throughout life. Where a degradation of life occurs with increasing
demand cycles, a discrete version of the Weibull can be used.
5.4.2 For variable data, where a life distribution such as a Weibull is used, the mean of the failure time distribution is commonly
called the mean time to failure or MTTF (3), and understood as the mean of the first failure times. Generally it is used with
non-repairable systems or single use components. For repairable systems, where an object has a recurrence of life following each
repair, the mean of the recurrence life cycles is called the mean time between failure or MTBF. When the failure mode under study
is of the random type, MTTF and MTBF are theoretically the same thing.
FIG. 1 The “Bathtub Curve”
TABLE 3 Reliability Metrics/Requirements
Metric                  Description
R(t)                    Reliability at time t.
MTTF and MTBF           MTTF or mean time to failure is the mean of the failure time distribution (1st failure).
                        MTBF or mean time between failures is the mean time between failures for repairable
                        systems (1st, 2nd, 3rd failure, etc.).
$B_p$ Life              The pth percentile of the life distribution; for example, the $B_{0.1}$ and $B_{10}$ lives are
                        the 0.1th and 10th percentiles of the life distribution.
Failure Rate            Generally applicable for random failure modes in the units of “events” or defects per
                        unit where “unit” means some observational region such as time, space, area, volume, etc.
Dispatch Reliability    The probability that a unit will be available and in good operating condition when demanded.
5.4.3 For variable data, the $B_p$ life (0 < p < 100) is the life at which there is a reliability (survival probability) of
(100 − p) % or a failure probability of p %. Care should be exercised when the MTTF is used as an indicator
of reliability since, for random type failure modes, there is an approximate 63.2 % chance of a failure prior to the mean
time. For example, if a particular component has been shown to have an MTTF of 10 000 hours for a certain random failure mode,
then under these conditions the reliability at 1000 hours is approximately 90 % (see Annex A1). Suppose further that the
customer demands the reliability at t = 1000 hours to be 99 %; what would the MTTF have to be to achieve the 99 %
reliability? For a reliability of 99 % at t = 1000 hours, the MTTF would have to be approximately 99 500 hours. This
illustrates that the MTTF by itself should not be taken as a reliability benchmark without calculating the reliability at some
critical time t. In addition, the use of maintenance or replacement intervals can affect both MTTF and MTBF (see Annex A1).
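The MTTF figures quoted in 5.4.3 can be checked with a short sketch. It assumes a random failure mode modeled by the exponential reliability function R(t) = exp(−t/MTTF); the function names are hypothetical.

```python
import math

def exponential_reliability(t, mttf):
    """R(t) = exp(-t / MTTF) for a constant failure rate."""
    return math.exp(-t / mttf)

def required_mttf(t, reliability):
    """MTTF needed so that R(t) equals the stated reliability."""
    return -t / math.log(reliability)

print(f"P(failure before the MTTF) = {1 - exponential_reliability(1.0, 1.0):.3f}")
print(f"R(1000 h) at MTTF 10 000 h = {exponential_reliability(1000, 10_000):.3f}")
print(f"MTTF for 99 % at 1000 h    = {required_mttf(1000, 0.99):.0f} h")
```

The three printed values are about 0.632, 0.905, and 99 499 h, in line with the 63.2 %, 90 %, and 99 500 hour figures in the text.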
5.4.4 Other reliability functions may be defined and many of the important related functions are discussed in the annexes to this
standard. References (4-9) contain additional general information about these functions for various types of statistical distributions
as well as plentiful information on reliability more generally.
5.5 Reliability Data—Field performance data are the principal indicator of reliability. How often failures occur, at what times, their
severity and for what reasons are the key reliability intelligence. In development activity including improvement efforts, bench
testing is the main indicator of reliability, but it may be difficult to emulate all possible field conditions.
5.5.1 When a unit is either tested on a “bench” or observed in the field, there are two conditions that can occur: (1) the unit has
failed at a specific time t; and (2) the unit is still in good running condition at time t. If a unit has failed in some way at a specific
time, that is called a complete failure case. If the unit is still working properly, it is called a right suspension or a “run-out” if the
actual failure time, now unknown, is in the future (in statistics, this is referred to as censored on the right). In some cases it is only
known that a unit has failed and not its specific time. The unit’s condition may have been discovered at a time after it had failed
– called a left suspension. In another case it may only be known that a unit has failed sometime between two times – called an
interval suspension. Interval type data is common in certain types of component bench testing.
6. Reliability Estimation and Calculation Methods
6.1 Simple Binomial and Exponential Reliability Functions:
6.1.1 There are cases where a product or service is demanded periodically and could possibly fail (for any number of reasons) at
the time it is demanded but not dependent on its previous usage. The binomial distribution may be used to express the reliability
in its simplest form and this is seen to be related to the more familiar exponential reliability function. For the binomial, the failure
probability p is assumed to be constant throughout life, and failure on any demand assumed independent of the past. Under these
conditions, the reliability following n successful uses of a unit is (see E2334):
$$R(n) = (1 - p)^{n} \tag{1}$$
6.1.2 Taking the natural log of both sides gives:
$$\ln\{R(n)\} = n \ln\{(1 - p)\} \tag{2}$$
6.1.3 When p is small (say p < 0.01), –ln(1 – p) will be approximately equal to p. Upon simplification this gives:
$$R(n) \approx e^{-np} \tag{3}$$
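The quality of the small-p approximation in Eq 3 can be seen by comparing it with Eq 1 directly. The sketch below uses hypothetical values of p and n, and the function names are illustrative only.

```python
import math

def binomial_reliability(n, p):
    """Eq 1: probability of n consecutive successful demands."""
    return (1.0 - p) ** n

def exponential_approximation(n, p):
    """Eq 3: small-p approximation exp(-n*p)."""
    return math.exp(-n * p)

n = 50  # hypothetical number of demands
for p in (0.001, 0.01, 0.1):  # hypothetical per-demand failure probabilities
    exact = binomial_reliability(n, p)
    approx = exponential_approximation(n, p)
    print(f"p={p}: exact={exact:.4f}, approx={approx:.4f}, diff={approx - exact:.4f}")
```

The difference grows with p, and the exponential form is never smaller than the binomial form because ln(1 – p) ≤ –p.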
6.1.4 Let a be the average time between demand cycles. Then after n demand cycles, the total time that has passed is approximated
by t = na making n = t/a. Substituting this for n gives:
$$R(t) = e^{-tp/a} \approx e^{-\lambda t} \tag{4}$$
6.1.5 In the last expression the quantity p/a is in units of failures per unit time and is the rate constant, λ, here assumed constant
throughout life. Under homogeneous conditions of continued usage the above expression can be used to find the conservative upper
confidence bound for the rate λ, or the required time t that would validate an assumed or desired rate. For confidence C the
relationship starts with:
$$e^{-\lambda t} \ge 1 - C \tag{5}$$
Solving for λ:
$$\lambda \le \frac{-\ln(1 - C)}{t} \tag{6}$$
6.1.6 Eq 6 is the upper confidence bound on the rate parameter, λ, when zero events have been observed in an interval t. Cases
where r failures have been observed in the interval or where a test is aborted following the rth failure are discussed in Annex A6.
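As a minimal sketch of Eq 5 and Eq 6, the two functions below compute the upper confidence bound on λ after a failure-free interval, and the failure-free time needed to validate a desired rate. The function names and the numerical inputs are hypothetical.

```python
import math

def rate_upper_bound(t, confidence):
    """Eq 6: upper confidence bound on lambda after zero events in time t."""
    return -math.log(1.0 - confidence) / t

def required_test_time(rate, confidence):
    """Eq 5 solved for t: failure-free time needed to validate a desired rate."""
    return -math.log(1.0 - confidence) / rate

# Hypothetical inputs: 5000 failure-free hours at 90 % confidence, and a
# target rate of 1e-4 failures per hour at 95 % confidence.
print(f"lambda upper bound = {rate_upper_bound(5000.0, 0.90):.3e} per hour")
print(f"required test time = {required_test_time(1.0e-4, 0.95):.0f} hours")
```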
6.2 Reliability Demonstration Testing:
6.2.1 Test planning concerns how many units to run, for how long, and at what operational parameters (temperature, moisture, etc.)
in order to demonstrate that a reliability requirement has been met at some confidence level. This can be done using parametric
or nonparametric methods. In addition, there can be attribute type test plans and variable type test plans. In cases of variable data
where a specific life distribution is used, such as a Weibull or lognormal, the value of a parameter of that distribution is sometimes
assumed when some engineering/scientific knowledge justifies this assumption.
6.2.2 There are one or more specified output requirements to be demonstrated by using a plan. Generally, but not always, a
confidence requirement is used, such as 90 % or 95 % confidence, that would apply to the final stated result. When confidence is
not specified, it may be assumed to be approximately 50 % as, for example, when a point estimate of a reliability metric is used. There
are no general industry-wide standards as far as confidence is concerned. A confidence level of 95 % or 90 % is very commonly
used, but some industries or applications may require a different value.
6.2.3 Confidence may also depend on the specific application, as for example when safety is a concern. Users should seek out
industrial benchmarks in their specific areas.
6.2.4 A test plan generally consists of a sample size, the test “duration” and a life requirement. There may also be a requirement
for a maximum number of failures allowed by the plan. Life requirement implies how long a specific device should last and with
what reliability at the stated life. The life requirement may be stated as a service life with an associated reliability at the end of
life. Service Life means the useful functioning life at the end of which the device is repaired, overhauled, or disposed of. The units
of service life can be a variable time such as hours or cycles of operation, or can be demand variable for the device as for example
a safety device that has to work when it is called for.
6.2.5 The life requirement can also be a mean life, a median life ($B_{50}$), a failure rate or a general life point at which there is a stated
reliability. For the latter case, an example requirement might be to demonstrate 99 % reliability at a service life of 2000 hours.
Always, when such a requirement is specified, it must be accompanied by a set of specified test conditions. These conditions are
designed to emulate field conditions or to be somewhat more severe than typical field conditions. In other cases, a duty cycle is
determined that specifies a distribution of field stress that a device would be expected to see in practice. The test plan would
incorporate the duty cycle in some way.
6.2.6 In many cases of testing, it is not economical to implement a given test plan under ordinary usage time or cycles. In such
cases, an accelerated test is used. In an accelerated test, one or more variables are adjusted to an equivalent longer duration of actual
device use. The simplest such case is where a severity multiplier factor is determined at which one test hour or cycle is equivalent
to some number, k, of actual or typical hours or cycles. This is entirely device and application dependent and in many cases a more
complicated relationship between the accelerated variable and the actual time is needed. There are many such accelerated models
used in practice. Reference (10) contains many of these methods.
6.2.7 In cases where a distribution assumption is made, a further assumption may be to assume a value for a parameter of that
distribution. For example, if a Weibull model is chosen and valid, users might assume the value for the shape parameter (also called
the Weibull slope or β). For the lognormal distribution, the scale parameter, σ, is sometimes assumed. If the normal distribution
is used, the standard deviation might be assumed. In each of these cases, when the associated parameter is assumed, it is possible
to design a test plan for demonstrating any quantile of the distribution, with any degree of confidence, given the assumed parameter.
6.2.8 Demonstrating a reliability requirement may be more difficult (costly) if all of the parameters of the assumed distribution
are unknown. A distribution parameter, such as discussed above, is sometimes assumed because there might be engineering or
scientific knowledge from prior performance, from industry experience, or from material properties that supports such an
assumption.
6.2.9 Another assumption that is sometimes useful is the scatter factor of a distribution. The scatter factor, f, is the ratio of the
$B_{50}$ life to the $B_{0.1}$ life for the assumed distribution. For the Weibull and lognormal distributions, the scatter factor is functionally
related to the Weibull shape parameter, β, and the lognormal scale parameter, σ, respectively (see Annex A2 and Annex A4). In
many cases of materials testing, engineers may know the approximate scatter factor under the general conditions of a specific test.
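The dependence of the scatter factor on β or σ can be sketched from the standard quantile formulas of the two-parameter Weibull and lognormal distributions. The expressions below are derived on that assumption and are not copied from Annex A2 or Annex A4; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def weibull_scatter_factor(beta):
    """B50/B0.1 for a two-parameter Weibull with shape beta.
    Uses B_p = eta * (-ln(1 - p/100))**(1/beta); eta cancels in the ratio."""
    return (math.log(2.0) / -math.log(1.0 - 0.001)) ** (1.0 / beta)

def lognormal_scatter_factor(sigma):
    """B50/B0.1 for a lognormal with scale sigma.
    Uses B_p = exp(mu + z_p * sigma); mu cancels and z_50 = 0."""
    z_001 = NormalDist().inv_cdf(0.001)  # standard normal quantile at 0.001
    return math.exp(-z_001 * sigma)

for beta in (1.0, 2.0, 4.0):
    print(f"Weibull beta={beta}: f = {weibull_scatter_factor(beta):.1f}")
for sigma in (0.25, 0.5, 1.0):
    print(f"lognormal sigma={sigma}: f = {lognormal_scatter_factor(sigma):.1f}")
```

In both models the scatter factor depends only on the shape-type parameter, so an engineer who knows the approximate scatter factor can back out β or σ.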
6.3 For pure attribute pass/fail testing, a “zero failure” test plan is a common theme. The following basic equation, based on the
binomial distribution, relates sample size, n, confidence, C, and reliability, R (11):
$$R \ge \sqrt[n]{1 - C} \tag{7}$$
6.3.1 From Eq 7 when any two values are known or assumed, the third may be solved for.
6.3.2 A second common case for pass/fail type data is when a single failure in the sample is allowed. In that case the relation
among n, C and R, based on a binomial model, is:
$$nR^{n-1} - (n - 1)R^{n} \ge 1 - C \tag{8}$$
6.3.3 Eq 8 may be solved numerically for any variable when the remaining two are known or assumed. The general case where
r > 1 failures are allowed in n is discussed in Annex A7.
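As an illustration of Eq 7 and Eq 8, the sketch below finds the smallest sample size that demonstrates a required reliability at a given confidence when zero failures or one failure is allowed. The function names are hypothetical, and the one-failure case is solved by a simple direct search rather than any method prescribed by this guide.

```python
import math

def zero_failure_sample_size(reliability, confidence):
    """Smallest n for which the Eq 7 bound (1 - C)**(1/n) reaches the
    required reliability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

def one_failure_sample_size(reliability, confidence, n_max=100_000):
    """Smallest n for which the left side of Eq 8, evaluated at the required
    reliability, drops to 1 - C or below (direct search)."""
    r = reliability
    for n in range(2, n_max + 1):
        if n * r ** (n - 1) - (n - 1) * r ** n <= 1.0 - confidence:
            return n
    raise ValueError("no sample size found up to n_max")

def demonstrated_reliability(n, confidence):
    """Eq 7 solved for R with zero failures in n units."""
    return (1.0 - confidence) ** (1.0 / n)

# Hypothetical requirement: 90 % reliability at 90 % confidence.
print(zero_failure_sample_size(0.90, 0.90))
print(one_failure_sample_size(0.90, 0.90))
print(f"{demonstrated_reliability(22, 0.90):.4f}")
```

Allowing one failure increases the required sample size, which is the usual trade-off between test cost and tolerance for a single failure.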
6.4 Demonstration or “substantiation” testing is used to show that a $B_p$ life is at least some specified value with some specified
confidence. A distribution for the failure time is assumed and often that distribution is Weibull, lognormal or extreme value. The
Weibull model may be imposed on Eq 7 resulting in the following equations relating reliability at mission time $t_m$ to confidence,
C, sample size, n, test time, t, and assumed Weibull shape parameter β (12):
$$R(t_m) \ge (1 - C)^{\frac{1}{n}\left(\frac{t_m}{t}\right)^{\beta}} \tag{9}$$
$$n = \left(\frac{\ln(1 - C)}{\ln(1 - p/100)}\right)\left(\frac{B_p}{t}\right)^{\beta} \tag{10}$$
$$B_p \ge t\,n^{1/\beta}\left(\frac{\ln(1 - p/100)}{\ln(1 - C)}\right)^{1/\beta} \tag{11}$$
6.4.1 In Eq 9 the demonstrated reliability at mission time $t_m$ is related to confidence, C, sample size, n, test time, t, and assumed
Weibull shape, β. In that case n units are tested to time t and all survive. In Eq 10 the sample size is related to the $B_p$ requirement,
the test time, t, confidence, C, and assumed Weibull shape β. In Eq 11 the $B_p$ life is related to the sample size, n, confidence, C,
test time, t, and Weibull shape, β. In each case there is a test time, t, and n units are tested to that time without failure. More general
Weibull plans, where r > 0 failures are allowed, are briefly discussed in Annex A7.
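The zero-failure Weibull relations of Eq 9 to Eq 11 can be sketched as three small functions. The function names and the example plan (a $B_{10}$ life of 1000 hours, 90 % confidence, β = 2, test time 2000 hours) are hypothetical and serve only to show how the equations fit together.

```python
import math

def weibull_mission_reliability(t_m, t, n, confidence, beta):
    """Eq 9: lower bound on R(t_m) after n units survive test time t."""
    return (1.0 - confidence) ** ((1.0 / n) * (t_m / t) ** beta)

def weibull_sample_size(b_p, p, t, confidence, beta):
    """Eq 10: unrounded sample size to demonstrate a B_p life (round up in use)."""
    return (math.log(1.0 - confidence) / math.log(1.0 - p / 100.0)) * (b_p / t) ** beta

def weibull_bp_lower_bound(p, t, n, confidence, beta):
    """Eq 11: lower bound on the B_p life after n units survive test time t."""
    ratio = math.log(1.0 - p / 100.0) / math.log(1.0 - confidence)
    return t * (n * ratio) ** (1.0 / beta)

# Hypothetical plan: demonstrate B10 >= 1000 h at 90 % confidence, beta = 2,
# with every unit tested to 2000 h without failure.
n_exact = weibull_sample_size(b_p=1000.0, p=10.0, t=2000.0, confidence=0.90, beta=2.0)
n = math.ceil(n_exact)
print(f"units required: {n}")
print(f"B10 lower bound with {n} units: {weibull_bp_lower_bound(10.0, 2000.0, n, 0.90, 2.0):.0f} h")
print(f"R(1000 h) lower bound: {weibull_mission_reliability(1000.0, 2000.0, n, 0.90, 2.0):.4f}")
```

Because the sample size is rounded up, the demonstrated $B_{10}$ bound slightly exceeds the 1000 hour requirement.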
6.4.2 For the lognormal model with location parameter μ, and scale parameter σ, similar equations can be developed. In what
follows the lognormal scale parameter is known or assumed.
Compute:
$$q = 1 - \sqrt[n]{1 - C} \tag{12}$$
For test time t, the lower bound at confidence C is:
$$\mu \ge \mu_0 = \ln(t) - \sigma\,\Phi^{-1}(q) \tag{13}$$
The function $\Phi^{-1}(q)$ is the inverse standard normal distribution function evaluated at q (see Eq 12). Refer to this lower confidence
bound on the location parameter μ as $\mu_0$. Then the lower bound on mission reliability at mission time $t_m$ is:
$$R(t_m) \ge 1 - \Phi\left(\frac{\ln(t_m) - \mu_0}{\sigma}\right) \tag{14}$$
Let $Z_{p/100}$ be the standard normal quantile value at cumulative probability p/100. Then the lower bound on the $B_p$ life is:
$$B_p \ge e^{\left(Z_{p/100} - \Phi^{-1}(q)\right)\sigma + \ln(t)} \tag{15}$$
NOTE 1—In Eq 14, when $t_m$ = t, the mission reliability at time t reduces to 1 – q, which is the non-parametric result. The sample size required to state
a lower confidence bound on a lognormal quantile $B_p$ is:
$$n = \frac{\ln(1 - C)}{\ln\left\{\Phi\left(\dfrac{\ln(B_p/t) - Z_{p/100}\,\sigma}{\sigma}\right)\right\}} \tag{16}$$
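The lognormal zero-failure relations of Eq 12 to Eq 16 can be sketched with the standard library's normal distribution. The function names and the example values are hypothetical, σ is taken as known, and the sample size is left unrounded so that the equations can be checked against one another.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def lognormal_q(n, confidence):
    """Eq 12: q = 1 - (1 - C)**(1/n)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

def lognormal_mu_lower_bound(t, n, confidence, sigma):
    """Eq 13: lower bound mu_0 on the location parameter."""
    return math.log(t) - sigma * STD_NORMAL.inv_cdf(lognormal_q(n, confidence))

def lognormal_mission_reliability(t_m, t, n, confidence, sigma):
    """Eq 14: lower bound on R(t_m)."""
    mu_0 = lognormal_mu_lower_bound(t, n, confidence, sigma)
    return 1.0 - STD_NORMAL.cdf((math.log(t_m) - mu_0) / sigma)

def lognormal_bp_lower_bound(p, t, n, confidence, sigma):
    """Eq 15: lower bound on the B_p life."""
    z_p = STD_NORMAL.inv_cdf(p / 100.0)
    q = lognormal_q(n, confidence)
    return math.exp((z_p - STD_NORMAL.inv_cdf(q)) * sigma + math.log(t))

def lognormal_sample_size(b_p, p, t, confidence, sigma):
    """Eq 16: unrounded sample size to demonstrate a B_p life (round up in use)."""
    z_p = STD_NORMAL.inv_cdf(p / 100.0)
    arg = (math.log(b_p / t) - z_p * sigma) / sigma
    return math.log(1.0 - confidence) / math.log(STD_NORMAL.cdf(arg))

# Hypothetical plan: B10 >= 1000 h at 90 % confidence, sigma = 0.5, test to 2000 h.
n_exact = lognormal_sample_size(1000.0, 10.0, 2000.0, 0.90, 0.5)
print(f"units required: {math.ceil(n_exact)}")
print(f"B10 bound at unrounded n: {lognormal_bp_lower_bound(10.0, 2000.0, n_exact, 0.90, 0.5):.1f} h")
```

Feeding the unrounded sample size from Eq 16 back into Eq 15 returns the required $B_p$ life, and setting $t_m$ = t in Eq 14 returns 1 – q as stated in Note 1.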
6.4.3 All of the above formulas are zero failure plans and there are many additional variations on this topic. It is also possible to
derive similar plans that allow a maximum number of failures, r > 0. Some detail is discussed in Annex A7. Further variations of
the Weibull model can be found in (12-15).
6.5 In certain types of testing, it may be possible to test several units at a time. For example, this method is used in the bearing
industry and is called “sudden death” or “first of n” testing (16). The method is particularly useful when the failure mode has a
Weibull distribution. In that case the first failure time or the minimum failure time in n units tested also has a Weibull distribution
with the same shape parameter (β) as the parent Weibull of the individual failure times. If η is the Weibull scale parameter of the
individual failure times, then $\eta / n^{1/\beta}$ will be the scale parameter of the first of n distribution. This is the so-called reproductive
property of the Weibull distribution. The first of n methodology is efficient in that failures will occur more rapidly when multiple
units are tested at the same time.
6.5.1 For example, if k = 4 sets of n = 6 units are tested until the first failure occurs in each of the 4 sets, one then has 4 failures
from which to estimate the scale parameter, $\eta / n^{1/\beta}$, of the first of 6 distribution. From that estimate, the individual Weibull scale
parameter may be estimated.
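A minimal sketch of the first of n calculation follows. It assumes that the Weibull shape β is known, in which case the maximum likelihood estimate of a Weibull scale from k complete failure times is the β-th root of the mean of the times raised to the power β. The function names, the simulated data, and the example values (η = 1000, β = 2) are hypothetical and serve only to illustrate the scaling by $n^{1/\beta}$.

```python
import random


def first_of_n_scale(first_failure_times, beta):
    """MLE of the first-of-n Weibull scale for a known shape beta (complete data)."""
    k = len(first_failure_times)
    return (sum(t ** beta for t in first_failure_times) / k) ** (1.0 / beta)


def individual_scale(first_failure_times, n, beta):
    """Recover the individual-unit scale eta from the first-of-n scale eta / n**(1/beta)."""
    return first_of_n_scale(first_failure_times, beta) * n ** (1.0 / beta)


def simulate_first_failures(k, n, eta, beta, rng):
    """Simulate k sets of n units and keep only the first failure time in each set."""
    # random.weibullvariate(alpha, beta) takes the scale first, then the shape.
    return [min(rng.weibullvariate(eta, beta) for _ in range(n)) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the illustration is repeatable
    times = simulate_first_failures(k=4, n=6, eta=1000.0, beta=2.0, rng=rng)
    print("first failure times:", times)
    print("estimated eta      :", individual_scale(times, n=6, beta=2.0))
```

With only k = 4 failures the estimate is naturally quite variable; the sketch is meant to show the arithmetic, not the precision of the method.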
6.6 Reliability considerations are best addressed in the design phase of product development and where failure rate requirements
are available so that engineers can factor these into a design. Reliability allocation methods attempt to distribute product strength
in various ways so that the entire system just meets the requirement. There are many ways to do this depending on the system and
its requirements, and in each scenario cost is typically a factor (there will generally be a reliability/cost tradeoff). Some popular
methods include the following.
6.6.1 Choosing materials or components, or both, that have superior reliability or material performance properties.
6.6.2 Allocation using various combinations of series and parallel networks and redundancy.
6.6.3 Derating a device or a system means specifying the operational conditions below actual capability.
6.6.4 The use of Failure Mode and Effects Analysis (FMEA) to identify failure modes, their frequency and severity (and possible
latent type failure modes) during development activity.
6.7 The probability plotting technique is most appropriate for field reliability data of the variable type. In a probability plot, failure
times are plotted with consideration given to the number and type of suspensions that are among the data set and the type of
distribution that is assumed to apply to the data. What results is an estimate and plot of the assumed cumulative distribution
function versus time (cdf versus t). The plot is typically scaled so that the assumed distribution plots as a straight line as a function
of time. In certain cases the slope of the resulting line has meaning with respect to the assumed distribution. For example, in a
Weibull plot, the slope measures the shape parameter (β). Most software packages will create probability plots of various kinds
and also return a statistic that measures goodness of fit for the model being used.
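The reason the slope of a Weibull plot measures β can be seen from a short derivation. It assumes the two-parameter Weibull cdf with shape β and scale η; the plotting variables x and y are introduced here only for illustration.

```latex
\begin{align*}
F(t) &= 1 - \exp\left[ -\left( \frac{t}{\eta} \right)^{\beta} \right] \\
\ln\left[ -\ln\left( 1 - F(t) \right) \right] &= \beta \ln(t) - \beta \ln(\eta) \\
y &= \beta x - \beta \ln(\eta), \qquad y = \ln\left[ -\ln(1 - F) \right], \quad x = \ln(t)
\end{align*}
```

On these axes the cdf is a straight line with slope β, and the line crosses y = 0 at t = η, where F = 1 – 1/e, approximately 0.632.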
6.7.1 In a probability plot, such as Fig. 2, the estimate of the failure probability at each failure point needs to be determined. These
estimates are called plotting positions, and there are several methods in use for this. Plotting positions also depend on how
suspensions are distributed among the failure data points. Two (of many) commonly used plotting positions are the mean rank, and
median rank methods. The simplest way to think of a mean rank is the case where there are no suspensions. If the sample size is
n, then the plotting position (estimate of the cdf at that data value) associated with the ith order statistic is i/(n + 1). This is
theoretically the expected fraction falling below the ith order statistic in any sample of size n, for any distribution. To calculate
median ranks, a beta distribution is used and exact formulas are not available in closed form. A convenient approximation is often
used. Again, associated with the ith order statistic, the median rank plotting position is approximately (i – 0.3)/(n + 0.4). Once
again, these formulas apply to cases with no suspensions. Where suspensions are distributed throughout the data, these plotting
positions must be adjusted. Software packages that provide probability plots will do these calculations automatically.
FIG. 2 Weibull Probability Plot
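The plotting positions described in 6.7.1 can be sketched as follows for the case with no suspensions. The function names and the failure times are hypothetical. The exact median rank is found here by bisection, using the fact that the beta cdf with parameters i and n – i + 1 equals the probability that a binomial(n, p) count is at least i; this numerical route is an assumption of the sketch, not a method prescribed by this guide.

```python
import math


def mean_rank(i, n):
    """Mean rank plotting position for the ith of n ordered failures."""
    return i / (n + 1.0)


def median_rank_approx(i, n):
    """Approximate median rank, (i - 0.3)/(n + 0.4)."""
    return (i - 0.3) / (n + 0.4)


def median_rank_exact(i, n, tol=1e-12):
    """Median of the beta(i, n - i + 1) distribution, solved by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # P(Binomial(n, mid) >= i) equals the beta(i, n - i + 1) cdf at mid.
        tail = sum(math.comb(n, k) * mid ** k * (1.0 - mid) ** (n - k)
                   for k in range(i, n + 1))
        if tail < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def plotting_positions(failure_times, method=median_rank_approx):
    """Pair each ordered failure time with its estimated cdf value."""
    ordered = sorted(failure_times)
    n = len(ordered)
    return [(t, method(i, n)) for i, t in enumerate(ordered, start=1)]


if __name__ == "__main__":
    times = [412.0, 95.0, 230.0, 677.0, 158.0]  # hypothetical failure times, hours
    for t, f in plotting_positions(times):
        print(f"{t:8.1f}  {f:.4f}")
```

Comparing median_rank_approx with median_rank_exact for a given i and n shows how close the approximation is for that sample size.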
6.7.2 The most commonly used probability plot is the normal probability plot where the normal distribution is being used. This
plot is appropriate for all types of data that can be assumed normal. For reliability type data, it is typically the Weibull, lognormal
or extreme value distributions that are used. The calculation method used in a probability plot can also vary. In Fig. 2, the Weibull
distribution is being used, and a maximum likelihood estimation method is used to estimate the two parameters of the Weibull
model. Median rank plotting positions are also being used.
6.7.3 In constructing a probability plot it is also possible to create confidence bands around the model (straight line portion). There
are several ways to do this including parametric and nonparametric methods, and Monte Carlo simulation. Many software packages
offer at least several options to create confidence bands on probability plots. A common method uses the estimated standard errors
of the parameter estimates and takes advantage of the asymptotic normality property of the maximum likelihood parameter
estimates (MLEs). The standard error estimates are typically supplied in the form of a matrix – the Fisher information matrix – that
provides estimates of the variances and covariances of the MLEs. These values are then used in standard formulas to create
confidence bands for any desired level of confidence (8). Most software packages will provide confidence bands automatically. For
more detail on the Fisher information matrix, see Annex A6.
7. Systems Reliability
7.1 A system is a set of interconnected and possibly interacting components or subsystems, or both, that functions as a whole.
Systems can take many configurations, but design typically considers two fundamental types. Any two parts of a system are said
to be connected either in a series or a parallel configuration. A series configuration is similar to a chain containing some number
of links. The chain (system) fails if at least one of the links fails – all links are required to hold any specific load or the chain will
fail. An active parallel system will continue to perform in an unfailed state if at least one of the several components works. In
systems design this is referred to as an active redundant system – all parts see service but only one is required to maintain system
life. This design is commonly used in safety applications.
7.1.1 A standby redundant system is the case where several units are connected in parallel but only one is actually seeing any
service, the redundant units being in a dormant state until the one unit fails. This may be further complicated by imperfect
switching from one unit to the next. The simplest systems are either series or parallel configurations. Systems gain in complexity
as combinations of series and parallel subsystems are connected in various ways.
7.1.2 In most cases of analyzing simple system reliability, the individual component reliabilities are assumed independent of one
another. The failure probability may also be used in computations since R = 1 – F.
7.2 A series system containing, for example, three independent components with reliabilities $R_a$, $R_b$, and $R_c$ has a system reliability
given by Eq 17:
$$ R_s = R_a R_b R_c \tag{17} $$
Or, alternatively, in terms of failure probability:
$$ R_s = (1 - F_a)(1 - F_b)(1 - F_c) \tag{18} $$
7.2.1 Eq 17 may also be used in various ways to solve for the component reliability that would give a specified system reliability.
For example, suppose that the components are similar and have equal reliability. Thus $R_a = R_b = R_c$. If the desired system reliability
is $R_s$ = 0.999, the minimum reliability of each component is:
$$ \sqrt[3]{0.999} = 0.99967 \tag{19} $$
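The series calculation and the equal-reliability allocation of 7.2.1 can be sketched as below. The function names are hypothetical, and the extension from three components to n identical components (taking the nth root of the system target) is an assumption that follows directly from Eq 17.

```python
import math


def series_reliability(reliabilities):
    """Eq 17 for any number of independent components in series."""
    return math.prod(reliabilities)


def required_component_reliability(system_target, n):
    """Minimum reliability of each of n identical series components."""
    return system_target ** (1.0 / n)


if __name__ == "__main__":
    r = required_component_reliability(0.999, 3)
    print("required component reliability:", round(r, 5))
    print("check, system reliability     :", series_reliability([r, r, r]))
```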
7.3 A parallel system containing, for example, three independent components with reliabilities $R_a$, $R_b$, and $R_c$ has a system reliability
given by Eq 20:
$$ R_s = 1 - (1 - R_a)(1 - R_b)(1 - R_c) \tag{20} $$
7.3.1 In this configuration, the system works if at least one of the three components works. The reliability calculation is one minus
the product of the three failure probabilities – each failure probability given by 1 – R. Fig. 3 shows a simple configuration graphic for: (a) a
series system, and (b) a parallel system.
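A short sketch of the active parallel calculation follows, together with one way of combining series and parallel blocks. The function names and the mixed example (two redundant units feeding a single unit in series) are hypothetical, and all components are assumed independent.

```python
import math


def series_reliability(reliabilities):
    """Product of the component reliabilities (series configuration)."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """Eq 20 for any number of independent components in active parallel."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Three components in active parallel, each with reliability 0.9.
    print("parallel:", parallel_reliability([0.9, 0.9, 0.9]))
    # Hypothetical mixed system: a redundant pair in series with one unit.
    pair = parallel_reliability([0.95, 0.95])
    print("mixed   :", series_reliability([pair, 0.99]))
```

Treating each redundant group as a single block and then applying the series product is one simple way to handle the combined configurations mentioned in 7.1.1.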
7.3.2 In the simple examples above, the reliability of each component is considered fixed throughout life. In some cases a constant
failure rate is used instead of a probability, and the rates may differ for each component in the system. Standard formulas are available for
figuring composite failure rates and mean life in series and parallel configurations. For a series configuration containing n
independent exponentially distributed units, each with a differing rate $\lambda_i$, the system rate is:
$$ \lambda_s = \sum_{i=1}^{n} \lambda_i \tag{21} $$
The series system would still have an exponential distribution with rate $\lambda_s$ in Eq 21.
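The reason the rates add can be sketched in one line. The sketch assumes each unit has the exponential reliability function $R_i(t) = e^{-\lambda_i t}$ (the notation $R_i(t)$ is introduced here for illustration) and applies the series product rule of Eq 17 to n units.

```latex
R_s(t) = \prod_{i=1}^{n} e^{-\lambda_i t}
       = \exp\left( -t \sum_{i=1}^{n} \lambda_i \right)
       = e^{-\lambda_s t},
\qquad
\mathrm{MTTF}_s = \frac{1}{\lambda_s}
```

The product of exponential reliability functions is again exponential, which is why the series system keeps an exponential life distribution.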
7.3.3 For a parallel system, with n independent units in active parallel redundancy, each with identical rate λ, the system life is
not exponential, but the mean life can be determined using:
$$ \mathrm{MTTF}_s = \frac{1}{\lambda} \sum_{r=1}^{n} \frac{1}{r} \tag{22} $$
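Eq 21 and Eq 22 can be evaluated directly, as in the following sketch. The function names and the example rates are hypothetical.

```python
def series_rate(rates):
    """Eq 21: failure rate of n independent exponential units in series."""
    return sum(rates)


def parallel_mttf_identical(rate, n):
    """Eq 22: mean life of n identical exponential units in active parallel."""
    return sum(1.0 / r for r in range(1, n + 1)) / rate


if __name__ == "__main__":
    rates = [0.001, 0.002, 0.003]  # hypothetical failure rates per hour
    lam_s = series_rate(rates)
    print("series rate :", lam_s, "  series MTTF:", 1.0 / lam_s)
    for n in (1, 2, 3, 4):
        print(n, "units in parallel, MTTF:", parallel_mttf_identical(0.01, n))
```

Each added redundant unit contributes only 1/(rλ) to the mean life, so the gain from redundancy diminishes as n grows.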
7.3.4 The actual distribution function of failure time, although simple to derive algebraically, is not a simple result. The easiest
and most direct way to calculate the system MTTF and associated distribution for this case is to use Monte Carlo simulation. This
also holds if the rates differ for each component. Fig. 4 shows an example of the system life resulting from four components in
active redundancy. Two of these components have an MTTF of 100 hours and two have an MTTF of 200 hours. The components
are independent and exponentially distributed. A Monte Carlo simulation was used with 500 000 cases
...
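A simulation of the kind described in 7.3.4 can be sketched as follows for four independent exponential components in active redundancy with MTTFs of 100, 100, 200, and 200 hours. This is not the simulation used to produce Fig. 4: the function names, the seed, and the reduced number of cases (100 000, chosen for speed) are assumptions. The inclusion-exclusion formula for the mean of the maximum of independent exponentials is included only as a cross-check on the simulated mean.

```python
import itertools
import random
import statistics


def simulate_parallel_lives(mttfs, cases, rng):
    """System life in active redundancy is the longest of the component lives."""
    # random.expovariate takes the rate, which is 1 / MTTF.
    return [max(rng.expovariate(1.0 / m) for m in mttfs) for _ in range(cases)]


def exact_parallel_mttf(mttfs):
    """Mean of the maximum of independent exponentials by inclusion-exclusion."""
    rates = [1.0 / m for m in mttfs]
    total = 0.0
    for size in range(1, len(rates) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in itertools.combinations(rates, size):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    mttfs = [100.0, 100.0, 200.0, 200.0]  # hours
    rng = random.Random(12345)  # fixed seed so the illustration is repeatable
    lives = simulate_parallel_lives(mttfs, cases=100_000, rng=rng)
    print("simulated MTTF :", statistics.fmean(lives))
    print("analytic MTTF  :", exact_parallel_mttf(mttfs))
    print("simulated B10  :", statistics.quantiles(lives, n=10)[0])
```

Under these assumptions the analytic mean is 330 hours. The simulated lives can also be used to estimate any percentile of the system life distribution, which is the main advantage of simulation when the rates differ.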







