ISO/IEC 60559:2020
Information technology — Microprocessor Systems — Floating-Point arithmetic
This standard specifies interchange and arithmetic formats and methods for binary and decimal floating-point arithmetic in computer programming environments. This standard specifies exception conditions and their default handling. An implementation of a floating-point system conforming to this standard may be realized entirely in software, entirely in hardware, or in any combination of software and hardware. For operations specified in the normative part of this standard, numerical results and exceptions are uniquely determined by the values of the input data, sequence of operations, and destination formats, all under user control.
Technologies de l'information — Systèmes de microprocesseurs — Arithmétique flottante
ISO/IEC 60559
Edition 2.0 2020-05
IEEE Std 754™
INTERNATIONAL
STANDARD
Floating-point arithmetic
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form
or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from
ISO, IEC or IEEE at the respective address given below.
ISO copyright office
Case postale 56
CH-1211 Geneva 20
Switzerland
Tel.: +41 22 749 01 11
copyright@iso.org
www.iso.org

IEC Central Office
3, rue de Varembé
CH-1211 Geneva 20
Switzerland
Tel.: +41 22 919 02 11
info@iec.ch
www.iec.ch

Institute of Electrical and Electronics Engineers, Inc.
3 Park Avenue
New York, NY 10016-5997
United States of America
stds.info@ieee.org
www.ieee.org
About the IEC
The International Electrotechnical Commission (IEC) is the leading global organization that prepares and publishes
International Standards for all electrical, electronic and related technologies.
About IEC publications
The technical content of IEC publications is kept under constant review by the IEC. Please make sure that you have the
latest edition; a corrigendum or an amendment might have been published.
IEC publications search - webstore.iec.ch/advsearchform
The advanced search makes it possible to find IEC publications by a variety of criteria (reference number, text, technical committee,…). It also gives information on projects, replaced and withdrawn publications.
Electropedia - www.electropedia.org
The world's leading online dictionary on electrotechnology, containing more than 22 000 terminological entries in English and French, with equivalent terms in 16 additional languages. Also known as the International Electrotechnical Vocabulary (IEV) online.
IEC Just Published - webstore.iec.ch/justpublished
Stay up to date on all new IEC publications. Just Published details all new publications released. Available online and once a month by email.
IEC Glossary - std.iec.ch/glossary
67 000 electrotechnical terminology entries in English and French extracted from the Terms and definitions clause of IEC publications issued between 2002 and 2015. Some entries have been collected from earlier publications of IEC TC 37, 77, 86 and CISPR.
IEC Customer Service Centre - webstore.iec.ch/csc
If you wish to give us your feedback on this publication or need further assistance, please contact the Customer Service Centre: sales@iec.ch.
INTERNATIONAL
ELECTROTECHNICAL
COMMISSION
ICS 35.200 ISBN 978-2-8322-8178-9
IEEE Std 754-2019
IEEE Standard for Floating-Point Arithmetic
Contents
1. Overview
1.1 Scope
1.2 Purpose
1.3 Inclusions
1.4 Exclusions
1.5 Programming environment considerations
1.6 Word usage
2. Definitions, abbreviations, and acronyms
2.1 Definitions
2.2 Abbreviations and acronyms
3. Floating-point formats
3.1 Overview
3.2 Specification levels
3.3 Sets of floating-point data
3.4 Binary interchange format encodings
3.5 Decimal interchange format encodings
3.6 Interchange format parameters
3.7 Extended and extendable precisions
4. Attributes and rounding
4.1 Attribute specification
4.2 Dynamic modes for attributes
4.3 Rounding-direction attributes
5. Operations
5.1 Overview
5.2 Decimal exponent calculation
5.3 Homogeneous general-computational operations
5.4 formatOf general-computational operations
5.5 Quiet-computational operations
5.6 Signaling-computational operations
5.7 Non-computational operations
5.8 Details of conversions from floating-point to integer formats
5.9 Details of operations to round a floating-point datum to integral value
5.10 Details of totalOrder predicate
5.11 Details of comparison predicates
5.12 Details of conversion between floating-point data and external character sequences
6. Infinity, NaNs, and sign bit
6.1 Infinity arithmetic
6.2 Operations with NaNs
6.3 The sign bit
7. Exceptions and default exception handling
7.1 Overview: exceptions and flags
7.2 Invalid operation
7.3 Division by zero
7.4 Overflow
7.5 Underflow
7.6 Inexact
8. Alternate exception handling attributes
8.1 Overview
8.2 Resuming alternate exception handling attributes
8.3 Immediate and delayed alternate exception handling attributes
9. Recommended operations
9.1 Conforming language- and implementation-defined operations
9.2 Additional mathematical operations
9.3 Dynamic mode operations
9.4 Reduction operations
9.5 Augmented arithmetic operations
9.6 Minimum and maximum operations
9.7 NaN payload operations
10. Expression evaluation
10.1 Expression evaluation rules
10.2 Assignments, parameters, and function values
10.3 preferredWidth attributes for expression evaluation
10.4 Literal meaning and value-changing optimizations
11. Reproducible floating-point results
Annex A (informative) Bibliography
Annex B (informative) Program debugging support
Annex C (informative) List of operations
Annex D (informative) IEEE list of participants
FLOATING-POINT ARITHMETIC
FOREWORD
1) The International Electrotechnical Commission (IEC) is a worldwide organization for standardization comprising
all national electrotechnical committees (IEC National Committees). The object of IEC is to promote
international co-operation on all questions concerning standardization in the electrical and electronic fields. To
this end and in addition to other activities, IEC publishes International Standards, Technical Specifications,
Technical Reports, Publicly Available Specifications (PAS) and Guides (hereafter referred to as “IEC
Publication(s)”). Their preparation is entrusted to technical committees; any IEC National Committee interested
in the subject dealt with may participate in this preparatory work. International, governmental and non-
governmental organizations liaising with the IEC also participate in this preparation.
IEEE Standards documents are developed within IEEE Societies and Standards Coordinating Committees of the
IEEE Standards Association (IEEE-SA) Standards Board. IEEE develops its standards through a consensus
development process, which brings together volunteers representing varied viewpoints and interests to achieve
the final product. Volunteers are not necessarily members of IEEE and serve without compensation. While IEEE
administers the process and establishes rules to promote fairness in the consensus development process, IEEE
does not independently evaluate, test, or verify the accuracy of any of the information contained in its
standards. Use of IEEE Standards documents is wholly voluntary. IEEE documents are made available for use
subject to important notices and legal disclaimers (see http://standards.ieee.org/ipr/disclaimers.html for more
information).
IEC collaborates closely with IEEE in accordance with conditions determined by agreement between the two
organizations.
2) The formal decisions of IEC on technical matters express, as nearly as possible, an international consensus of
opinion on the relevant subjects since each technical committee has representation from all interested IEC
National Committees. The formal decisions of IEEE on technical matters, once consensus within IEEE Societies
and Standards Coordinating Committees has been reached, are determined by a balanced ballot of materially
interested parties who indicate interest in reviewing the proposed standard. Final approval of the IEEE
standards document is given by the IEEE Standards Association (IEEE-SA) Standards Board.
3) IEC/IEEE Publications have the form of recommendations for international use and are accepted by IEC
National Committees/IEEE Societies in that sense. While all reasonable efforts are made to ensure that the
technical content of IEC/IEEE Publications is accurate, IEC or IEEE cannot be held responsible for the way in
which they are used or for any misinterpretation by any end user.
4) In order to promote international uniformity, IEC National Committees undertake to apply IEC Publications
(including IEC/IEEE Publications) transparently to the maximum extent possible in their national and regional
publications. Any divergence between any IEC/IEEE Publication and the corresponding national or regional
publication shall be clearly indicated in the latter.
5) IEC and IEEE do not provide any attestation of conformity. Independent certification bodies provide conformity
assessment services and, in some areas, access to IEC marks of conformity. IEC and IEEE are not responsible
for any services carried out by independent certification bodies.
6) All users should ensure that they have the latest edition of this publication.
7) No liability shall attach to IEC or IEEE or their directors, employees, servants or agents including individual
experts and members of technical committees and IEC National Committees, or volunteers of IEEE Societies
and the Standards Coordinating Committees of the IEEE Standards Association (IEEE-SA) Standards Board,
for any personal injury, property damage or other damage of any nature whatsoever, whether direct or indirect,
or for costs (including legal fees) and expenses arising out of the publication, use of, or reliance upon, this
IEC/IEEE Publication or any other IEC or IEEE Publications.
8) Attention is drawn to the normative references cited in this publication. Use of the referenced publications is
indispensable for the correct application of this publication.
9) Attention is drawn to the possibility that implementation of this IEC/IEEE Publication may require use of
material covered by patent rights. By publication of this standard, no position is taken with respect to the
existence or validity of any patent rights in connection therewith. IEC or IEEE shall not be held responsible for
identifying Essential Patent Claims for which a license may be required, for conducting inquiries into the legal
validity or scope of Patent Claims or determining whether any licensing terms or conditions provided in
connection with submission of a Letter of Assurance, if any, or in any licensing agreements are reasonable or
non-discriminatory. Users of this standard are expressly advised that determination of the validity of any patent
rights, and the risk of infringement of such rights, is entirely their own responsibility.
International Standard ISO/IEC 60559/IEEE Std 754 has been processed through ISO/IEC
subcommittee 25: Interconnection of information technology equipment, of ISO/IEC joint
technical committee 1: Information technology, under the IEC/IEEE Dual Logo Agreement.
The text of this standard is based on the following documents:
IEEE Std       FDIS                   Report on voting
754 (2019)     JTC1-SC25/2933/FDIS    JTC1-SC25/2936/RVD
Full information on the voting for the approval of this standard can be found in the report on
voting indicated in the above table.
The IEC Technical Committee and IEEE Technical Committee have decided that the contents
of this publication will remain unchanged until the stability date indicated on the IEC web site
under "http://webstore.iec.ch" in the data related to the specific publication. At this date, the
publication will be
• reconfirmed,
• withdrawn,
• replaced by a revised edition, or
• amended.
P754 2.63
IEEE Std 754™-2019
(Revision of IEEE Std 754-2008)
IEEE Standard for Floating-Point Arithmetic
Sponsor
Microprocessor Standards Committee
of the
IEEE Computer Society
Approved 13 June 2019
IEEE-SA Standards Board
Abstract: This standard specifies interchange and arithmetic formats and methods for binary and
decimal floating-point arithmetic in computer programming environments. This standard specifies
exception conditions and their default handling. An implementation of a floating-point system
conforming to this standard may be realized entirely in software, entirely in hardware, or in any
combination of software and hardware. For operations specified in the normative part of this
standard, numerical results and exceptions are uniquely determined by the values of the input
data, sequence of operations, and destination formats, all under user control.
Keywords: arithmetic, binary, computer, decimal, exponent, floating-point, format, IEEE 754™,
interchange, NaN, number, rounding, significand, subnormal.
IEEE Introduction
This introduction is not part of IEEE Std 754-2019, IEEE Standard for Floating-Point Arithmetic.
This standard is a product of the Floating-Point Working Group of, and sponsored by, the Microprocessor
Standards Committee of the IEEE Computer Society.
This standard provides a discipline for performing floating-point computation that yields results
independent of whether the processing is done in hardware, software, or a combination of the two. For
operations specified in the normative part of this standard, numerical results and exceptions are uniquely
determined by the values of the input data, the operation, and the destination, all under user control.
This standard defines a family of commercially feasible ways for systems to perform binary and decimal
floating-point arithmetic. Among the desiderata that guided the formulation of this standard were:
a) Facilitate movement of existing programs from diverse computers to those that adhere to this
standard as well as among those that adhere to this standard.
b) Enhance the capabilities and safety available to users and programmers who, although not
expert in numerical methods, might well be attempting to produce numerically sophisticated
programs.
c) Encourage experts to develop and distribute robust and efficient numerical programs that are
portable, by way of minor editing and recompilation, onto any computer that conforms to this
standard and possesses adequate capacity. Together with language controls it should be possible to
write programs that produce identical results on all conforming systems.
d) Provide direct support for
― execution-time diagnosis of anomalies
― smoother handling of exceptions
― interval arithmetic at a reasonable cost.
e) Provide for development of
― common elementary functions such as exp or cos
― high precision (multiword) arithmetic
― coupled numerical and symbolic algebraic computation.
f) Enable rather than preclude further refinements and extensions.
In programming environments, this standard is also intended to form the basis for a dialog between the
numerical community and programming language designers. It is hoped that language-defined methods for
the control of expression evaluation and exceptions might be defined in coming years, so that it will be
possible to write programs that produce identical results on all conforming systems. However, it is
recognized that utility and safety in languages are sometimes antagonists, as are efficiency and portability.
Therefore, it is hoped that language designers will look on the full set of operation, precision, and
exception controls described here as a guide to providing the programmer with the ability to portably
control expressions and exceptions. It is also hoped that designers will be guided by this standard to
provide extensions in a completely portable way.
Informative annexes provide additional information – Annex A lists bibliographical resources, Annex B
suggests programming environment features for debugging support, and Annex C lists all references to the
operations of the standard.
Floating-Point Arithmetic
1. Overview
1.1 Scope
This standard specifies formats and operations for floating-point arithmetic in computer systems. Exception
conditions are defined and handling of these conditions is specified.
1.2 Purpose
This standard provides a method for computation with floating-point numbers that will yield the same
result whether the processing is done in hardware, software, or a combination of the two. The results of the
computation will be identical, independent of implementation, given the same input data. Errors, and error
conditions, in the mathematical processing will be reported in a consistent manner regardless of
implementation.
1.3 Inclusions
This standard specifies:
― Formats for binary and decimal floating-point data, for computation and data interchange.
― Addition, subtraction, multiplication, division, fused multiply add, square root, compare, and
other operations.
― Conversions between integer and floating-point formats.
― Conversions between different floating-point formats.
― Conversions between floating-point formats and external representations as character sequences.
― Floating-point exceptions and their handling, including data that are not numbers (NaNs).
1.4 Exclusions
This standard does not specify:
― Formats of integers.
― Interpretation of the sign and significand fields of NaNs.
1.5 Programming environment considerations
This standard specifies floating-point arithmetic in two radices, 2 and 10. A programming environment
may conform to this standard in one radix or in both.
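As an informal illustration (not part of this standard), the practical difference between the two radices can be sketched in Python. The sketch assumes that Python's `float` is a binary64 value, as it is on common platforms, and uses the `decimal` module as a stand-in for radix-10 arithmetic; neither is claimed to be a conforming implementation.

```python
from decimal import Decimal

# Radix 2: 0.1, 0.2 and 0.3 have no exact binary representation, so the
# rounded sum 0.1 + 0.2 differs from the rounded literal 0.3.
binary_sum_matches = (0.1 + 0.2 == 0.3)

# Radix 10: the same decimal fractions are represented exactly.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(binary_sum_matches, decimal_sum_matches)
```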
This standard does not define all aspects of a conforming programming environment. Such behavior should
be defined by a programming language definition supporting this standard, if available, and otherwise by a
particular implementation. Some programming language specifications might permit some behaviors to be
defined by the implementation.
Language-defined behavior should be defined by a programming language standard supporting this
standard. Then all implementations conforming both to this floating-point standard and to that language
standard behave identically with respect to such language-defined behaviors. Standards for languages
intended to reproduce results exactly on all platforms are expected to specify behavior more tightly than do
standards for languages intended to maximize performance on every platform.
Because this standard requires facilities that are not currently available in common programming
languages, the standards for such languages might not be able to fully conform to this standard if they are
no longer being revised. If the language can be extended by a function library or class or package to
provide a conforming environment, then that extension should define all the language-defined behaviors
that would normally be defined by a language standard.
Implementation-defined behavior is defined by a specific implementation of a specific programming
environment conforming to this standard. Implementations define behaviors not specified by this standard
nor by any relevant programming language standard or programming language extension.
Conformance to this standard is a property of a specific implementation of a specific programming
environment, rather than of a language specification.
However a language standard could also be said to conform to this standard if it were constructed so that
every conforming implementation of that language also conformed automatically to this standard.
1.6 Word usage
In this standard three words are used to differentiate between different levels of requirements and
optionality, as follows:
― may indicates a course of action permissible within the limits of the standard with no implied
preference (“may” means “is permitted to”)
― shall indicates mandatory requirements strictly to be followed in order to conform to the standard
and from which no deviation is permitted (“shall” means “is required to”)
― should indicates that among several possibilities, one is recommended as particularly suitable,
without mentioning or excluding others; or that a certain course of action is preferred but not
necessarily required; or that (in the negative form) a certain course of action is deprecated but not
prohibited (“should” means “is recommended to”).
Further:
― might indicates the possibility of a situation that could occur, with no implication of the
likelihood of that situation (“might” means “could possibly”)
― see followed by a number is a cross-reference to the clause or subclause of this standard identified
by that number
― NOTE introduces text that is informative (that is, is not a requirement of this standard).
2. Definitions, abbreviations, and acronyms
2.1 Definitions
For the purposes of this standard, the following terms and definitions apply.
applicable attribute: The value of an attribute governing a particular instance of execution of a
computational operation of this standard. Languages specify how the applicable attribute is determined.
arithmetic format: A floating-point format that can be used to represent floating-point operands or results
for the operations of this standard.
attribute: An implicit parameter to operations of this standard, which a user might statically set in a
programming language by specifying a constant value. The term attribute might refer to the parameter (as
in “rounding-direction attribute”) or its value (as in “roundTowardZero attribute”).
basic format: One of five floating-point representations, three binary and two decimal, whose encodings
are specified by this standard, and which can be used for arithmetic. One or more of the basic formats is
implemented in any conforming implementation.
biased exponent: The sum of the exponent and a constant (bias) chosen to make the biased exponent’s
range non-negative.
binary floating-point number: A floating-point number with radix two.
block: A language-defined syntactic unit for which a user can specify attributes. Language standards might
provide means for users to specify attributes for blocks of varying scopes, even as large as an entire
program and as small as a single operation.
canonical encoding: A preferred encoding of a floating-point representation in a format. “Canonical
encoding” also applies to declets, significands of finite numbers, infinities, and NaNs, especially in decimal
formats.
cohort: The set of all floating-point representations that represent a given floating-point number in a given
floating-point format. In this context −0 and +0 are considered distinct and are in different cohorts.
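A cohort can be illustrated informally with Python's `decimal` module, which is assumed here only as a convenient radix-10 arithmetic and not as an implementation of a format of this standard. The two representations below have different significands and exponents but denote the same number, so they would belong to the same cohort; the two zeros differ in sign and would not.

```python
from decimal import Decimal

a = Decimal("1.0")    # digits (1, 0),    exponent -1
b = Decimal("1.00")   # digits (1, 0, 0), exponent -2

same_number = (a == b)                                  # numerically equal
same_representation = (a.as_tuple() == b.as_tuple())    # different members

# -0 and +0 carry different signs and are therefore in different cohorts.
neg_zero, pos_zero = Decimal("-0"), Decimal("0")
zeros_distinct = (neg_zero.is_signed() != pos_zero.is_signed())
```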
computational operation: An operation that produces floating-point results or that might signal floating-
point exceptions. Computational operations produce results in floating-point or other destination formats
by rounding them to fit if necessary.
correct rounding: This standard’s method of converting an infinitely precise result to a floating-point
number, as determined by the applicable rounding direction. A floating-point number so obtained is said to
be correctly rounded.
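The dependence of a correctly rounded result on the rounding direction can be sketched as follows. The example is an assumption-laden illustration: it uses Python's `decimal` module with a hypothetical helper `divide_rounded` and a working precision of four digits, and the mapping of Python's rounding constants to the rounding-direction attributes of this standard is given only as a rough correspondence.

```python
from decimal import (Decimal, localcontext, ROUND_HALF_EVEN, ROUND_DOWN,
                     ROUND_CEILING, ROUND_FLOOR)

def divide_rounded(x, y, rounding, digits=4):
    """Return x / y rounded once to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = rounding
        return Decimal(x) / Decimal(y)

# The infinitely precise quotient is -0.6666...; each direction picks one
# of the two neighbouring four-digit values.
ties_to_even = divide_rounded(-2, 3, ROUND_HALF_EVEN)   # ~ roundTiesToEven
toward_zero = divide_rounded(-2, 3, ROUND_DOWN)         # ~ roundTowardZero
toward_positive = divide_rounded(-2, 3, ROUND_CEILING)  # ~ roundTowardPositive
toward_negative = divide_rounded(-2, 3, ROUND_FLOOR)    # ~ roundTowardNegative
```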
decimal floating-point number: A floating-point number with radix ten.
declet: An encoding of three decimal digits into ten bits using the densely packed decimal encoding
scheme. Computational operations accept all 1024 possible declets in operands. Most computational
operations produce only the 1000 canonical declets.
denormalized number: See: subnormal number.
destination: The location for the result of an operation upon one or more operands. A destination might be
either explicitly designated by the user or implicitly supplied by the system (for example, intermediate
results in subexpressions or arguments for procedures). Even though some languages place the results of
intermediate calculations in destinations beyond the user’s control, this standard defines the result of an
operation in terms of that destination’s format and the operands’ values.
dynamic mode: An optional method of dynamically setting attributes by means of operations of this
standard to set, test, save, and restore them.
exception: An event that occurs when an operation on some particular operands has no outcome suitable
for every reasonable application. That operation might signal an exception by invoking default exception
handling or alternate exception handling. Exception handling might signal further exceptions. Recognize
that event, exception, and signal are defined in diverse ways in different programming environments.
exponent: The component of a finite floating-point representation that signifies the integer power to which
the radix is raised in determining the value of that floating-point representation. The exponent e is used
when the significand is regarded as an integer digit and fraction field, and the exponent q is used when the
significand is regarded as an integer; e = q + p − 1 where p is the precision of the format in digits.
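The relation e = q + p − 1 can be checked on a concrete value. The sketch below is illustrative only: it assumes Python's `float` is binary64 with p = 53, handles only normal non-zero values, and the helper name `exponent_views` is hypothetical.

```python
import math

P = 53  # precision of binary64 in bits (assumed format for this sketch)

def exponent_views(x):
    """Decompose a normal, non-zero binary64 value in two ways.

    Returns (m, e, c, q) such that x == m * 2**e == c * 2**q,
    where 1 <= |m| < 2 and c is an integer.
    """
    frac, exp = math.frexp(x)   # x == frac * 2**exp with 0.5 <= |frac| < 1
    m, e = frac * 2, exp - 1    # rescale so that 1 <= |m| < 2
    c = int(m * 2 ** (P - 1))   # integer significand; exact for normal values
    q = e - (P - 1)             # equivalently, e == q + P - 1
    return m, e, c, q

m, e, c, q = exponent_views(6.5)   # 6.5 == 1.625 * 2**2
```

For 6.5 the fractional view gives m = 1.625 and e = 2, while the integer view gives q = −50 with an integer significand c satisfying c × 2^q = 6.5.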
extendable precision format: A format with precision and range that are defined under user control.
extended precision format: A format that extends a supported basic format by providing wider precision
and range.
external character sequence: A representation of a floating-point datum as a sequence of characters,
including the character sequences in floating-point literals in program text.
flag: See: status flag.
floating-point datum: A floating-point number or non-number (NaN) that is representable in a floating-
point format. In this standard, a floating-point datum is not always distinguished from its representation or
encoding.
floating-point number: A finite or infinite number that is representable in a floating-point format. A
floating-point datum that is not a NaN. All floating-point numbers, including zeros and infinities, are
signed.
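That zeros and infinities are signed can be observed directly in many environments. The following minimal sketch assumes Python's `float` behaves as binary64; it is an illustration, not a statement about any particular implementation's conformance.

```python
import math

pos_zero, neg_zero = 0.0, -0.0

# The two zeros compare equal, yet each carries its own sign.
zeros_compare_equal = (pos_zero == neg_zero)
sign_of_pos_zero = math.copysign(1.0, pos_zero)
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# Infinities are floating-point numbers (they are not NaNs) and are signed.
pos_inf, neg_inf = math.inf, -math.inf
infinities_are_numbers = not (math.isnan(pos_inf) or math.isnan(neg_inf))
infinities_ordered = neg_inf < pos_inf
```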
floating-point operation: An operation where an operand or result is a floating-point datum.
floating-point representation: An unencoded member of a floating-point format, representing a finite
number, a signed infinity, a quiet NaN, or a signaling NaN. A representation of a finite number has three
components: a sign, an exponent, and a significand; its numerical value is the signed product of its
significand and its radix raised to the power of its exponent.
format: A set of representations of numerical values and symbols, perhaps accompanied by an encoding.
fusedMultiplyAdd: The operation fusedMultiplyAdd(x, y, z) computes (x × y ) + z as if with unbounded
range and precision, rounding only once to the destination format.
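The effect of rounding only once can be sketched with exact rational arithmetic. The helper `fma_reference` below is hypothetical and deliberately limited: it assumes finite binary64 inputs, relies on CPython converting a `Fraction` to `float` with correct rounding to nearest, and does not attempt to reproduce the handling of NaNs, infinities, or the sign of a zero result.

```python
from fractions import Fraction

def fma_reference(x, y, z):
    """Emulate (x * y) + z with a single rounding, for finite inputs only."""
    exact = Fraction(x) * Fraction(y) + Fraction(z)   # no rounding here
    return float(exact)                               # the single rounding

x = 1.0 + 2.0 ** -30
y = 1.0 - 2.0 ** -30
z = -1.0

# x * y is exactly 1 - 2**-60, which rounds to 1.0 in binary64, so the
# separately rounded product loses the small term before z is added.
two_roundings = x * y + z
one_rounding = fma_reference(x, y, z)
```

With these inputs the two-step evaluation yields 0.0, whereas the single-rounding evaluation yields −2^−60, which is exactly representable.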
generic operation: An operation of this standard that can take operands of various formats, for which the
formats of the results might depend on the formats of the operands.
homogeneous operation: An operation of this standard that takes operands and returns results all in the
same format.
implementation-defined: Behavior defined by a specific implementation of a specific programming
environment conforming to this standard.
integer format: A format not defined in this standard that represents a subset of the integers and perhaps
additional values representing infinities, NaNs, or negative zeros.
interchange format: A format that has a specific fixed-width encoding defined in this standard.
language-defined: Behavior defined by a programming language standard supporting this standard.
NaN: not a number—a symbolic floating-point datum. There are two kinds of NaN representations: quiet
and signaling. Most operations propagate quiet NaNs without signaling exceptions, and signal the invalid
operation exception when given a signaling NaN operand.
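Quiet NaN propagation can be observed with a short sketch, again assuming Python's `float` as the binary64 environment. Signaling NaNs are not shown because Python offers no portable way to create or observe them.

```python
import math

qnan = math.nan                        # a quiet NaN

propagated = qnan + 1.0                # arithmetic propagates the quiet NaN
is_still_nan = math.isnan(propagated)

# A NaN compares unequal to every datum, including itself.
equal_to_itself = (qnan == qnan)
```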
narrower/wider format: If the set of floating-point numbers of one format is a proper subset of another
format, the first is called narrower and the second wider. The wider format might have greater precision,
range, or (usually) both.
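As an informal example of a narrower and a wider format, the sketch below converts between binary32 and binary64 through Python's `struct` module. The helper `to_binary32` is hypothetical, and the example assumes Python's `float` is binary64.

```python
import struct

def to_binary32(x):
    """Round a binary64 value to binary32, then widen it back to binary64."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Narrowing can change the value: binary64 0.1 is not a binary32 number.
narrowed = to_binary32(0.1)
narrowing_changed_value = (narrowed != 0.1)

# Widening is exact: a binary32 number is also a binary64 number, so a
# second pass through binary32 returns the same value.
widening_is_exact = (to_binary32(narrowed) == narrowed)
```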
non-computational operation: An operation that is not computational.
normal number: For a particular format, a finite non-zero floating-point number with magnitude greater
than or equal to a minimum $b^{emin}$ value, where b is the radix. Normal numbers can use the full precision
available in a format. In this standard, zero is neither normal nor subnormal.
not a number: See: NaN.
operation: This standard defines required and recommended operations which operate on zero or more
operands and produce results or side effects, such as changes in dynamic modes or flags or control flow, or
both. In this standard, operations are written as named functions; in a specific programming environment
they might be represented by operators, or by families of format-specific functions, or by operations or
functions whose names might differ from those in this standard.
payload: The information, which might be diagnostic, contained in a NaN.
precision: The maximum number p of significant digits that can be represented in a format, or the number
of digits to which a result is rounded.
preferred exponent: For the result of a decimal operation, the value of the exponent q which best reflects
the quanta of the operands when the result is exact.
preferredWidth method: A method used by a programming language to determine the destination formats
for generic operations and functions. Some preferredWidth methods take advantage of the extra range and
precision of wide formats without requiring the program to be written with explicit conversions.
quantum: The quantum of a finite floating-point representation is the value of a unit in the last position of
its significand. This is equal to the radix raised to the exponent q, which is used when the significand is
regarded as an integer.
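The quantum of a decimal representation can be computed from its exponent q. The sketch below uses Python's `decimal` module as an assumed radix-10 environment; the helper name `quantum` is hypothetical and applies to finite values only.

```python
from decimal import Decimal

def quantum(d):
    """Return 10**q, the value of a unit in the last place of finite d."""
    q = d.as_tuple().exponent     # exponent with an integer significand
    return Decimal(1).scaleb(q)   # 1 * 10**q

# Members of the same cohort can have different quanta.
quantum_two_places = quantum(Decimal("1.50"))   # q = -2
quantum_one_place = quantum(Decimal("1.5"))     # q = -1
quantum_hundreds = quantum(Decimal("15E+2"))    # q = +2
```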
quiet operation: An operation that never signals any floating-point exception.
radix: The base for the representation of binary or decimal floating-point numbers, two or ten.
result: The floating-point representation or encoding that is delivered to the destination.
signal: When an operation on some particular operands has no outcome suitable for every reasonable
application, that operation might signal one or more exceptions by invoking the default handling or, if
explicitly requested, a language-defined alternate handling selected by the user.
significand: A component of a finite floating-point number containing its significant digits. The significand
can be thought of as an integer, a fraction, or some other fixed-point form, by choosing an appropriate
exponent offset. A decimal or subnormal binary significand can also contain leading zeros, which are not
significant.
status flag: A variable that can take two states, raised or lowered. When raised, a status flag might convey
additional system-dependent information, possibly inaccessible to some users. The operations of this
standard, when exceptional, can as a side effect raise some of the following status flags: inexact, underflow,
overflow, divideByZero, and invalid operation.
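As a loose analogy, Python's decimal module keeps per-context flags that are raised as a side effect of exceptional operations. The sketch below is an illustration under that assumption; the Python signal names (DivisionByZero, Inexact) are the module's own and differ from the flag names used in this standard.

```python
from decimal import Decimal, DivisionByZero, Inexact, localcontext

with localcontext() as ctx:
    ctx.clear_flags()
    # Disable trapping so that the operations deliver a default result
    # and only raise the corresponding flag.
    ctx.traps[DivisionByZero] = False
    ctx.traps[Inexact] = False

    quotient = Decimal(1) / Decimal(0)   # delivers Infinity, raises DivisionByZero
    third = Decimal(1) / Decimal(3)      # rounded result, raises Inexact

    div_flag = bool(ctx.flags[DivisionByZero])
    inexact_flag = bool(ctx.flags[Inexact])

print(quotient, div_flag, inexact_flag)
```

The flags stay raised until they are explicitly cleared, which lets a program test for an exceptional condition after a sequence of operations.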
subnormal number: In a particular format, a non-zero floating-point number with magnitude less than the
magnitude of that format’s smallest normal number. A subnormal number does not use the full precision
available to normal numbers of the same format.
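The definition can be sketched for binary64, assuming that Python's float is binary64, that rounding is to nearest, and that subnormal results are not flushed to zero. The function name is_subnormal is illustrative.

```python
import math
import sys

smallest_normal = sys.float_info.min   # 2**-1022 when float is binary64
tiny = math.ldexp(1.0, -1074)          # smallest positive subnormal number

def is_subnormal(x: float) -> bool:
    """True for a non-zero value whose magnitude is below the smallest normal."""
    return x != 0.0 and abs(x) < smallest_normal

print(is_subnormal(tiny), is_subnormal(smallest_normal), is_subnormal(0.0))

# Reduced precision: near the bottom of the range the spacing between
# neighbouring values is fixed, so 0.75 * tiny rounds back to tiny.
print(tiny * 0.75 == tiny)
```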
supported format: A floating-point format provided in the programming environment and implemented in
conformance with the requirements of this standard. Thus, a programming environment might provide
more formats than it supports, as only those implemented in accordance with the standard are said to be
supported. Also, an integer format is said to be supported if conversions between that format and supported
floating-point formats are provided in conformance with this standard.
trailing significand field: A component of an encoded binary or decimal floating-point format containing
all the significand digits except the leading digit. In these formats, the biased exponent or combination field
encodes or implies the leading significand digit.
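For a binary encoding, the split into fields can be sketched as below. The sketch assumes the binary64 layout of 1 sign bit, 11 biased exponent bits, and 52 trailing significand bits, and that Python's float is binary64. The function name binary64_fields is hypothetical.

```python
import struct

def binary64_fields(x: float) -> tuple[int, int, int]:
    """Split a binary64 encoding into (sign, biased exponent, trailing significand)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    trailing = bits & ((1 << 52) - 1)
    return sign, biased_exponent, trailing

sign, biased_exponent, trailing = binary64_fields(1.5)

# For finite numbers, a zero biased exponent implies a leading bit of 0
# (zero or subnormal); any other biased exponent implies a leading bit of 1.
leading_bit = 0 if biased_exponent == 0 else 1

print(sign, biased_exponent, hex(trailing), leading_bit)
```

For 1.5 the leading significand bit is not stored; only the fraction bits after it appear in the trailing significand field.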
user: Any person, hardware, or program not itself specified by this standard, having access to and
controlling those operations of the programming environment specified in this standard.
width of an operation: The format of the destination of an operation specified by this standard; it will be
one of the supported formats provided by an implementation in conformance to this standard.
2.2 Abbreviations and acronyms
LSB: least significant bit
MSB: most significant bit
NaN: not a number
qNaN: quiet NaN
sNaN: signaling NaN
IEEE Std 754-2019
IEEE Standard for Floating-Point Arithmetic
3. Floating-point formats
3.1 Overview
3.1.1 Formats
This clause defines floating-point formats, which are used to represent a finite subset of real numbers (see
3.2). Formats are characterized by their radix, precision, and exponent range, and each format can represent
a unique set of floating-point data (see 3.3).
All formats can be supported as arithmetic formats; that is, they may be used to represent floating-point
operands or results for the operations described in later clauses of this standard.
Specific fixed-width encodings for binary and decimal formats are defined in this clause for a subset of the
formats (see 3.4 and 3.5). These interchange formats are identified by their size (see 3.6) and can be used
for the exchange of floating-point data between implementations.
Five basic formats are defined in this clause:
― Three binary formats, with encodings in lengths of 32, 64, and 128 bits.
― Two decimal formats, with encodings in lengths of 64 and 128 bits.
Additional arithmetic formats are recommended for extending these basic formats (see 3.7).
The choice of which of this standard’s formats to support is language-defined or, if the relevant language
standard is silent or defers to the implementation, implementation-defined. The names used for formats in
this standard are not necessarily those used in programming environments.
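As an illustration of fixed-width encodings, the sketch below uses Python's struct module, whose "f" and "d" codes are assumed to correspond to the 32-bit and 64-bit binary formats. The 128-bit binary format and the decimal formats are not covered because struct offers no codes for them.

```python
import struct

value = 0.1

# Big-endian encodings of the same value in two widths.
b32 = struct.pack(">f", value)   # 4 bytes
b64 = struct.pack(">d", value)   # 8 bytes

back32 = struct.unpack(">f", b32)[0]
back64 = struct.unpack(">d", b64)[0]

print(len(b32), b32.hex())
print(len(b64), b64.hex())
print(back64 == value)  # the 64-bit round trip is exact
print(back32 == value)  # the 32-bit encoding has lost precision
```

The byte order is fixed explicitly here because two parties exchanging encodings need to agree on it.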
3.1.2 Conformance
A conforming implementation of any supported format shall provide means to initialize that format and
shall provide conversions between that format and all other supported formats.
A conforming implementation of a supported arithmetic format shall provide all the operations of this
standard defined in Clause 5, for that format.
A conforming implementation of a supported interchange format shall provide means to read and write that
format using a specific encoding defined in this clause, for that format.
A programming environment conforms to this standard, in a particular radix, by implementing one or more
of the basic formats of that radix as both a supported arithmetic format and a supported interchange format.
3.2 Specification levels
Floating-point arithmetic is a systematic approximation of real arithmetic, as illustrated in Table 3.1.
Floating-point arithmetic can only represent a finite subset of the continuum of real numbers. Consequently,
certain properties of real arithmetic, such as associativity of addition, do not always hold for floating-point
arithmetic.
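The loss of associativity can be observed directly. The sketch assumes that Python's float is binary64 with round-to-nearest, which holds on common platforms.

```python
# The same three operands, grouped in two different ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)            # 0.6000000000000001
print(right)           # 0.6
print(left == right)   # False
```

Each individual addition is correctly rounded; the results differ because the intermediate sums are rounded at different points.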
Table 3.1—Relationships between different specification levels for a particular format
Level
...







