Information technology — Programming languages, their environments and system software interfaces — Technical Report on C++ Performance

The aim of ISO/IEC TR 18015 is to: give the reader a model of time and space overheads implied by use of various C++ language and library features; debunk widespread myths about performance problems in C++; present techniques for use of C++ in applications where performance matters; and present techniques for implementing C++ standard language and library facilities to yield efficient code. The special needs of embedded systems programming are presented, including ROMability and predictability. A separate chapter presents general C and C++ interfaces to the basic hardware facilities of embedded systems.

Technologies de l'information — Langages de programmation, leurs environnements et interfaces du logiciel système — Rapport technique sur la performance C++

General Information

Status
Published
Publication Date
03-Sep-2006
Current Stage
9093 - International Standard confirmed
Completion Date
30-Oct-2013
Ref Project

Buy Standard

Technical report
ISO/IEC TR 18015:2006 - Information technology -- Programming languages, their environments and system software interfaces -- Technical Report on C++ Performance
English language
197 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (Sample)

TECHNICAL ISO/IEC
REPORT TR
18015
First edition
2006-09-01


Information technology — Programming
languages, their environments and
system software interfaces — Technical
Report on C++ Performance
Technologies de l'information — Langages de programmation, leurs
environnements et interfaces du logiciel système — Rapport technique
sur la performance C++




Reference number
ISO/IEC TR 18015:2006(E)
©
ISO/IEC 2006

---------------------- Page: 1 ----------------------
ISO/IEC TR 18015:2006(E)
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.


©  ISO/IEC 2006
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

ii © ISO/IEC 2006 – All rights reserved

---------------------- Page: 2 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance

Contents
Contents . iii
Foreword .v
Introduction .vi
Participants . vii
1 Scope .1
2 Normative References .3
3 Terms and definitions.4
4 Typical Application Areas. 13
4.1 Embedded Systems .13
4.2 Servers .15
5 Language Features:  Overheads and Strategies. 16
5.1 Namespaces.16
5.2 Type Conversion Operators .17
5.3 Classes and Inheritance.18
5.4 Exception Handling .27
5.5 Templates.37
5.6 Programmer Directed Optimizations.41
6 Creating Efficient Libraries.63
6.1 The Standard IOStreams Library – Overview.63
6.2 Optimizing Libraries – Reference Example: "An Efficient Implementation of
Locales and IOStreams".64
7 Using C++ in Embedded Systems.77
7.1 ROMability .77
7.2 Hard Real-Time Considerations.81
8 Hardware Addressing Interface .85
8.1 Introduction to Hardware Addressing .86
8.2 The Interface for C and C++ .102
8.3 The Interface for C++ .108
© ISO/IEC 2006 — All rights reserved iii

---------------------- Page: 3 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)
Annex A: Guidelines on Using the Interface.118
A.1 Usage Introduction.118
A.2 Using Hardware Register Designator Specifications.118
A.3 Hardware Access .121
Annex B: Implementing the iohw Interfaces. 124
B.1 General Implementation Considerations.124
B.2 Overview of Hardware Device Connection Options .125
B.3 Hardware Register Designators for Different Device Addressing Methods.128
B.4 Atomic Operation.130
B.5 Read-Modify-Write Operations and Multi-Addressing.130
B.6 Initialization.131
B.7 Intrinsic Features for Hardware Register Access.133
B.8 Implementation Guidelines for the Interface .134
Annex C: A Implementation for the
Interface .149
C.1 Implementation of the Basic Access Functions.149
C.2 Buffer Functions.150
C.3 Group Functionality.151
C.4 Remarks.155
Annex D: Timing Code. 156
D.1 Measuring the Overhead of Class Operations .156
D.2 Measuring Template Overheads.165
D.3 The Stepanov Abstraction Penalty Benchmark .171
D.4 Comparing Function Objects to Function Pointers.177
D.5 Measuring the Cost of Synchronized I/O .181
Annex E: Bibliography . 184
iv  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 4 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)
Foreword
ISO (the International Organization for Standardization) and IEC (the International
Electrotechnical Commission) form the specialized system for worldwide
standardization. National bodies that are members of ISO or IEC participate in the
development of International Standards through technical committees established by the
respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work. In the field of information technology, ISO and IEC have
established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the
ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards.
Draft International Standards adopted by the joint technical committee are circulated to
national bodies for voting. Publication as an International Standard requires approval by
at least 75 % of the national bodies casting a vote.
In exceptional circumstances, the joint technical committee may propose the publication
of a Technical Report of one of the following types:
— type 1, when the required support cannot be obtained for the publication of an
International Standard, despite repeated efforts;
— type 2, when the subject is still under technical development or where for any other
reason there is the future but not immediate possibility of an agreement on an
International Standard;
— type 3, when the joint technical committee has collected data of a different kind from
that which is normally published as an International Standard (“state of the art”, for
example).
Technical Reports of types 1 and 2 are subject to review within three years of
publication, to decide whether they can be transformed into International Standards.
Technical Reports of type 3 do not necessarily have to be reviewed until the data they
provide are considered to be no longer valid or useful.
Attention is drawn to the possibility that some of the elements of this document may be
the subject of patent rights. ISO and IEC shall not be held responsible for identifying
any or all such patent rights.
ISO/IEC TR 18015, which is a Technical Report of type 3, was prepared by Joint
Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 22,
Programming languages, their environments and system software interfaces.

© ISO/IEC 2006 — All rights reserved v

---------------------- Page: 5 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

Introduction
“Performance” has many aspects – execution speed, code size, data size, and memory
footprint at run-time, or time and space consumed by the edit/compile/link process. It
could even refer to the time necessary to find and fix code defects. Most people are
primarily concerned with execution speed, although program footprint and memory
usage can be critical for small embedded systems where the program is stored in ROM,
or where ROM and RAM are combined on a single chip.
Efficiency has been a major design goal for C++ from the beginning, as has the principle
of “zero overhead” for any feature that is not used in a program. It has been a guiding
principle from the earliest days of C++ that “you don’t pay for what you don’t use”.
Language features that are never used in a program should not have a cost in extra code
size, memory size, or run-time. If there are places where C++ cannot guarantee zero
overhead for unused features, this Technical Report will attempt to document them. It
will also discuss ways in which compiler writers, library vendors, and programmers can
minimize or eliminate performance penalties, and will discuss the trade-offs among
different methods of implementation.
Programming for resource-constrained environments is another focus of this Technical
Report Typically, programs that run into resource limits of some kind are either very
large or very small. Very large programs, such as database servers, may run into limits of
disk space or virtual memory. At the other extreme, an embedded application may be
constrained to run in the ROM and RAM space provided by a single chip, perhaps a total
of 64K of memory, or even smaller.
Apart from the issues of resource limits, some programs must interface with system
hardware at a very low level. Historically the interfaces to hardware have been
implemented as proprietary extensions to the compiler (often as macros). This has led to
the situation that code has not been portable, even for programs written for a given
environment, because each compiler for that environment has implemented different sets
of extensions.

vi  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 6 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


Participants

The following people contributed work to this Technical Report:


Dave Abrahams Dietmar Kühl
Mike Ball Jens Maurer
Walter Banks Fusako Mitsuhashi
Greg Colvin Hiroshi Monden
Embedded C++ Technical Committee (Japan) Nathan Myers
Hiroshi Fukutomi Masaya Obata
Lois Goldthwaite Martin O'Riordan
Yenjo Han Tom Plum
John Hauser Dan Saks
Seiji Hayashida Martin Sebor
Howard Hinnant Bill Seymour
Brendan Kehoe Bjarne Stroustrup
Robert Klarer Detlef Vollmann
Jan Kristofferson Willem Wakker


© ISO/IEC 2006 — All rights reserved vii

---------------------- Page: 7 ----------------------
TECHNICAL REPORT ISO/IEC TR 18015:2006(E)

Information technology — Programming languages,
their environments and system software interfaces —
Technical Report on C++ Performance


1 Scope
The aim of this Technical Report is:
● to give the reader a model of time and space overheads implied by use of various
C++ language and library features,
● to debunk widespread myths about performance problems,
● to present techniques for use of C++ in applications where performance matters,
and
● to present techniques for implementing C++ Standard language and library
facilities to yield efficient code.
As far as run-time and space performance are concerned, if you can afford to use C for
an application, you can afford to use C++ in a style that uses C++’s facilities
appropriately for that application.
This Technical Report first discusses areas where performance issues matter, such as
various forms of embedded systems programming and high-performance numerical
computation. After that, the main body of the Technical Report considers the basic cost
of using language and library facilities, techniques for writing efficient code, and the
special needs of embedded systems programming.
Performance implications of object-oriented programming are presented. This discussion
rests on measurements of key language facilities supporting OOP, such as classes, class
member functions, class hierarchies, virtual functions, multiple inheritance, and run-time
type information (RTTI). It is demonstrated that, with the exception of RTTI, current
C++ implementations can match hand-written low-level code for equivalent tasks.
Similarly, the performance implications of generic programming using templates are
discussed. Here, however, the emphasis is on techniques for effective use. Error
handling using exceptions is discussed based on another set of measurements. Both time
and space overheads are discussed. In addition, the predictability of performance of a
given operation is considered.
The performance implications of IOStreams and Locales are examined in some detail
and many generally useful techniques for time and space optimizations are discussed.
© ISO/IEC 2006 — All rights reserved 1

---------------------- Page: 8 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

The special needs of embedded systems programming are presented, including
ROMability and predictability. A separate chapter presents general C and C++ interfaces
to the basic hardware facilities of embedded systems.
Additional research is continuing into techniques for producing efficient C++ libraries
and programs. Please see the WG21 web site at www.open-std.org/jtc1/sc22/wg21 for
example code from this Technical Report and pointers to other sites with relevant
information.
2  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 9 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


2 Normative References
The following referenced documents are indispensable for the application of this
document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 14882:2003, Programming Languages – C++.
Mentions of “the Standard” or “IS” followed by a clause or paragraph number refer to
the above International Standard for C++. Section numbers not preceded by “IS” refer
to locations within this Technical Report.
© ISO/IEC 2006 — All rights reserved 3

---------------------- Page: 10 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
ABC
commonly used shorthand for an Abstract Base Class – a base class (often a virtual base
class) which contains pure virtual member functions and thus cannot be
instantiated (§IS-10.4)
3.2
access method
refers to the way a memory cell or an I/O device is connected to the processor system
and the way in which it is addressed
3.3
addressing range
portion of the total of memory addresses accessible through processor instructions
NOTE A processor has one or more addressing ranges. Program memory, data
memory and I/O devices may have special ranges which can only be
addressed with special processor instructions. A processor's physical
address and data bus may be shared among multiple addressing ranges.
3.4
address interleave
gaps in the addressing range which may occur when a device is connected to a processor
data bus which has a bit width larger than the device data bus
3.5
cache
buffer of high-speed memory used to improve access times to medium-speed main
memory or to low-speed storage devices
NOTE If an item is found in cache memory (a "cache hit"), access is faster than
going to the underlying device. If an item is not found (a "cache miss"),
then it must be fetched from the lower-speed device.
4  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 11 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


3.6
code bloat
generation of excessive amounts of code instructions, for instance, from unnecessary
template instantiations
3.7
code size
portion of a program's memory image devoted to executable instructions
NOTE Sometimes immutable data also is placed with the code.
3.8
cross-cast
cast of an object from one base class subobject to another
NOTE This requires RTTI and the use of the dynamic_cast<.> operator.
3.9
data size
the portion of a program's memory image devoted to data with static storage duration
3.10
device
I/O Device
term used to mean either a discrete I/O chip or an I/O function block in a single chip
processor system
NOTE The data bus bit width is significant in the access method used for the
I/O device.
3.11
device bus
I/O device bus
data bus of a device
NOTE The bit width of the device bus may be less than the width of the
processor data bus, in which case it may influence the way the device is
addressed.
© ISO/IEC 2006 — All rights reserved 5

---------------------- Page: 12 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

3.12
device register
I/O device register
single logical register in a device
NOTE A device may contain multiple registers located at different addresses.
3.13
device register buffer
multiple contiguous registers in a device
3.14
device register endianness
endianness for a logical register in a device
NOTE The device register endianness may be different from the endianness used
by the compiler and processor.
3.15
down-cast
cast of an object from a base class subobject, to a more derived class subobject
NOTE Depending on the complexity of the object's type, this may require RTTI
and the use of the dynamic_cast<.> operator.
3.16
Electrically Erasable Programmable Read-Only Memory
EEPROM
similar to flash memory (sometimes called flash EEPROM), the principal difference is
that EEPROM requires data to be erased and written one byte at a time whereas
flash memory requires data to be erased in blocks and written one byte at a time
NOTE EEPROM retains its contents even when the power is turned off, but can
be erased by exposing it to an electrical charge.
6  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 13 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


3.17
endianness
describes the layout in memory of the 0 and 1 bits which together represent a value
NOTE Big-endian and little-endian refer to whether the most significant byte
or the least significant byte is located on the lowest (first) address. If the
width of a data value is larger than the width of data bus of the device
where the value is stored the data value must be located at multiple
processor addresses.
3.18
embedded system
program which functions as part of a device
NOTE Often the software is burned into firmware instead of loaded from a
storage device. It is usually a freestanding implementation rather than a
hosted one with an operating system (§IS-1.4¶7).
3.19
flash memory
non-volatile memory device type which can be read like ROM
NOTE Flash memory can be updated by the processor system. Erasing and
writing often require special handling. Flash memory is considered to be
ROM in this document.
3.20
heap size
portion of a program's memory image devoted to data with dynamic storage duration,
associated with objects created with operator new
3.21
interleave
see address interleave
3.22
Input/Output
I/O
term used in this TR for reading from and writing to device registers (see §8)
© ISO/IEC 2006 — All rights reserved 7

---------------------- Page: 14 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

3.23
I/O bus
special processor addressing range used for input and output operations on hardware
registers in a device
3.24
I/O device
synonym for device
3.25
locality of reference
heuristic that most programs tend to make most memory and disk accesses to locations
near those accessed in the recent past
NOTE Keeping items accessed together in locations near each other increases
cache hits and decreases page faults.
3.26
logical register
device register treated as a single entity
NOTE A logical register will consist of multiple physical device registers if the
width of the device bus is less than the width of the logical register.
3.27
memory bus
processor addressing range used when addressing data memory and/or program
memory
NOTE Some processor architectures have separate data and program memory
buses.
3.28
memory device
chip or function block intended for holding program code and/or data
3.29
memory mapped I/O
I/O devices connected to the processor addressing range which are also used by data
memory
8  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 15 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


3.30
Mean-Time Between Failures
MTBF
statistically determined average time a device is expected to operate correctly without
failing, used as a measure of a hardware component's reliability
NOTE The calculation takes into account the MTBF of all devices in a system.
The more devices in a system, the lower the system MTBF.
3.31
non-volatile memory
memory device that retains the data it stores, even when electric power is removed
3.32
overlays
technique for handling programs that are larger than available memory, older than
virtual memory addressing
NOTE Different parts of the program are arranged to share the same memory,
with each overlay loaded on demand when another part of the program
calls into it. The use of overlays has largely been succeeded by virtual
memory addressing where it is available, but it may still be used in
memory-limited embedded environments or where precise programmer
or compiler control of memory usage improves performance.
3.33
page
collection of memory addresses treated as a unit for partitioning memory between
applications or swapping out to disk
3.34
page fault
interrupt triggered by an attempt to access a virtual memory address not currently in
physical memory, and thus the need to swap virtual memory from disk to physical
memory
3.35
Plain Old Data
POD
data type which is compatible with the equivalent data type in C in layout, initialization,
and its ability to be copied with memcpy (§IS-1.8¶5)
© ISO/IEC 2006 — All rights reserved 9

---------------------- Page: 16 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

3.36
Programmable Read Only Memory
PROM
equivalent to ROM in the context of this Technical Report
3.37
Random Access Memory
RAM
memory device type for holding data or code
NOTE The RAM content can be modified by the processor. Content in RAM
can be accessed more quickly than that in ROM, but is not persistent
through a power outage.
3.38
real-time
refers to a system in which average performance and throughput must meet defined
goals, but some variation in performance of individual operations can be tolerated
(also soft real-time)
NOTE Hard real-time means that every operation must meet specified timing
constraints.
3.39
Read Only Memory
ROM
memory device type, normally used for holding program code, but may contain data of
static storage duration as well
NOTE Content in ROM can not be modified by the processor.
3.40
ROMable
refers to entities that are appropriate for placement in ROM in order to reduce usage of
RAM or to enhance performance
3.41
ROMability
refers to the process of placing entities into ROM so as to enhance the performance of
programs written in C++
10  © ISO/IEC 2006 — All rights reserved

---------------------- Page: 17 ----------------------
ISO/IEC TR 18015:2006(E) Technical Report on C++ Performance


3.42
Run-Time Type Information
RTTI
information generated by the compiler which makes it possible to determine at run-time
if an object is of a specified type
3.43
stack size
portion of a program's memory image devoted to data with automatic storage duration,
also with certain bookkeeping information to manage the code's flow of control
when calling and returning from functions
NOTE Sometimes the data structures for exception handling are also stored on
the stack (§5.4.1.1).
3.44
swap
swapped out
swapping
the process of moving part of a program’s code or data from fast RAM to a slower form
of storage such as a hard disk
NOTE See also working set and virtual memory addressing.
3.45
System-on-Chip
SoC
embedded system where most of the functionality of the system is implemented on a
single chip, including the processor(s), RAM and ROM
3.46
text size
common alternative name for code size
3.47
User Defined Conversion
UDC
refers to the use, implicit or explicit, of a class member conversion operator
© ISO/IEC 2006 — All rights reserved 11

---------------------- Page: 18 ----------------------
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)

3.48
up-cast
cast of an object to one of its base class subobjects
NOTE This does not require RTTI and can use the static_cast<.>
operator.
3.49
Virtual Base Class
VBC
base class which exists as a single subobject in the inheritance graph, even though
inherited through multiple paths (§IS-10.1¶4)
NOTE In order to share a single instance of a base class, all derived classes must
use the keyword virtual in their base class specifier referring to that base.
3.50
virtual memory addressing
technique for enabling a program to address more memory space than is physically
available
NOTE Typically, portions of the memory space not currently being addressed by
the processor can be “swapped out" to disk space. A mapping function,
sometimes implemented in specialized hardware, translates program
addresses into physical hardware addresses. When the processor needs to
access an address not currently in physical memory, some of the data in
physical memory is written out to disk and some of the stored memory is
read from disk into physical memory. Since reading and writing to disk is
slower than accessing memory devices, minimizing swaps leads to faster
performance.
3.51
working set
portion of a running program that at any given time is physically in memory and not
swapped out to disk or other form of storage device
3.52
Whole Program Analysis
WPA
term used to refer to the process of examining the fully linked and resolved program for
optimization possibilities
NOTE Traditional analysis is performed on a single translation unit (source file)
at
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.