ISO/IEC TR 18015:2006
Information technology - Programming languages, their environments and system software interfaces - Technical Report on C++ Performance
The aim of ISO/IEC TR 18015 is to: give the reader a model of time and space overheads implied by use of various C++ language and library features; debunk widespread myths about performance problems in C++; present techniques for use of C++ in applications where performance matters; and present techniques for implementing C++ standard language and library facilities to yield efficient code. The special needs of embedded systems programming are presented, including ROMability and predictability. A separate chapter presents general C and C++ interfaces to the basic hardware facilities of embedded systems.
Technologies de l'information — Langages de programmation, leurs environnements et interfaces du logiciel système — Rapport technique sur la performance C++
General Information
- Status: Published
- Publication Date: 03-Sep-2006
- Technical Committee: ISO/IEC JTC 1/SC 22 - Programming languages, their environments and system software interfaces
- Drafting Committee: ISO/IEC JTC 1/SC 22/WG 21 - C++
- Current Stage: 9093 - International Standard confirmed
- Start Date: 30-Oct-2013
- Completion Date: 30-Oct-2025
Overview
ISO/IEC TR 18015:2006 - Technical Report on C++ Performance - is a guidance document from ISO that analyzes the time and space overheads of C++ language and library features and provides techniques for writing and implementing efficient C++ code. The report explains the principle of “zero overhead” and shows how to use C++ in performance-critical contexts, with special emphasis on embedded systems, predictability, and hardware interfacing.
Key Topics
- Performance model: a framework for understanding run-time overhead, code size, and memory footprint implied by C++ features.
- Language features: analysis of namespaces, type conversions, classes and inheritance (including virtual functions and RTTI), exception handling, and templates - covering their time/space costs and optimization strategies.
- Libraries and IOStreams: techniques for creating efficient libraries, with a focused discussion on optimizing IOStreams and Locales.
- Embedded systems: guidance on ROMability, predictability, and memory-constrained programming patterns.
- Hardware interfaces: standardized interfaces and examples for low-level access - notably the <iohw.h> C/C++ interface and the <hardware> C++ interface, including implementation and usage guidance.
- Measurement and benchmarking: annexes describing timing methodologies, benchmarks (abstraction penalty, function-object vs function-pointer comparisons), and best practices for measuring overhead.
- Implementation guidance: tips for compiler and library implementers to yield efficient code and avoid code bloat from templates or unnecessary instantiations.
Applications
- Designing high-performance server and numerical applications where execution speed and memory footprint matter.
- Developing embedded systems with tight ROM/RAM budgets or real-time constraints where predictability and ROMability are required.
- Implementing or optimizing C++ standard libraries, IO systems, and hardware access layers to be portable and efficient.
- Creating benchmarks and measurements to quantify C++ feature costs in a particular toolchain.
Who should use this standard
- C++ application developers focused on performance or embedded targets.
- Compiler and library implementers optimizing for time, space, and predictable behavior.
- System architects and firmware engineers needing standardized C/C++ hardware access patterns.
- Performance engineers and QA teams measuring and tuning C++ applications.
Related Standards
- ISO/IEC 14882:2003 - the C++ language standard referenced by this Technical Report.
- WG21 resources (C++ standards committee) for example code and ongoing performance research.
Keywords: ISO/IEC TR 18015, C++ performance, embedded systems, IOStreams, templates, exception handling, ROMability, hardware interface, performance optimization, code size, run-time overhead.
Frequently Asked Questions
ISO/IEC TR 18015:2006 is a technical report published by the International Organization for Standardization (ISO). Its full title is "Information technology - Programming languages, their environments and system software interfaces - Technical Report on C++ Performance". This standard covers: The aim of ISO/IEC TR 18015 is to: give the reader a model of time and space overheads implied by use of various C++ language and library features; debunk widespread myths about performance problems in C++; present techniques for use of C++ in applications where performance matters; and present techniques for implementing C++ standard language and library facilities to yield efficient code. The special needs of embedded systems programming are presented, including ROMability and predictability. A separate chapter presents general C and C++ interfaces to the basic hardware facilities of embedded systems.
ISO/IEC TR 18015:2006 is classified under the following ICS (International Classification for Standards) categories: 35.060 - Languages used in information technology. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
TECHNICAL REPORT ISO/IEC TR 18015
First edition
2006-09-01
Information technology — Programming languages, their environments and system software interfaces — Technical Report on C++ Performance
Technologies de l'information — Langages de programmation, leurs environnements et interfaces du logiciel système — Rapport technique sur la performance C++
Reference number ISO/IEC TR 18015:2006(E)
© ISO/IEC 2006
© ISO/IEC 2006
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2006 – All rights reserved
Contents
Foreword
Introduction
Participants
1 Scope
2 Normative References
3 Terms and definitions
4 Typical Application Areas
4.1 Embedded Systems
4.2 Servers
5 Language Features: Overheads and Strategies
5.1 Namespaces
5.2 Type Conversion Operators
5.3 Classes and Inheritance
5.4 Exception Handling
5.5 Templates
5.6 Programmer Directed Optimizations
6 Creating Efficient Libraries
6.1 The Standard IOStreams Library – Overview
6.2 Optimizing Libraries – Reference Example: "An Efficient Implementation of Locales and IOStreams"
7 Using C++ in Embedded Systems
7.1 ROMability
7.2 Hard Real-Time Considerations
8 Hardware Addressing Interface
8.1 Introduction to Hardware Addressing
8.2 The <iohw.h> Interface for C and C++
8.3 The <hardware> Interface for C++
Technical Report on C++ Performance ISO/IEC TR 18015:2006(E)
Annex A: Guidelines on Using the <hardware> Interface
A.1 Usage Introduction
A.2 Using Hardware Register Designator Specifications
A.3 Hardware Access
Annex B: Implementing the iohw Interfaces
B.1 General Implementation Considerations
B.2 Overview of Hardware Device Connection Options
B.3 Hardware Register Designators for Different Device Addressing Methods
B.4 Atomic Operation
B.5 Read-Modify-Write Operations and Multi-Addressing
B.6 Initialization
B.7 Intrinsic Features for Hardware Register Access
B.8 Implementation Guidelines for the <hardware> Interface
Annex C: A <hardware> Implementation for the <iohw.h> Interface
C.1 Implementation of the Basic Access Functions
C.2 Buffer Functions
C.3 Group Functionality
C.4 Remarks
Annex D: Timing Code
D.1 Measuring the Overhead of Class Operations
D.2 Measuring Template Overheads
D.3 The Stepanov Abstraction Penalty Benchmark
D.4 Comparing Function Objects to Function Pointers
D.5 Measuring the Cost of Synchronized I/O
Annex E: Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International
Electrotechnical Commission) form the specialized system for worldwide
standardization. National bodies that are members of ISO or IEC participate in the
development of International Standards through technical committees established by the
respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work. In the field of information technology, ISO and IEC have
established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the
ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards.
Draft International Standards adopted by the joint technical committee are circulated to
national bodies for voting. Publication as an International Standard requires approval by
at least 75 % of the national bodies casting a vote.
In exceptional circumstances, the joint technical committee may propose the publication
of a Technical Report of one of the following types:
— type 1, when the required support cannot be obtained for the publication of an
International Standard, despite repeated efforts;
— type 2, when the subject is still under technical development or where for any other
reason there is the future but not immediate possibility of an agreement on an
International Standard;
— type 3, when the joint technical committee has collected data of a different kind from
that which is normally published as an International Standard (“state of the art”, for
example).
Technical Reports of types 1 and 2 are subject to review within three years of
publication, to decide whether they can be transformed into International Standards.
Technical Reports of type 3 do not necessarily have to be reviewed until the data they
provide are considered to be no longer valid or useful.
Attention is drawn to the possibility that some of the elements of this document may be
the subject of patent rights. ISO and IEC shall not be held responsible for identifying
any or all such patent rights.
ISO/IEC TR 18015, which is a Technical Report of type 3, was prepared by Joint
Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 22,
Programming languages, their environments and system software interfaces.
Introduction
“Performance” has many aspects – execution speed, code size, data size, and memory
footprint at run-time, or time and space consumed by the edit/compile/link process. It
could even refer to the time necessary to find and fix code defects. Most people are
primarily concerned with execution speed, although program footprint and memory
usage can be critical for small embedded systems where the program is stored in ROM,
or where ROM and RAM are combined on a single chip.
Efficiency has been a major design goal for C++ from the beginning, as has the principle
of “zero overhead” for any feature that is not used in a program. It has been a guiding
principle from the earliest days of C++ that “you don’t pay for what you don’t use”.
Language features that are never used in a program should not have a cost in extra code
size, memory size, or run-time. If there are places where C++ cannot guarantee zero
overhead for unused features, this Technical Report will attempt to document them. It
will also discuss ways in which compiler writers, library vendors, and programmers can
minimize or eliminate performance penalties, and will discuss the trade-offs among
different methods of implementation.
Programming for resource-constrained environments is another focus of this Technical
Report. Typically, programs that run into resource limits of some kind are either very
large or very small. Very large programs, such as database servers, may run into limits of
disk space or virtual memory. At the other extreme, an embedded application may be
constrained to run in the ROM and RAM space provided by a single chip, perhaps a total
of 64K of memory, or even smaller.
Apart from the issues of resource limits, some programs must interface with system
hardware at a very low level. Historically the interfaces to hardware have been
implemented as proprietary extensions to the compiler (often as macros). This has led to
the situation that code has not been portable, even for programs written for a given
environment, because each compiler for that environment has implemented different sets
of extensions.
Participants
The following people contributed work to this Technical Report:
Dave Abrahams
Mike Ball
Walter Banks
Greg Colvin
Embedded C++ Technical Committee (Japan)
Hiroshi Fukutomi
Lois Goldthwaite
Yenjo Han
John Hauser
Seiji Hayashida
Howard Hinnant
Brendan Kehoe
Robert Klarer
Jan Kristofferson
Dietmar Kühl
Jens Maurer
Fusako Mitsuhashi
Hiroshi Monden
Nathan Myers
Masaya Obata
Martin O'Riordan
Tom Plum
Dan Saks
Martin Sebor
Bill Seymour
Bjarne Stroustrup
Detlef Vollmann
Willem Wakker
TECHNICAL REPORT ISO/IEC TR 18015:2006(E)
Information technology — Programming languages,
their environments and system software interfaces —
Technical Report on C++ Performance
1 Scope
The aim of this Technical Report is:
● to give the reader a model of time and space overheads implied by use of various
C++ language and library features,
● to debunk widespread myths about performance problems,
● to present techniques for use of C++ in applications where performance matters,
and
● to present techniques for implementing C++ Standard language and library
facilities to yield efficient code.
As far as run-time and space performance are concerned, if you can afford to use C for
an application, you can afford to use C++ in a style that uses C++’s facilities
appropriately for that application.
This Technical Report first discusses areas where performance issues matter, such as
various forms of embedded systems programming and high-performance numerical
computation. After that, the main body of the Technical Report considers the basic cost
of using language and library facilities, techniques for writing efficient code, and the
special needs of embedded systems programming.
Performance implications of object-oriented programming are presented. This discussion
rests on measurements of key language facilities supporting OOP, such as classes, class
member functions, class hierarchies, virtual functions, multiple inheritance, and run-time
type information (RTTI). It is demonstrated that, with the exception of RTTI, current
C++ implementations can match hand-written low-level code for equivalent tasks.
Similarly, the performance implications of generic programming using templates are
discussed. Here, however, the emphasis is on techniques for effective use. Error
handling using exceptions is discussed based on another set of measurements. Both time
and space overheads are discussed. In addition, the predictability of performance of a
given operation is considered.
The performance implications of IOStreams and Locales are examined in some detail
and many generally useful techniques for time and space optimizations are discussed.
The special needs of embedded systems programming are presented, including
ROMability and predictability. A separate chapter presents general C and C++ interfaces
to the basic hardware facilities of embedded systems.
Additional research is continuing into techniques for producing efficient C++ libraries
and programs. Please see the WG21 web site at www.open-std.org/jtc1/sc22/wg21 for
example code from this Technical Report and pointers to other sites with relevant
information.
2 Normative References
The following referenced documents are indispensable for the application of this
document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14882:2003, Programming Languages – C++.
Mentions of “the Standard” or “IS” followed by a clause or paragraph number refer to
the above International Standard for C++. Section numbers not preceded by “IS” refer
to locations within this Technical Report.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
ABC
commonly used shorthand for an Abstract Base Class – a base class (often a virtual base
class) which contains pure virtual member functions and thus cannot be
instantiated (§IS-10.4)
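As an illustration of this definition, the following minimal sketch shows an ABC and one concrete class derived from it. The names `Shape`, `Square`, and `area_of` are hypothetical and do not come from this Technical Report; the `override` keyword and `std::is_abstract` assume a C++11 or later compiler, which is newer than the Standard referenced here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical ABC: the pure virtual function prevents instantiation.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;  // pure virtual member function
};

// Concrete class: overrides every pure virtual function, so it can be instantiated.
class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};

// The compiler can confirm which classes are abstract.
static_assert(std::is_abstract<Shape>::value, "Shape is an ABC");
static_assert(!std::is_abstract<Square>::value, "Square is concrete");

// Client code works through the ABC interface only.
inline double area_of(const Shape& s) { return s.area(); }
```

A declaration such as `Shape s;` would be rejected at compile time, while `area_of(Square(3.0))` dispatches to `Square::area` through the virtual function mechanism.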
3.2
access method
refers to the way a memory cell or an I/O device is connected to the processor system
and the way in which it is addressed
3.3
addressing range
portion of the total of memory addresses accessible through processor instructions
NOTE A processor has one or more addressing ranges. Program memory, data
memory and I/O devices may have special ranges which can only be
addressed with special processor instructions. A processor's physical
address and data bus may be shared among multiple addressing ranges.
3.4
address interleave
gaps in the addressing range which may occur when a device is connected to a processor
data bus which has a bit width larger than the device data bus
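The effect can be sketched numerically. The helper functions below are hypothetical and assume one common wiring, in which every device register occupies one full word of the processor data bus; other connection schemes produce different address patterns.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: each device register is mapped to the start of one full
// processor-bus word, so the unused bytes of that word form the gap.
inline unsigned interleave_gap_bytes(unsigned processor_bus_bits,
                                     unsigned device_bus_bits)
{
    assert(processor_bus_bits % 8 == 0 && device_bus_bits % 8 == 0);
    assert(device_bus_bits <= processor_bus_bits);
    return (processor_bus_bits - device_bus_bits) / 8;
}

// Byte address of device register number `index` under the same assumption.
inline std::uintptr_t register_address(std::uintptr_t base,
                                       unsigned index,
                                       unsigned processor_bus_bits)
{
    return base + static_cast<std::uintptr_t>(index) * (processor_bus_bits / 8);
}
```

Under these assumptions, an 8-bit device on a 32-bit bus has its registers at every fourth byte address, with three unused bytes after each register.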
3.5
cache
buffer of high-speed memory used to improve access times to medium-speed main
memory or to low-speed storage devices
NOTE If an item is found in cache memory (a "cache hit"), access is faster than
going to the underlying device. If an item is not found (a "cache miss"),
then it must be fetched from the lower-speed device.
3.6
code bloat
generation of excessive amounts of code instructions, for instance, from unnecessary
template instantiations
3.7
code size
portion of a program's memory image devoted to executable instructions
NOTE Sometimes immutable data also is placed with the code.
3.8
cross-cast
cast of an object from one base class subobject to another
NOTE This requires RTTI and the use of the dynamic_cast<...> operator.
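A cross-cast can be sketched as follows. The class names are hypothetical, and the code assumes a C++11 or later compiler with RTTI enabled.

```cpp
#include <cassert>

// Two unrelated polymorphic interfaces (hypothetical names).
struct Readable { virtual ~Readable() {} virtual int read() const = 0; };
struct Writable { virtual ~Writable() {} virtual void write(int v) = 0; };

// Device contains both a Readable and a Writable base class subobject.
struct Device : Readable, Writable {
    int value = 0;
    int read() const override { return value; }
    void write(int v) override { value = v; }
};

// ReadOnly has no Writable subobject, so a cross-cast from it must fail.
struct ReadOnly : Readable {
    int read() const override { return 1; }
};

// Cross-cast from the Readable subobject to the sibling Writable subobject.
// dynamic_cast inspects the complete object's type at run time and yields
// a null pointer when no Writable subobject exists.
inline Writable* as_writable(Readable* r) { return dynamic_cast<Writable*>(r); }
```

A `static_cast` cannot express this conversion, because `Readable` and `Writable` are not related by inheritance; only the run-time type of the complete object links them.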
3.9
data size
the portion of a program's memory image devoted to data with static storage duration
3.10
device
I/O Device
term used to mean either a discrete I/O chip or an I/O function block in a single chip
processor system
NOTE The data bus bit width is significant in the access method used for the
I/O device.
3.11
device bus
I/O device bus
data bus of a device
NOTE The bit width of the device bus may be less than the width of the
processor data bus, in which case it may influence the way the device is
addressed.
3.12
device register
I/O device register
single logical register in a device
NOTE A device may contain multiple registers located at different addresses.
3.13
device register buffer
multiple contiguous registers in a device
3.14
device register endianness
endianness for a logical register in a device
NOTE The device register endianness may be different from the endianness used
by the compiler and processor.
3.15
down-cast
cast of an object from a base class subobject, to a more derived class subobject
NOTE Depending on the complexity of the object's type, this may require RTTI
and the use of the dynamic_cast<...> operator.
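The two forms of down-cast can be contrasted in a short sketch. The class names are hypothetical; the code assumes C++11 or later with RTTI enabled.

```cpp
#include <cassert>

struct Base { virtual ~Base() {} };
struct Derived : Base { int payload = 42; };
struct Other : Base {};

// Checked down-cast: consults RTTI and returns a null pointer when the
// object is not actually a Derived.
inline Derived* checked_down(Base* b) { return dynamic_cast<Derived*>(b); }

// Unchecked down-cast: no run-time lookup. The behavior is undefined
// unless b really points to the Base subobject of a Derived.
inline Derived* unchecked_down(Base* b) { return static_cast<Derived*>(b); }
```

With simple single inheritance, as here, `static_cast` is sufficient when the programmer already knows the dynamic type. When the base is a virtual base class, `static_cast` cannot perform the down-cast and `dynamic_cast` is needed.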
3.16
Electrically Erasable Programmable Read-Only Memory
EEPROM
similar to flash memory (sometimes called flash EEPROM); the principal difference is
that EEPROM requires data to be erased and written one byte at a time, whereas
flash memory requires data to be erased in blocks and written one byte at a time
NOTE EEPROM retains its contents even when the power is turned off, but can
be erased by exposing it to an electrical charge.
3.17
endianness
describes the layout in memory of the 0 and 1 bits which together represent a value
NOTE Big-endian and little-endian refer to whether the most significant byte
or the least significant byte is located on the lowest (first) address. If the
width of a data value is larger than the width of data bus of the device
where the value is stored the data value must be located at multiple
processor addresses.
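The byte ordering described in the note can be made concrete with a small sketch. The function names are hypothetical; the code uses only the standard `<cstdint>` and `<cstring>` headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if the host stores the least significant byte of a
// multi-byte value at the lowest address (little-endian).
inline bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the byte at the lowest address
    return first == 1;
}

// Stores v in big-endian order: most significant byte at the lowest address.
inline void store_big_endian(std::uint32_t v, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Reads a big-endian byte sequence back into a host value.
inline std::uint32_t load_big_endian(const unsigned char in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Because the store and load functions work with shifts rather than with the in-memory layout, they give the same result on big-endian and little-endian hosts.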
3.18
embedded system
program which functions as part of a device
NOTE Often the software is burned into firmware instead of loaded from a
storage device. It is usually a freestanding implementation rather than a
hosted one with an operating system (§IS-1.4¶7).
3.19
flash memory
non-volatile memory device type which can be read like ROM
NOTE Flash memory can be updated by the processor system. Erasing and
writing often require special handling. Flash memory is considered to be
ROM in this document.
3.20
heap size
portion of a program's memory image devoted to data with dynamic storage duration,
associated with objects created with operator new
3.21
interleave
see address interleave
3.22
Input/Output
I/O
term used in this TR for reading from and writing to device registers (see §8)
3.23
I/O bus
special processor addressing range used for input and output operations on hardware
registers in a device
3.24
I/O device
synonym for device
3.25
locality of reference
heuristic that most programs tend to make most memory and disk accesses to locations
near those accessed in the recent past
NOTE Keeping items accessed together in locations near each other increases
cache hits and decreases page faults.
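A familiar illustration is the traversal order of a matrix stored row by row. The sketch below uses hypothetical function names; both functions compute the same sum, but the first touches adjacent addresses while the second jumps by a full row on each step. No timing figures are claimed here, since they depend on the cache and the matrix size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix is stored row by row in one contiguous vector.
// Inner loop walks adjacent elements: good locality of reference.
inline long sum_row_major(const std::vector<int>& m,
                          std::size_t rows, std::size_t cols)
{
    long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Inner loop strides by `cols` elements: consecutive accesses are far
// apart, so large matrices tend to incur more cache misses.
inline long sum_column_major(const std::vector<int>& m,
                             std::size_t rows, std::size_t cols)
{
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```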
3.26
logical register
device register treated as a single entity
NOTE A logical register will consist of multiple physical device registers if the
width of the device bus is less than the width of the logical register.
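The following sketch shows how a 16-bit logical register might be assembled from two 8-bit physical device registers. The names `RegisterEndianness` and `read_logical16` are hypothetical and are not part of the interfaces described in clause 8; the physical registers are modelled as consecutive bytes, which assumes no address interleave.

```cpp
#include <cassert>
#include <cstdint>

// Order in which the device presents the halves of the logical register.
enum class RegisterEndianness { MsbFirst, LsbFirst };

// Reads two consecutive 8-bit physical registers and combines them into
// one 16-bit logical register. `volatile` forces each access to be performed.
inline std::uint16_t read_logical16(const volatile std::uint8_t* regs,
                                    RegisterEndianness order)
{
    const std::uint8_t first = regs[0];   // register at the lower address
    const std::uint8_t second = regs[1];  // register at the higher address
    return order == RegisterEndianness::MsbFirst
        ? static_cast<std::uint16_t>((first << 8) | second)
        : static_cast<std::uint16_t>((second << 8) | first);
}
```

Passing the device register endianness explicitly keeps the result independent of the byte order used by the compiler and processor.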
3.27
memory bus
processor addressing range used when addressing data memory and/or program
memory
NOTE Some processor architectures have separate data and program memory
buses.
3.28
memory device
chip or function block intended for holding program code and/or data
3.29
memory mapped I/O
I/O devices connected to the processor addressing range which are also used by data
memory
3.30
Mean-Time Between Failures
MTBF
statistically determined average time a device is expected to operate correctly without
failing, used as a measure of a hardware component's reliability
NOTE The calculation takes into account the MTBF of all devices in a system.
The more devices in a system, the lower the system MTBF.
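The relationship in the note can be sketched under a simplifying model that this Technical Report does not itself specify: the devices are assumed to fail independently at constant rates, and the system is assumed to fail as soon as any one device fails. The system failure rate is then the sum of the device failure rates. The function name `system_mtbf` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Assumed model: independent devices in series with constant failure rates,
// so 1 / MTBF_system = sum over devices of 1 / MTBF_device.
inline double system_mtbf(const std::vector<double>& device_mtbf_hours)
{
    double failure_rate = 0.0;  // failures per hour for the whole system
    for (double mtbf : device_mtbf_hours) {
        assert(mtbf > 0.0);
        failure_rate += 1.0 / mtbf;
    }
    // With no devices there is nothing to fail in this model.
    return failure_rate > 0.0 ? 1.0 / failure_rate
                              : std::numeric_limits<double>::infinity();
}
```

Under this model, two devices of 1000 hours MTBF each give a system MTBF of about 500 hours, and each additional device lowers the result further.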
3.31
non-volatile memory
memory device that retains the data it stores, even when electric power is removed
3.32
overlays
technique, older than virtual memory addressing, for handling programs that are larger
than available memory
NOTE Different parts of the program are arranged to share the same memory,
with each overlay loaded on demand when another part of the program
calls into it. The use of overlays has largely been succeeded by virtual
memory addressing where it is available, but it may still be used in
memory-limited embedded environments or where precise programmer
or compiler control of memory usage improves performance.
3.33
page
collection of memory addresses treated as a unit for partitioning memory between
applications or swapping out to disk
3.34
page fault
interrupt triggered by an attempt to access a virtual memory address not currently in
physical memory, and thus the need to swap virtual memory from disk to physical
memory
3.35
Plain Old Data
POD
data type which is compatible with the equivalent data type in C in layout, initialization,
and its ability to be copied with memcpy (§IS-1.8¶5)
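A minimal sketch of the `memcpy` property follows. The struct `Sample` and the function `copy_with_memcpy` are hypothetical. The type traits used in the checks come from C++11, which is newer than the Standard referenced by this Technical Report; they are used here only as a convenient compile-time approximation of the POD requirements.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical POD type: only built-in members, no constructors,
// no virtual functions, no base classes.
struct Sample {
    int id;
    double value;
};

// C++11 checks that approximate the POD requirements.
static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample can be copied with memcpy");
static_assert(std::is_standard_layout<Sample>::value,
              "Sample has a C-compatible layout");

// Copies the object representation byte by byte, as C code would.
inline Sample copy_with_memcpy(const Sample& src)
{
    Sample dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

Adding a virtual function or a user-provided copy constructor to `Sample` would make the first `static_assert` fail, which signals that a byte-wise copy is no longer safe.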
3.36
Programmable Read Only Memory
PROM
equivalent to ROM in the context of this Technical Report
3.37
Random Access Memory
RAM
memory device type for holding data or code
NOTE The RAM content can be modified by the processor. Content in RAM
can be accessed more quickly than that in ROM, but is not persistent
through a power outage.
3.38
real-time
refers to a system in which average performance and throughput must meet defined
goals, but some variation in performance of individual operations can be tolerated
(also soft real-time)
NOTE Hard real-time means that every operation must meet specified timing
constraints.
3.39
Read Only Memory
ROM
memory device type, normally used for holding program code, but may contain data of
static storage duration as well
NOTE Content in ROM can not be modified by the processor.
3.40
ROMable
refers to entities that are appropriate for placement in ROM in order to reduce usage of
RAM or to enhance performance
3.41
ROMability
refers to the process of placing entities into ROM so as to enhance the performance of
programs written in C++
3.42
Run-Time Type Information
RTTI
information generated by the compiler which makes it possible to determine at run-time
if an object is of a specified type
3.43
stack size
portion of a program's memory image devoted to data with automatic storage duration,
also with certain bookkeeping information to manage the code's flow of control
when calling and returning from functions
NOTE Sometimes the data structures for exception handling are also stored on
the stack (§5.4.1.1).
3.44
swap
swapped out
swapping
the process of moving part of a program’s code or data from fast RAM to a slower form
of storage such as a hard disk
NOTE See also working set and virtual memory addressing.
3.45
System-on-Chip
SoC
embedded system where most of the functionality of the system is implemented on a
single chip, including the processor(s), RAM and ROM
3.46
text size
common alternative name for code size
3.47
User Defined Conversion
UDC
refers to the use, implicit or explicit, of a class member conversion operator
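Both the implicit and the explicit use can be seen in a short sketch. The class `Celsius` and the function `plus_one` are hypothetical names chosen for illustration.

```cpp
#include <cassert>

class Celsius {
public:
    explicit Celsius(double degrees) : degrees_(degrees) {}
    // Class member conversion operator: converts a Celsius to double.
    operator double() const { return degrees_; }
private:
    double degrees_;
};

// Takes a plain double; a Celsius argument triggers an implicit UDC.
inline double plus_one(double x) { return x + 1.0; }
```

Calling `plus_one(Celsius(20.0))` invokes the conversion operator implicitly, while `static_cast<double>(c)` invokes it explicitly. In both cases a function call may be generated, which is why such conversions are relevant to a discussion of overheads.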
3.48
up-cast
cast of an object to one of its base class subobjects
NOTE This does not require RTTI and can use the static_cast<...>
operator.
3.49
Virtual Base Class
VBC
base class which exists as a single subobject in the inheritance graph, even though
inherited through multiple paths (§IS-10.1¶4)
NOTE In order to share a single instance of a base class, all derived classes must
use the keyword virtual in their base class specifier referring to that base.
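The difference between virtual and non-virtual inheritance of a common base can be sketched as follows. All class names are hypothetical, and the default member initializer assumes C++11 or later.

```cpp
#include <cassert>

struct Stream { int state = 0; };

// Virtual inheritance: both paths refer to one shared Stream subobject.
struct Input : virtual Stream {};
struct Output : virtual Stream {};
struct InOut : Input, Output {};

// Non-virtual inheritance: each path carries its own Stream subobject.
struct PlainInput : Stream {};
struct PlainOutput : Stream {};
struct PlainInOut : PlainInput, PlainOutput {};

// True when both inheritance paths lead to the same Stream subobject.
inline bool shares_single_base(InOut& io)
{
    Stream* via_input = static_cast<Input*>(&io);
    Stream* via_output = static_cast<Output*>(&io);
    return via_input == via_output;
}

// True when the two inheritance paths lead to distinct Stream subobjects.
inline bool has_two_bases(PlainInOut& io)
{
    Stream* via_input = static_cast<PlainInput*>(&io);
    Stream* via_output = static_cast<PlainOutput*>(&io);
    return via_input != via_output;
}
```

With the virtual base, an expression such as `io.state` is unambiguous for an `InOut` object; for `PlainInOut` the same expression would not compile, because it could refer to either of the two `Stream` subobjects.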
3.50
virtual memory addressing
technique for enabling a program to address more memory space than is physically
available
NOTE Typically, portions of the memory space not currently being addressed by
the processor can be “swapped out” to disk space. A mapping function,
sometimes implemented in specialized hardware, translates program
addresses into physical hardware addresses. When the processor needs to
access an address not currently in physical memory, some of the data in
physical memory is written out to disk and some of the stored memory is
read from disk into physical memory. Since reading and writing to disk is
slower than accessing memory devices, minimizing swaps leads to faster
performance.
3.51
working set
portion of a running program that at any given time is physically in memory and not
swapped out to disk or other form of storage device
3.52
Whole Program Analysis
WPA
term used to refer to the process of examining the fully linked and resolved program for
optimization possibilities
NOTE Traditional analysis is performed on a single translation unit (source file)
at a time.
4 Typical Application Areas
Since no computer has infinite resources, all programs have some kind of limiting
constraints. However, many programs never encounter these limits in practice. Very
small and very large systems are those most likely to need effective management of
limited resources.
4.1 Embedded Systems
Embedded systems have many restrictions on memory-size and timing requirements that
are more significant than are typical for non-embedded systems. Embedded systems are
used in various application areas as follows:
• Scale:
  ♦ Small
    These systems typically use single chips containing both ROM and RAM.
    Single-chip systems (System-on-Chip or SoC) in this category typically
    hold approximately 32 KBytes for RAM and 32, 48 or 64 KBytes for
    ROM.
    NOTE Typical systems during the year 2004. These numbers are derived
    from the popular C8051 chipset.
    Examples of applications in this category are:
    ▪ engine control for automobiles
    ▪ hard disk controllers
    ▪ consumer electronic appliances
    ▪ smart cards, also called Integrated Chip (IC) cards – about the
      size of a credit card, they usually contain a processor system with
      code and data embedded in a chip which is embedded (in the
      literal meaning of the word) in a plastic card. A typical size is
      4 KBytes of RAM, 96 KBytes of ROM and 32 KBytes
      EEPROM. An even more constrained smart card in use contains
      12 KBytes of ROM, 4 KBytes of flash memory and only 600
      Bytes of RAM data storage.
♦ Medium
These systems typically use separate ROM and RAM chips to execute a
fixed application, where size is limited. There are different kinds of
memory device, and systems in this category are typically composed of
several kinds to achieve different objectives for cost and speed.
Examples of applications in this category are:
▪ hand-held digital VCR
▪ printer
▪ copy machine
▪ digital still camera – one common model uses 32 MBytes of flash
memory to hold pictures, plus faster buffer memory for
temporary image capture, and a processor for on-the-fly image
compression.
♦ Large
These systems typically use separate ROM and RAM devices, where the
application is flexible and the size is relatively unlimited. Examples of
applications in this category are:
▪ personal digital assistant (PDA) – equivalent to a personal
computer without a separate screen, keyboard, or hard disk
▪ digital television
▪ set-top box
▪ car navigation system
▪ central controllers for large production lines of manufacturing
machines
• Timing:
Of course, systems with soft real-time or hard real-time constraints are not
necessarily embedded systems; they may run on hosted environments.
♦ Critical (soft real-time and hard real-time systems)
Examples of applications in this category are:
▪ motor control
▪ nuclear power plant control
▪ hand-held digital VCR
▪ mobile phone
▪ CD or DVD player
▪ electronic musical instruments
▪ hard disk controllers
▪ digital television
▪ digital signal processing (DSP) applications
♦ Non-critical
Examples of applications in this category are:
▪ digital still camera
▪ copy machine
▪ printer
▪ car navigation system
4.2 Servers
For server applications, the performance-critical resources are typically speed (e.g.
transactions per second), and working-set size (which also impacts throughput
and speed). In such systems, memory and data storage are measured in terms of
megabytes, gigabytes or even terabytes.
Often there are soft real-time constraints bounded by the need to provide service
to many clients in a timely fashion. Some examples of such applications include
the central computer of a public lottery where transactions are heavy, or large
scale high-performance numerical applications, such as weather forecasting,
where the calculation must be completed within a certain time.
These systems are often described in terms of dozens or even hundreds of
multiprocessors, and the prime limiting factor may be the Mean Time Between
Failure (MTBF) of the hardware (increasing the amount of hardware results in a
decrease of the MTBF – in such a case, high-efficiency code would result in
greater robustness).
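The link between hardware count and MTBF can be made concrete with a simple reliability model. The sketch below assumes, purely for illustration, that the system fails as soon as any one of its identical, independent units fails, and that each unit has a constant failure rate. The function name and the figures are hypothetical and do not come from this Technical Report.

```cpp
#include <stdexcept>

// Series-system model (an assumption, not taken from the report): the system
// is down as soon as any one of `units` identical, independent units fails,
// and each unit has a constant failure rate of 1 / unit_mtbf_hours.
// Failure rates then add, so the system MTBF is unit_mtbf_hours / units.
double system_mtbf_hours(double unit_mtbf_hours, unsigned units)
{
    if (units == 0 || unit_mtbf_hours <= 0.0)
        throw std::invalid_argument("need at least one unit and a positive MTBF");
    return unit_mtbf_hours / units;
}
```

Under these assumptions, a unit MTBF of 100 000 hours gives a system MTBF of 500 hours for 200 units and 1 000 hours for 100 units, so code efficient enough to halve the hardware would double the expected time between failures.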
5 Language Features: Overheads and Strategies
Does the C++ language have inherent complexities and overheads which make it
unsuitable for performance-critical applications? For a program written in the
C-conforming subset of C++, will penalties in code size or execution speed result from
using a C++ compiler instead of a C compiler? Does C++ code necessarily result in
“unexpected” functions being called at run-time, or are certain language features, like
multiple inheritance or templates, just too expensive (in size or speed) to risk using? Do
these features impose overheads even if they are not explicitly used?
This Technical Report examines the major features of the C++ language that are
perceived to have an associated cost, whether real or not:
• Namespaces
• Type Conversion Operators
• Inheritance
• Run-Time Type Information (RTTI)
• Exception handling (EH)
• Templates
• The Standard IOStreams Library
5.1 Namespaces
Namespaces do not add any significant space or time overheads to code. They do,
however, add some complexity to the rules for name lookup. The principal advantage of
namespaces is that they provide a mechanism for partitioning names in large projects in
order to avoid name clashes.
Namespace qualifiers enable programmers to use shorter identifier names when
compared with alternative mechanisms. In the absence of namespaces, the programmer
has to explicitly alter the names to ensure that name clashes do not occur. One common
approach to this is to use a canonical prefix on each name:
    // String literals are const, so the pointers are declared const char*.
    static const char* mylib_name = "My Really Useful Library";
    static const char* mylib_copyright = "June 15, 2002";
    std::cout << "Name: " << mylib_name << std::endl
              << "Copyright: " << mylib_copyright << std::endl;
Another common approach is to place the names inside a class and use them in their
qualified form:
    class ThisLibInfo {
    public:   // members must be public to be named outside the class
        static const char* name;
        static const char* copyright;
    };
    const char* ThisLibInfo::name = "Another Useful Library";
    const char* ThisLibInfo::copyright = "August 17, 2004";
    std::cout << "Name: " << ThisLibInfo::name << std::endl
              << "Copyright: " << ThisLibInfo::copyright << std::endl;
With namespaces, the number of characters necessary is similar to the class alternative,
but unlike the class alternative, qualification can be avoided with using declarations
which move the unqualified names into the current scope, thus allowing the names to be
referenced by their shorter form. This saves the programmer from having to type those
extra characters in the source program, for example:
    namespace ThisLibInfo {
        const char* name = "Yet Another Useful Library";
        const char* copyright = "December 18, 2003";
    }
    using ThisLibInfo::name;
    using ThisLibInfo::copyright;
    std::cout << "Name: " << name << std::endl
              << "Copyright: " << copyright << std::endl;
When referencing names from the same enclosing namespace, no using declaration or
namespace qualification is necessary.
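The following minimal sketch illustrates both cases: a function inside the namespace refers to `name` and `copyright` without qualification, while code outside the namespace reaches the function through a using declaration. The functions `banner` and `banner_from_outside` are hypothetical names introduced only for this illustration.

```cpp
#include <string>

namespace ThisLibInfo {
    const char* name = "Yet Another Useful Library";
    const char* copyright = "December 18, 2003";

    // Inside the namespace, name and copyright are found without
    // qualification and without a using declaration.
    std::string banner()
    {
        return std::string("Name: ") + name + "\nCopyright: " + copyright;
    }
}

// Outside the namespace, a using declaration makes the short name usable.
using ThisLibInfo::banner;

std::string banner_from_outside()
{
    return banner();
}
```

Neither form of access changes the generated code; the difference lies only in how the compiler looks the names up.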
With all names, longer names take up more space in the program’s symbol table and may
add a negligible amount of time to dynamic linking. However, there are tools which will
strip the symbol table from the program image and reduce this impact.
5.2 Type Conversion Operators
C and C++ permit explicit type conversion using cast notation (§IS-5.4), for example:
int i_pi = (int)3.14159;
Standard C++ adds four additional type conversion operators, using syntax that looks like
function templates, for example:
    int i = static_cast<int>(3.14159);
The four syntactic forms are:
    const_cast<Type>(expression)        // §IS-5.2.11
    static_cast<Type>(expression)       // §IS-5.2.9
    reinterpret_cast<Type>(expression)  // §IS-5.2.10
    dynamic_cast<Type>(expression)      // §IS-5.2.7
The semantics of cast notation (which is still recognized) are the same as the type
conversion operators, but the latter distinguish between the different purposes for which
the cast is being used. The type conversion operator syntax is easier to identify in source
code, and thus contributes to writing programs that are more likely to be correct. It
should be noted that as in C, a cast may create a temporary object of the desired type, so
casting can have run-time implications.
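A small sketch may help show how a cast can create a temporary object. The `Celsius` type and its construction counter are hypothetical, introduced only to make the temporary visible; both spellings of the cast construct one `Celsius` object per call.

```cpp
// Hypothetical type used only to count constructions from a double.
struct Celsius {
    static int constructed;          // Celsius objects built from a double so far
    double degrees;
    explicit Celsius(double d) : degrees(d) { ++constructed; }
};
int Celsius::constructed = 0;

double read_degrees(const Celsius& c) { return c.degrees; }

// Each call converts a double to Celsius, which constructs a temporary
// object whose lifetime ends when the full expression has been evaluated.
double via_cast_notation(double d) { return read_degrees((Celsius)d); }
double via_static_cast(double d)   { return read_degrees(static_cast<Celsius>(d)); }
```

If the constructor were expensive, that cost would be paid at every cast, whichever notation is used.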
The first three forms of type conversion operator have no size or speed penalty versus the
equivalent cast notation. Indeed, it is typical for a compiler to transform cast notation into
one of the other type conversion operators when generating object code. However,
dynamic_cast may incur some overhead at run-time if the required conversion
involves using RTTI mechanisms such as cross-casting (§5.3.8).
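The difference can be sketched with a hypothetical hierarchy in which one class derives from two unrelated polymorphic bases. The class and function names below are assumptions made for illustration; the comments describe what typical implementations do and are not a guarantee about any particular compiler.

```cpp
// Hypothetical hierarchy: File derives from two unrelated polymorphic bases.
struct Readable { virtual ~Readable() {} };
struct Writable { virtual ~Writable() {} };
struct File : Readable, Writable {};
struct ReadOnlyStream : Readable {};

// Downcast with static_cast: no run-time check, at most a fixed pointer
// adjustment. The caller must already know that r refers to a File.
File* as_file_unchecked(Readable* r) { return static_cast<File*>(r); }

// Cross-cast with dynamic_cast: Readable and Writable are unrelated, so the
// object's run-time type information is consulted to locate the Writable
// subobject. The result is a null pointer if the object has none.
Writable* as_writable(Readable* r) { return dynamic_cast<Writable*>(r); }
```

For a `File` object, `as_writable` returns the address of its `Writable` subobject; for a `ReadOnlyStream` it returns a null pointer. That run-time lookup is where the extra cost of `dynamic_cast` arises.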
5.3 Classes and Inheritance
Programming in the object-oriented style often involves heavy use of class hierarchies.
This section examines the time and space overheads imposed by the primitive operations
using classes and class hierarchies. Often, the alternative to using class hierarchies is to
perform similar operations using lower-level facilities. For example, the obvious
alternative to a virtual function call is an indirect function call. For this reason, the costs
of primitive operations of classes and class hierarchies are compared to those of similar
functionality implemented without classes. See “Inside the C++ Object Model”
[BIBREF-17] for further information.
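To make the comparison concrete, the sketch below shows a virtual function call next to a hand-written equivalent that uses a function pointer stored in a struct. All type and function names are hypothetical, and the function-pointer version is only one of several possible lower-level designs.

```cpp
// Class-hierarchy version: dispatch through a virtual function.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const { return 3; } };
struct Square   : Shape { int sides() const { return 4; } };

int count_sides(const Shape& s) { return s.sides(); }     // virtual call

// Lower-level version: the object carries a function pointer that is
// filled in by whoever creates it.
struct CShape {
    int (*sides)(const CShape*);
};
int triangle_sides(const CShape*) { return 3; }
int square_sides(const CShape*)   { return 4; }

int count_sides(const CShape& s) { return s.sides(&s); }  // indirect call
```

Both versions of `count_sides` perform one indirect call. The virtual version typically loads the function address from a per-class table, whereas the hand-written version stores it in every object.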
Most comments about run-time costs are based on a set of simple measurements
performed on three different machine architectures using six different compilers run with
a variety of optimization options. Each test was run multiple times to ensure that the
results were repeatable. The code is presented in Annex D. The aim of these
measurements is neither to get a precise statement of optimal performance of C++ on a
given machine nor to provide a comparison between compilers or machine architectures.
Rather, the aim is to give developers a view of relative costs of common language
constructs using current compilers, and also to show what is possible (what is achieved in
one compiler is in principle possible for all). We know – from specialized compilers not
in this study and reports from people using unreleased beta versions of popular
compilers – that better results are possible.
NOTE on 5.2: If the compiler does not provide the type conversion operators natively, it is possible to implement them using function templates. Indeed,
prototype implementations of the type conversion operators were often implemented this way.
In general, the statements about implementation techniques and performance are
believed to be true for the vast majority of current implementations, but are not meant to
cover experimental implementation techniques, which might produce better – or just
different – results.
5.3.1 Representation Overheads
A class without a virtual function requires exactly as much space to represent as a
struct with the same data members. That is, no space overhead is introduced from
using a class compared to a C struct. A class object does not contain any data that
the programmer does not explicitly request (apart from possible padding to achieve
appropriate alignment, which may also be present in C structs). In particular, a
non-virtual function does not take up any space in an object of its class, and neither does a
static data or function member of the class.
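This claim can be checked at compile time. The sketch below uses hypothetical types `PointStruct` and `PointClass` and assumes a C++11 compiler for `static_assert`; the size equality is what mainstream implementations produce rather than a literal guarantee of the language standard.

```cpp
struct PointStruct {            // C-style struct
    int x;
    int y;
};

class PointClass {              // same data members, plus member functions
public:
    PointClass(int x, int y) : x_(x), y_(y) {}
    int manhattan() const                       // non-virtual member function
    {
        return (x_ < 0 ? -x_ : x_) + (y_ < 0 ? -y_ : y_);
    }
    static int created_hint;                    // static data member
    static int origin_distance() { return 0; }  // static function member
private:
    int x_;
    int y_;
};
int PointClass::created_hint = 0;

// Neither non-virtual functions nor static members occupy space in the object.
static_assert(sizeof(PointClass) == sizeof(PointStruct),
              "a class without virtual functions should match the struct size");
```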
A polymorphic class (a class that has one or more virtual functions) incurs a
per-object space overhead of one pointer, plus a per-class space overhead of a “virtual
function table” consisting of one or two words per virtual function. In addition, there is a
per-class space overhead for a “type information object” (also called “run-time type
information” or RTTI), typically about 40 bytes per class, consisting of a name string, a
couple of words of other information and another couple of words for each base class.
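The per-object part of this overhead can be observed with `sizeof`. The sketch below uses two hypothetical types with the same single data member and assumes a C++11 compiler; the exact difference is implementation-specific, although on typical implementations it is one pointer, possibly rounded up for alignment.

```cpp
#include <cstddef>

struct Plain {
    void* payload;
    void* get() const { return payload; }           // non-virtual
};

struct Polymorphic {
    void* payload;
    virtual ~Polymorphic() {}
    virtual void* get() const { return payload; }   // virtual
};

// Extra bytes that each Polymorphic object carries compared with Plain.
// Typically this is the hidden pointer to the class's virtual function table.
constexpr std::size_t per_object_overhead()
{
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

The virtual function table and the type information object are per-class data, so they do not appear in `sizeof` and do not grow with the number of objects.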
Whole program analysis (WPA) can be used to eliminate unused virtual function tables
and RTTI data. Such analysis is particularly suitable for relatively small programs that do
not use dynamic linking, and which have to operate in a resource-constrained
environment such as an embedded system.
Some current C++ implementations share data structures between RTTI support and
exception handling support, thereby avoiding representation overhead specifically for
RTTI.
Aggregating data items into a small class or struct can impose a run-time overhead if
the compiler does not use registers effectively, or in other ways fails to take advantage of
possible optimizations when class objects are used. The overheads incurred through
the failure to optimize in such cases are referred to as “the abstraction penalty” and are
usually measured by a benchmark produced by Alex Stepanov (D.3). For example, if
accessing a value through a trivial smart pointer is significantly slower than accessing it
through an ordinary pointer, the compiler is inefficiently handling the abstraction. In the
past, most compilers had significant abstraction penalties and several current compilers
still do. However, at least two compilers have been reported to have abstraction
penalties below 1% and another a penalty of 3%, so eliminating this kind of overhead is
well within the state of the art.
NOTE The compilers cited above are production compilers, not just experimental ones.
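The smart pointer example can be sketched as follows. `TrivialPtr` is a hypothetical wrapper written for this illustration and is not the benchmark of D.3; it holds exactly one raw pointer, so a compiler that handles the abstraction well can generate the same code for both loops.

```cpp
// A trivial smart pointer (hypothetical): it wraps a raw pointer and adds
// no state of its own.
template <typename T>
class TrivialPtr {
public:
    explicit TrivialPtr(T* p) : p_(p) {}
    T& operator*() const { return *p_; }
    TrivialPtr& operator++() { ++p_; return *this; }
    bool operator!=(const TrivialPtr& other) const { return p_ != other.p_; }
private:
    T* p_;
};

// Sum through ordinary pointers.
long sum_raw(const int* first, const int* last)
{
    long total = 0;
    for (; first != last; ++first) total += *first;
    return total;
}

// The same loop through the wrapper. Any slowdown relative to sum_raw is
// the abstraction penalty for this construct.
long sum_wrapped(TrivialPtr<const int> first, TrivialPtr<const int> last)
{
    long total = 0;
    for (; first != last; ++first) total += *first;
    return total;
}
```

Timing the two functions over a large array, with optimization enabled, gives a rough indication of how well a given compiler handles this abstraction.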









