Programming Languages — Technical Specification for C++ Extensions for Parallelism

This document describes requirements for implementations of an interface that computer programs written in the C++ programming language can use to invoke algorithms with parallel execution. The algorithms described by this document are realizable across a broad class of computer architectures. There is a possibility of a subset of the functionality described by this document being standardized in a future version of C++, but it is not currently part of any C++ standard. There is a possibility of some of the functionality in this document never being standardized, or of it being standardized in a substantially changed form. The goal of this document is to build widespread existing practice for parallelism in the C++ programming language. It gives advice on extensions to those vendors who wish to provide them.

Langages de programmation — Spécification technique pour les extensions C++ relatives au parallélisme

General Information

Status
Published
Publication Date
14-Nov-2018
Current Stage
9093 - International Standard confirmed
Start Date
09-Jun-2025
Completion Date
30-Oct-2025
Ref Project

Relations

Technical specification
ISO/IEC TS 19570:2018 - Programming Languages — Technical Specification for C++ Extensions for Parallelism Released:11/15/2018
English language
48 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (Sample)


TECHNICAL ISO/IEC TS
SPECIFICATION 19570
Second edition
2018-11
Programming Languages — Technical
Specification for C++ Extensions for
Parallelism
Langages de programmation — Spécification technique pour les
extensions C++ relatives au parallélisme
Reference number
©
ISO/IEC 2018
© ISO/IEC 2018
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2018 – All rights reserved

Contents
Foreword iv
1 Scope 1
2 Normative references 2
3 Terms and definitions 3
4 General 4
4.1 Namespaces and headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
4.2 Feature-testing recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
5 Parallel exceptions 5
5.1 Header synopsis . . . . . . . . . . . . . . . . . . . . . . . 5
6 Execution policies 6
6.1 Header synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.2 Unsequenced execution policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.3 Vector execution policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.4 Execution policy objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
7 Parallel algorithms 8
7.1 Wavefront Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
7.2 Non-Numeric Parallel Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
8 Task Block 16
8.1 Header synopsis . . . . . . . . . . . . . . . . . . . . . . . . 16
8.2 Class task_cancelled_exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.3 Class task_block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
8.4 Function template define_task_block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
8.5 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
9 Data-Parallel Types 20
9.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
9.2 Header synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
9.3 simd ABI tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
9.4 simd type traits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
9.5 Where expression class templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
9.6 Class template simd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
9.7 simd non-member operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.8 Class template simd_mask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
9.9 Non-member operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
c
CONTENTS �ISO/IEC 2018 – All rights reserved iii

Foreword [parallel.foreword]
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members
of ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different
types of document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on the
ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions re-
lated to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization
(WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee
SC 22, Programming languages, their environments and system software interfaces.
This second edition cancels and replaces the first edition (ISO/IEC 19570:2015) which has been technically
revised.
The main changes compared to the previous edition are as follows:
(8.1)
— Eliminate previously standardized functionality.
(8.2)
— Introduce task block.
(8.3)
— Introduce vector and wavefront policies.
(8.4)
— Introduce a template library for parallel for loops.
(8.5)
— Introduce data-parallel vector types.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
c
Foreword �ISO/IEC 2018 – All rights reserved iv

TECHNICAL SPECIFICATION ISO/IEC TS 19570:2018(E)
Programming Languages — Technical Specification for C++
Extensions for Parallelism
1 Scope [parallel.scope]
This document describes requirements for implementations of an interface that computer programs written
++
in the C programming language can use to invoke algorithms with parallel execution. The algorithms
described by this document are realizable across a broad class of computer architectures.
There is a possibility of a subset of the functionality described by this document being standardized in a
++ ++
future version of C , but it is not currently part of any C standard. There is a possibility of some of
the functionality in this document never being standardized, or of it being standardized in a substantially
changed form.
3 ++
The goal of this document is to build widespread existing practice for parallelism in the C programming
language. It gives advice on extensions to those vendors who wish to provide them.
c
Scope �ISO/IEC 2018 – All rights reserved 1

2 Normative references [parallel.references]
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
(1.1)
— ISO/IEC 14882:2017, Programming languages — C++
++ ++
ISO/IEC 14882:2017 is herein called the C Standard. References to clauses within the C Standard are
++
written as “C 17 §20”. The library described in ISO/IEC 14882:2017 clauses 20-33 is herein called the C++
Standard Library. The C++ Standard Library components described in ISO/IEC 14882:2017 clauses 28, 29.8
and 23.10.10 are herein called the C++ Standard Algorithms Library.
3 ++
Unless otherwise specified, the whole of the C++ Standard’s Library introduction (C 17 §20) is included
into this document by reference.
c
Normative references �ISO/IEC 2018 – All rights reserved 2

3 Terms and definitions [parallel.defns]
For the purposes of this document, the terms, definitions, and symbols given in ISO/IEC 14882:2017 apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
(2.1)
— ISO Online browsing platform: available at https://www.iso.org/obp
(2.2)
— IEC Electropedia: available at http://www.electropedia.org
c
Terms and definitions �ISO/IEC 2018 – All rights reserved 3

4 General [parallel.general]
4.1 Namespaces and headers [parallel.general.namespaces]
++
Since the extensions described in this document are experimental and not part of the C Standard Library,
they should not be declared directly within namespace std. Unless otherwise specified, all components
described in this document are declared in namespace std::experimental::parallelism_v2.
[Note: Once standardized, the components described by this document are expected to be promoted to
namespace std.—end note]
Each header described in this document shall import the contents of std::experimental::parallelism_v2
into std::experimental as if by
namespace std::experimental {
inline namespace parallelism_v2 {}
}
Unless otherwise specified, references to such entities described in this document are assumed to be qualified
++
with std::experimental::parallelism_v2, and references to entities described in the C Standard
Library are assumed to be qualified with std::.
Extensions that are expected to eventually be added to an existing header are provided inside the
header, which shall include the standard contents of as if by
#include
4.2 Feature-testing recommendations [parallel.general.features]
An implementation that provides support for this document shall define the feature test macro(s) in Table 1.
Table 1 — Feature-test macro(s)
Title Subclause Macro Name Value Header
Task Block 8 __cpp_lib_experimental_- 201711
parallel_task_block
Vector and 6.2 __cpp_lib_experimental_- 201711
Wavefront execution_vector_policy
Policies
Template 7.2.2, __cpp_lib_experimental_- 201711
Library for 7.2.3, parallel_for_loop
Parallel For 7.2.4
Loops
Data-Parallel 9 __cpp_lib_experimental_- 201803
Vector Types parallel_simd
c
§ 4.2 �ISO/IEC 2018 – All rights reserved 4

5 Parallel exceptions [parallel.exceptions]
5.1 Header synopsis [parallel.exceptions.synopsis]
namespace std::experimental {
inline namespace parallelism_v2 {
class exception_list : public exception {
public:
using iterator = unspecified;
size_t size() const noexcept;
iterator begin() const noexcept;
iterator end() const noexcept;
const char* what() const noexcept override;
};
}
}
The class exception_list owns a sequence of exception_ptr objects.
exception_list::iterator is an iterator which meets the forward iterator requirements and has a value
type of exception_ptr.
size_t size() const noexcept;
Returns: The number of exception_ptr objects contained within the exception_list.
Complexity: Constant time.
iterator begin() const noexcept;
Returns: An iterator referring to the first exception_ptr object returned within the exception_list.
iterator end() const noexcept;
Returns: An iterator that is past the end of the owned sequence.
const char* what() const noexcept override;
Returns: An implementation-defined NTBS.
c
§ 5.1 �ISO/IEC 2018 – All rights reserved 5

6 Execution policies [parallel.execpol]
6.1 Header synopsis [parallel.execpol.synopsis]
#include
namespace std::experimental {
inline namespace parallelism_v2 {
namespace execution {
// 6.2, Unsequenced execution policy
class unsequenced_policy;
// 6.3, Vector execution policy
class vector_policy;
// 6.4, Execution policy objects
inline constexpr unsequenced_policy unseq{ unspecified };
inline constexpr vector_policy vec{ unspecified };
}
}
}
6.2 Unsequenced execution policy [parallel.execpol.unseq]
class unsequenced_policy { unspecified };
The class unsequenced_policy is an execution policy type used as a unique type to disambiguate parallel
algorithm overloading and indicate that a parallel algorithm’s execution may be vectorized, e.g., executed on
a single thread using instructions that operate on multiple data items.
The invocations of element access functions in parallel algorithms invoked with an execution policy of type
unsequenced_policy are permitted to execute in an unordered fashion in the calling thread, unsequenced
with respect to one another within the calling thread. [Note: This means that multiple function object
invocations may be interleaved on a single thread.—end note]
3 ++ ++
[Note: This overrides the usual guarantee from the C Standard, C 17 §4.6 that function executions do
not overlap with one another. —end note]
During the execution of a parallel algorithm with the experimental::execution::unsequenced_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate() will be
called.
6.3 Vector execution policy [parallel.execpol.vec]
class vector_policy { unspecified };
The class vector_policy is an execution policy type used as a unique type to disambiguate parallel
algorithm overloading and indicate that a parallel algorithm’s execution may be vectorized. Additionally,
such vectorization will result in an execution that respects the sequencing constraints of wavefront application
(7.1). [Note: The implementation thus makes stronger guarantees than for unsequenced_policy, for
example.—end note]
The invocations of element access functions in parallel algorithms invoked with an execution policy of type
vector_policy are permitted to execute in unordered fashion in the calling thread, unsequenced with respect
c
§ 6.3 �ISO/IEC 2018 – All rights reserved 6

to one another within the calling thread, subject to the sequencing constraints of wavefront application (7.1)
for the last argument to for_loop, for_loop_n, for_loop_strided, or for_loop_strided_n.
During the execution of a parallel algorithm with the experimental::execution::vector_policy policy,
if the invocation of an element access function exits via an uncaught exception, terminate() will be called.
6.4 Execution policy objects [parallel.execpol.objects]
inline constexpr execution::unsequenced_policy unseq { unspecified };
inline constexpr execution::vector_policy vec { unspecified };
The header declares a global object associated with each type of execution
policy defined by this document.
c
§ 6.4 �ISO/IEC 2018 – All rights reserved 7

7 Parallel algorithms [parallel.alg]
7.1 Wavefront Application [parallel.alg.wavefront]
For the purposes of this subclause, an evaluation is a value computation or side effect of an expression, or an
execution of a statement. Initialization of a temporary object is considered a subexpression of the expression
that necessitates the temporary object.
An evaluation A contains an evaluation B if:
(2.1)
++
— A and B are not potentially concurrent (C 17 §4.7.1); and
(2.2)
— the start of A is the start of B or the start of A is sequenced before the start of B; and
(2.3)
— the completion of B is the completion of A or the completion of B is sequenced before the completion
of A.
[Note: This includes evaluations occurring in function invocations. —end note]
An evaluation A is ordered before an evaluation B if A is deterministically sequenced before B. [Note: If A is
indeterminately sequenced with respect to B or A and B are unsequenced, then A is not ordered before B
and B is not ordered before A. The ordered before relationship is transitive.—end note]
For an evaluation A ordered before an evaluation B, both contained in the same invocation of an element
access function, A is a vertical antecedent of B if:
(4.1)
— there exists an evaluation S such that:
(4.1.1)
— S contains A, and
(4.1.2)
— S contains all evaluations C (if any) such that A is ordered before C and C is ordered before B,
(4.1.3)
— but S does not contain B, and
(4.2)
— control reached B from A without executing any of the following:
(4.2.1)
— a goto statement or asm declaration that jumps to a statement outside of S, or
(4.2.2)
— a switch statement executed within S that transfers control into a substatement of a nested
selection or iteration statement, or
(4.2.3)
— a throw [Note: Even if caught—end note] , or
(4.2.4)
— a longjmp.
[Note: Vertical antecedent is an irreflexive, antisymmetric, nontransitive relationship between two evaluations.
Informally, A is a vertical antecedent of B if A is sequenced immediately before B or A is nested zero or
more levels within a statement S that immediately precedes B. —end note]
In the following, X and X refer to evaluations of the same expression or statement contained in the
i j
th th
application of an element access function corresponding to the i and j elements of the input sequence.
[Note: There can be several evaluations X , Y , etc. of a single expression or statement in application k, for
k k
example, if the expression or statement appears in a loop within the element access function.—end note]
Horizontally matched is an equivalence relationship between two evaluations of the same expression. An
evaluation B is horizontally matched with an evaluation B if:
i j
(6.1)
— both are the first evaluations in their respective applications of the element access function, or
c
§ 7.1 �ISO/IEC 2018 – All rights reserved 8

(6.2)
— there exist horizontally matched evaluations A and A that are vertical antecedents of evaluations B
i j i
and B , respectively.
j
[Note: Horizontally matched establishes a theoretical lock-step relationship between evaluations in different
applications of an element access function.—end note]
Let f be a function called for each argument list in a sequence of argument lists. Wavefront application of f
requires that evaluation A be sequenced before evaluation B if i i j
(7.1)
— A is sequenced before some evaluation B and B is horizontally matched with B , or
i i i j
(7.2)
— A is horizontally matched with some evaluation A and A is sequenced before B .
i j j j
[Note: Wavefront application guarantees that parallel applications i and j execute such that progress on
application j never gets ahead of application i. —end note] [Note: The relationships between A and B
i i
and between A and B are sequenced before, not vertical antecedent. —end note]
j j
7.2 Non-Numeric Parallel Algorithms [parallel.alg.ops]
7.2.1 Header synopsis [parallel.alg.ops.synopsis]
#include
namespace std::experimental {
inline namespace parallelism_v2 {
namespace execution {
// 7.2.5, No vec
template
auto no_vec(F&& f) noexcept -> decltype(std::forward(f)());
// 7.2.6, Ordered update class
template
class ordered_update_t;
// 7.2.7, Ordered update function template
template
ordered_update_t ordered_update(T& ref) noexcept;
}
// Exposition only: Suppress template argument deduction.
template struct type_identity { using type = T; };
template using type_identity_t = typename type_identity::type;
// 7.2.2, Reductions
template
unspecified reduction(T& var, const T& identity, BinaryOperation combiner);
template
unspecified reduction_plus(T& var);
template
unspecified reduction_multiplies(T& var);
template
unspecified reduction_bit_and(T& var);
template
unspecified reduction_bit_or(T& var);
template
unspecified reduction_bit_xor(T& var);
template
unspecified reduction_min(T& var);
c
§ 7.2.1 �ISO/IEC 2018 – All rights reserved 9

template
unspecified reduction_max(T& var);
// 7.2.3, Inductions
template
unspecified induction(T&& var);
template
unspecified induction(T&& var, S stride);
// 7.2.4, For loop
template
void for_loop(type_identity_t start, I finish, Rest&&. rest);
template class I, class. Rest>
void for_loop(ExecutionPolicy&& exec,
type_identity_t start, I finish, Rest&&. rest);
template
void for_loop_strided(type_identity_t start, I finish,
S stride, Rest&&. rest);
template class I, class S, class. Rest>
void for_loop_strided(ExecutionPolicy&& exec,
type_identity_t start, I finish,
S stride, Rest&&. rest);
template
void for_loop_n(I start, Size n, Rest&&. rest);
template class I, class Size, class. Rest>
void for_loop_n(ExecutionPolicy&& exec,
I start, Size n, Rest&&. rest);
template
void for_loop_n_strided(I start, Size n, S stride, Rest&&. rest);
template class I, class Size, class S, class. Rest>
void for_loop_n_strided(ExecutionPolicy&& exec,
I start, Size n, S stride, Rest&&. rest);
}
}
7.2.2 Reductions [parallel.alg.reductions]
Each of the function templates in this subclause (7.2.2) returns a reduction object of unspecified type having
a reduction value type and encapsulating a reduction identity value for the reduction, a combiner function
object, and a live-out object from which the initial value is obtained and into which the final value is stored.
An algorithm uses reduction objects by allocating an unspecified number of instances, known as accumulators,
of the reduction value type. [Note: An implementation can, for example, allocate an accumulator for each
thread in its private thread pool.—end note] Each accumulator is initialized with the object’s reduction
identity, except that the live-out object (which was initialized by the caller) comprises one of the accumulators.
The algorithm passes a reference to an accumulator to each application of an element-access function, ensuring
that no two concurrently executing invocations share the same accumulator. An accumulator can be shared
between two applications that do not execute concurrently, but initialization is performed only once per
accumulator.
Modifications to the accumulator by the application of element access functions accrue as partial results. At
some point before the algorithm returns, the partial results are combined, two at a time, using the reduction
c
§ 7.2.2 �ISO/IEC 2018 – All rights reserved 10

object’s combiner operation until a single value remains, which is then assigned back to the live-out object.
[Note: In order to produce useful results, modifications to the accumulator should be limited to commutative
operations closely related to the combiner operation. For example if the combiner is plus, incrementing
the accumulator would be consistent with the combiner but doubling it or assigning to it would not.—end
note]
template
unspecified reduction(T& var, const T& identity, BinaryOperation combiner);
Requires: T shall meet the requirements of CopyConstructible and MoveAssignable. The expression
var = combiner(var, var) shall be well-formed.
Returns: A reduction object of unspecified type having reduction value type T, reduction identity
identity, combiner function object combiner, and using the object referenced by var as its live-out
object.
template
unspecified reduction_plus(T& var);
template
unspecified reduction_multiplies(T& var);
template
unspecified reduction_bit_and(T& var);
template
unspecified reduction_bit_or(T& var);
template
unspecified reduction_bit_xor(T& var);
template
unspecified reduction_min(T& var);
template
unspecified reduction_max(T& var);
Requires: T shall meet the requirements of CopyConstructible and MoveAssignable.
Returns: A reduction object of unspecified type having reduction value type T, reduction identity and
combiner operation as specified in Table 2 and using the object referenced by var as its live-out object.
Table 2 — Reduction identities and combiner operations
Function Reduction Identity Combiner Operation
reduction_plus T() x + y
reduction_multiplies T(1) x * y
reduction_bit_and (~T()) x & y
reduction_bit_or T() x | y
reduction_bit_xor T() x ˆ y
reduction_min var min(x, y)
reduction_max var max(x, y)
[Example: The following code updates each element of y and sets s to the sum of the squares.
extern int n;
extern float x[], y[], a;
float s = 0;
for_loop(execution::vec, 0, n,
reduction(s, 0.0f, plus<>()),
[&](int i, float& accum) {
y[i] += a*x[i];
c
§ 7.2.2 �ISO/IEC 2018 – All rights reserved 11

accum += y[i]*y[i];
}
);
—end example]
7.2.3 Inductions [parallel.alg.inductions]
Each of the function templates in this subclause return an induction object of unspecified type having an
induction value type and encapsulating an initial value i of that type and, optionally, a stride.
For each element in the input range, an algorithm over input sequence S computes an induction value from
an induction variable and ordinal position p within S by the formula i+p∗stride if a stride was specified or
i+p otherwise. This induction value is passed to the element access function.
An induction object may refer to a live-out object to hold the final value of the induction sequence. When
the algorithm using the induction object completes, the live-out object is assigned the value i+n∗stride,
where n is the number of elements in the input range.
template
unspecified induction(T&& var);
template
unspecified induction(T&& var, S stride);
Returns: An induction object with induction value type remove_cv_t>,
initial value var, and (if specified) stride stride. If T is an lvalue reference to non-const type, then
the object referenced by var becomes the live-out object for the induction object; otherwise there is no
live-out object.
7.2.4 For loop [parallel.alg.forloop]
template
void for_loop(type_identity_t start, I finish, Rest&&. rest);
template
void for_loop(ExecutionPolicy&& exec, type_identity_t start, I finish, Rest&&. rest);
template
void for_loop_strided(type_identity_t start, I finish, S stride, Rest&&. rest);
template
void for_loop_strided(ExecutionPolicy&& exec, type_identity_t start, I finish, S stride,
Rest&&. rest);
template
void for_loop_n(I start, Size n, Rest&&. rest);
template
void for_loop_n(ExecutionPolicy&& exec, I start, Size n, Rest&&. rest);
template
void for_loop_n_strided(I start, Size n, S stride, Rest&&. rest);
template
void for_loop_n_strided(ExecutionPolicy&& exec, I start, Size n, S stride, Rest&&. rest);
Requires: For the overloads with an ExecutionPolicy, I shall be an integral type or meet the
requirements of a forward iterator type; otherwise, I shall be an integral type or meet the requirements
of an input iterator type. Size shall be an integral type and n shall be non-negative. S shall have
integral type and stride shall have non-zero value. stride shall be negative only if I has integral type
or meets the requirements of a bidirectional iterator. The rest parameter pack shall have at least one
c
§ 7.2.4 �ISO/IEC 2018 – All rights reserved 12

element, comprising objects returned by invocations of reduction (7.2.2) and/or induction (7.2.3)
function templates followed by exactly one invocable element-access function, f. For the overloads with
an ExecutionPolicy, f shall meet the requirements of CopyConstructible; otherwise, f shall meet
the requirements of MoveConstructible.
Effects: Applies f to each element in the input sequence, as described below, with additional arguments
corresponding to the reductions and inductions in the rest parameter pack. The length of the input
sequence is:
(2.1)
— n, if specified,
(2.2)
— otherwise finish - start if neither n nor stride is specified,
(2.3)
— otherwise 1 + (finish-start-1)/stride if stride is positive,
(2.4)
— otherwise 1 + (start-finish-1)/-stride.
The first element in the input sequence is start. Each subsequent element is generated by adding
stride to the previous element, if stride is specified, otherwise by incrementing the previous element.
++ ++
[Note: As described in the C Standard, C 17 §28.1, arithmetic on non-random-access iterators
is performed using advance and distance.—end note] [Note: The order of the elements of the
input sequence is important for determining ordinal position of an application of f, even though the
applications themselves may be unordered.—end note]
The first argument to f is an element from the input sequence.[Note: If I is an iterator type, the
iterators in the input sequence are not dereferenced before being passed to f.—end note] For each
member of the rest parameter pack excluding f, an additional argument is passed to each application
of f as follows:
(2.5)
— If the pack member is an object returned by a call to a reduction function listed in subclause 7.2.2,
then the additional argument is a reference to an accumulator of that reduction object.
(2.6)
— If the pack member is an object returned by a call to induction, then the additional argument is
the induction value for that induction object corresponding to the position of the application of f
in the input sequence.
Complexity: Applies f exactly once for each element of the input sequence.
Remarks: If f returns a result, the result is ignored.
7.2.5 No vec [parallel.alg.novec]
template
auto no_vec(F&& f) noexcept -> decltype(std::forward(f)());
Effects: Evaluates std::forward(f)(). When invoked within an element access function in a
parallel algorithm using vector_policy, if two calls to no_vec are horizontally matched within a
wavefront application of an element access function over input sequence S, then the execution of f in
the application for one element in S is sequenced before the execution of f in the application for a
subsequent element in S; otherwise, there is no effect on sequencing.
Returns: The result of f.
Notes: If f exits via an exception, then terminate will be called, consistent with all other potentially-
throwing operations invoked with vector_policy execution.
[Example:
extern float y[];
extern int* p;
for_loop(vec, 0, n, [&](int i) {
y[i] += y[i+1];
if (y[i] < 0) {
c
§ 7.2.5 �ISO/IEC 2018 – All rights reserved 13

no_vec([]{
*p++ = i;
});
}
});
The updates *p++ = i will occur in the same order as if the policy were seq. —end example]
7.2.6 Ordered update class [parallel.alg.ordupdate.class]
template
class ordered_update_t {
T& ref_; // exposition only
public:
ordered_update_t(T& loc) noexcept
: ref_(loc) {}
ordered_update_t(const ordered_update_t&) = delete;
ordered_update_t& operator=(const ordered_update_t&) = delete;
template
auto operator=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ = std::move(rhs); }); }
template
auto operator+=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ += std::move(rhs); }); }
template
auto operator-=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ -= std::move(rhs); }); }
template
auto operator*=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ *= std::move(rhs); }); }
template
auto operator/=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ /= std::move(rhs); }); }
template
auto operator%=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ %= std::move(rhs); }); }
template
auto operator>>=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
template
auto operator<<=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
template
auto operator&=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ &= std::move(rhs); }); }
template
auto operator^=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
template
auto operator|=(U rhs) const noexcept
{ return no_vec([&]{ return ref_ |= std::move(rhs); }); }
auto operator++() const noexcept
{ return no_vec([&]{ return ++ref_; }); }
auto operator++(int) const noexcept
c
§ 7.2.6 �ISO/IEC 2018 – All rights reserved 14

{ return no_vec([&]{ return ref_++; }); }
auto operator--() const noexcept
{ return no_vec([&]{ return --ref_; }); }
auto operator--(int) const noexcept
{ return no_vec([&]{ return ref_--; }); }
};
An object of type ordered_update_t is a proxy for an object of type T intended to be used within a
parallel application of an element access function using a policy object of type vector_policy. Simple
increments, assignments, and compound assignments to the object are forwarded to the proxied object, but
are sequenced as though executed within a no_vec invocation. [Note: The return-value deduction of the
forwarded operations results in these operations returning by value, not reference. This formulation prevents
accidental collisions on accesses to the return value.—end note]
7.2.7 Ordered update function template [parallel.alg.ordupdate.func]
template
ordered_update_t ordered_update(T& loc) noexcept;
Returns: { loc }.
c
§ 7.2.7 �ISO/IEC 2018 – All rights reserved 15

8 Task Block [parallel.taskblock]
8.1 Header synopsis [parallel.taskblock.synopsis]
namespace std::experimental {
inline namespace parallelism_v2 {
class task_cancelled_exception;
class task_block;
template
void define_task_block(F&& f);
template
void define_task_block_restore_thread(F&& f);
}
}
8.2 Class task_cancelled_exception [parallel.taskblock.task_cancelled_exception]
namespace std::experimental {
inline namespace parallelism_v2 {
class task_cancelled_exception : public exception
{
public:
task_cancelled_exception() noexcept;
virtual const char* what() const noexcept override;
};
}
}
The class task_cancelled_exception defines the type of objects thrown by task_block::run or task_-
block::wait if they detect than an exception is pending within the current parallel block. See 8.5, below.
virtual const char* what() const noexcept;
Returns: An implementation-defined NTBS.
8.3 Class task_block [parallel.taskblock.class]
namespace std::experimental {
inline namespace parallelism_v2 {
class task_block
{
private:
task_block();
~
public:
task_block(const task_block&) = delete;
task_block& operator=(const task_block&) = delete;
void operator&() const = delete;
c
§ 8.3 �ISO/IEC 2018 – All rights reserved 16

template
void run(F&& f);
void wait();
};
}
}
The class task_block defines an interface for forking and joining parallel tasks. The define_task_block
and define_task_block_restore_thread function templates create an object of type task_block and pass
a reference to that object to a user-provided function object.
An object of class task_block cannot be constructed, destroyed, copied, or moved except by the implemen-
tation of the task block library. Taking the address of a task_block object via operator& is ill-formed.
Obtaining its address by any other means (including addressof) results in a pointer with an unspecified
value; dereferencing such a pointer results in undefined behavior.
A task_block is active if it was created by the nearest enclosing task block, where task block refers to an
invocation of define_task_block or define_task_block_restore_thread and nearest enclosing means
the most recent invocation that has not yet completed. Code designated for execution in another thread by
means other than the facilities in this subclause (e.g., using thread or async) are not enclosed in the task
block and a task_block passed to (or captured by) such code is not active within that code. Performing any
operation on a task_block that is not active results in undefined behavior.
When the argument to task_block::run is called, no task_block is active, not even the task_block on
which run was called. (The function object should not, therefore, capture a task_block from the surrounding
block.)
[Example:
define_task_block([&](auto& tb) {
tb.run([&]{
tb.run([] { f(); }); // Error: tb is not active within run
define_task_block([&](auto& tb2) { // Define new task block
tb2.run(f);
...
});
});
...
});
—end example]
[Note: Implementations are encouraged to diagnose the above error at translation time. —end note]
template void run(F&& f);
Requires: F shall be MoveConstructible. DECAY_COPY(std::forward(f))() shall be a valid
expression.
Requires: *this shall be the active task_block.
Effects: Evaluates DECAY_COPY(std::forward(f))(), where DECAY_COPY(std::forward(f))
is evaluated synchronously within the current thread. The call to the resulting copy of the function
object is permitted to run on an unspecified thread created by the implementation in an unordered
fashion relative to the sequence of operations following the call to run(f) (the continuation), or
indeterminately sequenced within the same thread as the continuation. The call to run synchronizes
with the call to the function object. The completion of the call to the function object synchronizes with
c
§ 8.3 �ISO/IEC 2018 – All rights reserved 17

the next invocation of wait on the same task_block or completion of the nearest enclosing task block
(i.e., the define_task_block or define_task_block_restore_thread that created this task_block).
Throws: task_cancelled_exception, as described in 8.5.
Remarks: The run function may return on a thread other than the one on which it was called; in such
cases, completion of the call to run synchronizes with the continuation. [Note: The return from run is
ordered similarly to an ordinary function call in a single thread.—end note]
Remarks: The invocation of the user-supplied function object f may be immediate or may be delayed
until compute resources are available. run might or might not return before the invocation of f
completes.
void wait();
Preconditions: *this shall be the active task_block.
Effects: Blocks until the tasks spawned using this task_block have completed.
Throws: task_cancelled_exception, as described in 8.5.
Postconditions: All tasks spawned by the nearest enclosing task block have completed.
Remarks: The wait function may return on a thread other than the one on which it was called; in such
cases, completion of the call to wait synchronizes with subsequent operations. [Note: The return from
wait is ordered similarly to an ordinary function call in a single thread.—end note]
[Example:
define_task_block([&](auto& tb) {
tb.run([&]{ process(a, w, x); }); // Process a[w] through a[x]
if (y < x) tb.wait(); // Wait if overlap between [w,x) and [y,z)
process(a, y, z); // Process a[y] through a[z]
});
—end example]
8.4 Function template define_task_block [parallel.taskblock.define_task_block]
template void define_task_block(F&& f);
template void define_task_block_restore_thread(F&& f);
Requires: Given an lvalue tb of type task_block, the expression f(tb) shall be well-formed.
Effects: Constructs a task_block tb and calls f(tb).
Throws: exception_list, as specified in 8.5.
Postconditions: All tasks spawned from f have finished execution.
Remarks: The define_task_block function may return on a thread other than the one on which it
was called unless there are no task blocks active on entry to define_task_block (see 8.3), in which
case the function returns on the original thread. When define_task_block returns on a different
thread, it synchronizes with operations following the call. [Note: The return from define_task_block
is ordered similarly to an ordinary function call in a single thread.—end note] The define_task_-
block_restore_thread function always returns on the same thread as the one on which it was called.
Notes: It is expected (but not mandated) that f will (directly or indirectly) call tb.run(function-
object).
c
§ 8.4 �ISO/IEC 2018 – All rights reserved 18

8.5 Exception Handling [parallel.taskblock.exceptions]
Every task_block has an associated exception list. When the task block starts, its associated exception list
is empty.
When an exception is thrown from the user-provided function object passed to define_task_block or
define_task_block_restore_thread, it is added to the exception list for that task block. Similarly, when
an exception is thrown from the user-provided function object passed into task_block::run, the exception
object is added to the exception list associated with the nearest enclosing task block. In both cases, an
implementation may discard any pending tasks that have not yet been invoked. Tasks that are already in
progress are not interrupted except at a call to task_block::run or task_block::wait as described below.
If the implementation is able to detect that an exception has been thrown by another task within the same
nearest enclosing task block, then task_block::run or task_block::wait may throw task_canceled_-
exception; these instances of task_canceled_exception are not added to the exception list of the corre-
sponding task block.
When a task block finishes with a non-empty exception list, the exceptions are aggregated into an exception_-
list object, which is then thrown from the task block.
The order of the exceptions in the exception_list object is unspecified.
c
§ 8.5 �ISO/IEC 2018 – All rights reserved 19

9 Data-Parallel Types [parallel.simd]
9.1 General [parallel.simd.general]
The data-parallel library consists of data-parallel types and operations on these types. A data-parallel type
consists of elements of an underlying arithmetic type, called the element type. The number of elements is a
constant for each data-parallel type and called the width of that type.
Throughout this Clause, the term data-parallel type refers to all supported (9.6.1) specializations of the simd
and simd_mask class templates. A data-parallel object is an object of data-parallel type.
An element-wise operation applies a specified operation to the elements of one or more data-parallel objects.
Each such application is unsequenced with respect to the others. A unary element-wise operation is an
element-wise operation that applies a unary operation to
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.

Loading comments...