ASTM E2849-18
Standard Practice for Professional Certification Performance Testing
SIGNIFICANCE AND USE
3.1 This practice for performance testing provides guidance to performance test sponsors, developers, and delivery providers for the planning, design, development, administration, and reporting of high-quality performance tests. This practice assists stakeholders from both the user and consumer communities in determining the quality of performance tests. It includes requirements, processes, and intended outcomes for the entities issuing the performance test; for those developing, delivering, and evaluating the test; for users and test takers interpreting the test; and for the specific quality characteristics of performance tests. This practice provides the foundation for both the recognition and accreditation of a specific entity to issue and effectively use a quality performance test.
3.2 Accreditation agencies are presently evaluating performance tests with criteria that were developed primarily or exclusively for multiple-choice examinations. The criteria by which performance tests shall be evaluated and accredited are ones appropriate to performance testing. As accreditation becomes more important for acceptance by federal and state governments, insurance companies, and international trade, it becomes correspondingly more important that appropriate standards of quality and application be developed for performance testing.
SCOPE
1.1 This practice covers both the professional certification performance test itself and specific aspects of the process that produced it.
1.2 This practice does not include management systems. In this practice, the test itself and its administration, psychometric properties, and scoring are addressed.
1.3 This practice primarily addresses individual professional performance certification examinations, although it may be used to evaluate exams used in training, educational, and aptitude contexts. This practice is not intended to address on-site evaluation of workers by supervisors for competence to perform tasks.
1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Standards Content (Sample)
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

Designation: E2849 − 18    An American National Standard

Standard Practice for Professional Certification Performance Testing¹

This standard is issued under the fixed designation E2849; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (´) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This practice covers both the professional certification performance test itself and specific aspects of the process that produced it.

1.2 This practice does not include management systems. In this practice, the test itself and its administration, psychometric properties, and scoring are addressed.

1.3 This practice primarily addresses individual professional performance certification examinations, although it may be used to evaluate exams used in training, educational, and aptitude contexts. This practice is not intended to address on-site evaluation of workers by supervisors for competence to perform tasks.

1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

2. Terminology

2.1 Definitions—Some of the terms defined in this section are unique to the performance testing context. Consequently, terms defined in other standards may vary slightly from those defined in the following.

2.1.1 automatic item generation (AIG), n—a process of computationally generating multiple forms of an item.

2.1.2 candidate, n—someone who is eligible to be evaluated through the use of the performance test; a person who is or will be taking the test.

2.1.3 construct validity, n—degree to which the test evaluates an underlying theoretical idea resulting from the orderly arrangement of facts.

2.1.4 differential system responsiveness, n—measurable difference in response latency between two systems.

2.1.5 examinee, n—candidate in the process of taking a test.

2.1.6 gating item, n—unit of evaluation that shall be passed to pass a test.

2.1.7 inter-rater reliability, n—measurement of rater consistency with other raters.
2.1.7.1 Discussion—See rater reliability.

2.1.8 item, n—scored response unit.
2.1.8.1 Discussion—See task.

2.1.9 item observer, n—human or computer element that observes and records a candidate’s performance on a specific item.

2.1.10 on the job, n—another term for “target context.”
2.1.10.1 Discussion—See target context.

2.1.11 performance test, n—examination in which the response modality mimics or reflects the response modality required in the target context.

2.1.12 power test, n—examination in which virtually all candidates have time to complete all items.

2.1.13 practitioners, n—people who practice the contents of the test in the target context.

2.1.14 rater reliability, n—measurement of rater consistency with a uniform standard.
2.1.14.1 Discussion—See inter-rater reliability.

2.1.15 reconfiguration, n—modification of the user interface for a process, device, or software application.
2.1.15.1 Discussion—Reconfiguration ranges from adjusting the seat in a crane to importing a set of macros into a programming environment.

2.1.16 reliability, n—degree to which the test will make the same prediction with the same examinee on another occasion with no training occurring during the intervening interval.

2.1.17 rubric, n—set of rules by which performance will be judged.

2.1.18 speeded test, n—examination that is time-constrained so that more than 10 % of candidates do not finish all items.

2.1.19 target context, n—situation within which a test is designed to predict performance.

¹ This practice is under the jurisdiction of ASTM Committee E36 on Accreditation & Certification and is the direct responsibility of Subcommittee E36.80 on Personnel Performance Testing and Assessment. Current edition approved Nov. 1, 2018. Published November 2018. Originally approved in 2013. Last previous edition approved in 2013 as E2849 – 13. DOI: 10.1520/E2849-18.

Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States
...
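The distinction between a power test (2.1.12) and a speeded test (2.1.18) turns on a simple completion statistic: whether more than 10 % of candidates fail to finish all items. The Python sketch below is not part of the standard; it only illustrates how a test sponsor might apply that criterion to administration records, and the input format (a list of per-candidate completion flags) is an assumption made for the example.

def classify_test(completed_all_items: list[bool]) -> str:
    """Return 'speeded test' if more than 10 % of candidates did not
    finish all items, otherwise 'power test' (cf. 2.1.12 and 2.1.18)."""
    if not completed_all_items:
        raise ValueError("no candidate records supplied")
    did_not_finish = sum(1 for finished in completed_all_items if not finished)
    fraction_unfinished = did_not_finish / len(completed_all_items)
    return "speeded test" if fraction_unfinished > 0.10 else "power test"

# Example: 3 of 20 candidates (15 %) did not finish, so the exam is speeded.
records = [True] * 17 + [False] * 3
print(classify_test(records))  # speeded test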
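The practice defines inter-rater reliability (2.1.7) and rater reliability (2.1.14) as measurements of rater consistency but does not prescribe any particular statistic. As an illustration only, the sketch below computes two commonly used agreement measures for a pair of raters scoring the same items: simple percent agreement and Cohen's kappa. Both statistics, and the pass/fail rating scale in the example, are assumptions chosen for the sketch rather than requirements of E2849.

from collections import Counter

def percent_agreement(rater_a: list, rater_b: list) -> float:
    """Fraction of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty item set")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Two raters scoring six performance items on a hypothetical pass/fail rubric.
a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(percent_agreement(a, b), 3))  # 0.833
print(round(cohens_kappa(a, b), 3))       # 0.625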
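Automatic item generation (2.1.1) is defined only as a process of computationally generating multiple forms of an item. One minimal way to picture it, not drawn from the standard, is template-based generation: an item model with variable slots is expanded over allowed values. The item text, variable names, and values below are hypothetical and exist only to make the definition concrete.

import itertools

# Hypothetical item model with three variable slots.
ITEM_MODEL = "Configure the {device} so that {parameter} does not exceed {limit}."

VARIABLES = {
    "device": ["hydraulic press", "conveyor line"],
    "parameter": ["operating pressure", "belt speed"],
    "limit": ["the rated maximum", "the value on the work order"],
}

def generate_items(template: str, variables: dict[str, list[str]]) -> list[str]:
    """Expand the item model over all combinations of variable values,
    yielding multiple surface forms of a single underlying item."""
    names = list(variables)
    forms = []
    for combo in itertools.product(*(variables[n] for n in names)):
        forms.append(template.format(**dict(zip(names, combo))))
    return forms

for form in generate_items(ITEM_MODEL, VARIABLES):
    print(form)  # prints 8 generated item forms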