ASTM F3516-22
Standard Guide for Testing Interpreting Performance
SIGNIFICANCE AND USE
4.1 Intended Use:
4.1.1 This guide is intended to assist in the design or evaluation of screening and interpreting tests, or both.
4.1.2 This guide also satisfies the need for testing interpreting performance identified in other relevant ASTM standards (see Practice F2889 and Practice F2089).
4.2 Compliance with the Guide:
4.2.1 Compliance requires the user to identify which sections of this guide apply to their specific use and circumstances. The decision to not adhere to any sections should be fully explained.
SCOPE
1.1 Purpose:
1.1.1 This guide describes factors to be considered for the development and use of language interpreting performance tests, referencing the Interagency Language Roundtable (ILR) scale. It is intended to help people commission, develop, or select assessment tools for the evaluation of interpreting skills.
1.1.2 The purpose of any test developed following this guide is to rate a candidate's interpreting skills according to the Interagency Language Roundtable Skill Level Descriptions for Interpreting Performance (ILR SLDs for Interpreting). Any pass/fail rating assigned should reference the specific ILR level at which the candidate has tested.
1.1.3 The objectives for all tests should be clearly defined and convincing evidence presented to justify any claims, inferences, and decisions.
1.1.4 This guide focuses on two types of assessment; one is for screening candidates, and the other is for evaluating actual interpreting skills. It also outlines the appropriate characteristics and uses of each.
1.1.5 When evaluating actual interpreting skills, it should be noted that according to ILR, it is at the Professional Performance Level 3 that all necessary skills align to enable a reasonably accurate, reliable, and trustworthy interpretation.
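As an illustration of 1.1.2, the following minimal sketch shows one way a result record could tie a pass/fail outcome to the ILR level at which the candidate tested. It is not part of the guide: the class name, field names, and the level list (the usual ILR base and plus levels from 0 to 5) are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed ordering of ILR levels, lowest to highest (illustrative only).
ILR_LEVELS = ["0", "0+", "1", "1+", "2", "2+", "3", "3+", "4", "4+", "5"]


@dataclass(frozen=True)
class InterpretingResult:
    """Hypothetical result record; not defined by the guide."""

    candidate_id: str
    tested_level: str    # ILR level at which the candidate was rated
    required_level: str  # ILR level the test user requires

    def __post_init__(self):
        # Reject anything that is not on the assumed ILR scale.
        for level in (self.tested_level, self.required_level):
            if level not in ILR_LEVELS:
                raise ValueError(f"unknown ILR level: {level!r}")

    @property
    def passed(self) -> bool:
        # Pass means the tested level is at or above the required level.
        return ILR_LEVELS.index(self.tested_level) >= ILR_LEVELS.index(self.required_level)

    def report(self) -> str:
        # The pass/fail statement always names the tested ILR level.
        outcome = "PASS" if self.passed else "FAIL"
        return (
            f"{self.candidate_id}: {outcome} "
            f"(tested at ILR {self.tested_level}, required ILR {self.required_level})"
        )
```

Keeping the tested level inside the report, rather than a bare pass/fail flag, is the design point: the same record can be reused if a different required level applies later.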
1.2 Limitations:
1.2.1 This guide is not intended to address test development for use in the following areas:
1.2.1.1 Translation,
1.2.1.2 Audio Translation,
1.2.1.3 Transcription/Translation,
1.2.1.4 Diagnostic Assessments,
1.2.1.5 Less-commonly tested languages, and
1.2.1.6 Other job-specific language performance tests.
1.2.2 This guide also does not purport to prescribe definitive descriptions of every possible approach for testing interpreting performance, nor does it prescribe the exact parameters that must be used in a valid and reliable test of interpreting skills. It does, however, suggest approaches to help test designers and users determine whether the use of a test is appropriate and justifiable.
1.2.3 This guide is not intended to address ancillary processes and procedures governing how organizations provide interpreting services.
1.3 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
General Information
- Status: Published
- Publication Date: 31-Mar-2022
- Technical Committee: F43 - Language Services and Products
- Drafting Committee: F43.04 - Language Testing
Relations
- Effective Date: 01-Apr-2020
- Effective Date: 15-Mar-2015
- Effective Date: 01-May-2011
- Effective Date: 01-Apr-2007
- Effective Date: 10-Mar-2001
Overview
ASTM F3516-22: Standard Guide for Testing Interpreting Performance is a comprehensive guide developed by ASTM International. This standard provides essential guidance for designing, developing, selecting, and evaluating language interpreting performance assessments. It is intended to support organizations and test developers in the creation and validation of interpreting performance tests that align with recognized proficiency scales, specifically referencing the Interagency Language Roundtable (ILR) Skill Level Descriptions for Interpreting Performance.
Targeting both screening and in-depth assessment applications, ASTM F3516-22 outlines factors to ensure reliable, valid, and ethical measurement of interpreting skills. The guide is particularly useful for agencies commissioning assessments, test developers, and decision-makers responsible for personnel qualification or hiring.
Key Topics
Purpose and Scope
- Assists in developing or evaluating interpreting and screening tests for spoken and signed languages.
- Centers on the ILR proficiency scale for measuring and assigning candidate interpreting skill levels.
- Clearly distinguishes between screening assessments (to identify baseline capability) and comprehensive interpreting performance assessments.
Test Development and Design
- Calls for a structured needs analysis involving stakeholders to tailor tests to organizational and client requirements.
- Specifies the documentation and planning needed to develop, administer, and refresh tests.
- Emphasizes quality assurance (QA) and quality control (QC) throughout the test lifecycle.
Reliability and Validity
- Details the importance of test reliability and validity, ensuring consistent and appropriate skill measurement.
- Encourages the use of both holistic and analytic scoring methods, referencing ILR guidelines.
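One way to put a number on rating consistency is to compare the levels two raters assign to the same performances. The sketch below computes the exact-agreement rate and Cohen's kappa in plain Python. The guide does not prescribe these statistics; they are named here only as common choices, and the function names and sample ratings are hypothetical.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Share of performances on which both raters assigned the same level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal share per level.
    expected = sum(count_a[level] * count_b[level] for level in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical level throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ILR ratings for five recorded performances.
ratings_a = ["3", "2+", "3", "2", "3"]
ratings_b = ["3", "2+", "2+", "2", "3"]
agreement = exact_agreement(ratings_a, ratings_b)
kappa = cohens_kappa(ratings_a, ratings_b)
```

Kappa treats the levels as unordered categories, so a one-step disagreement counts the same as a three-step one; a weighted variant may suit an ordered scale better.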
Ethical and Practical Considerations
- Stresses the importance of ethical testing practices and the protection of examinees’ rights.
- Recommends documentation to support transparency and justifiability of test use, interpretation, and outcomes.
Applications
ASTM F3516-22 is applicable in a variety of settings where interpreting proficiency testing is required:
Government Agencies and International Organizations
- To evaluate interpreting personnel for diplomatic, security, or immigration services.
- To set consistent standards for multilingual communication.
Healthcare and Legal Services
- For qualification and certification of interpreters tasked with critical communications.
Educational Institutions
- To assess student progress or validate readiness for professional interpreting roles.
Language Service Providers
- To ensure interpreters meet client and contractual requirements through standardized performance metrics.
Talent Management and HR
- To inform hiring, placement, and advancement decisions for interpreting professionals.
By adhering to ASTM F3516-22, organizations can demonstrate commitment to internationally recognized best practices in interpreter assessment, promoting confidence in the validity and fairness of their evaluation tools.
Related Standards
- ASTM F2889 - Practice for Assessing Language Proficiency
- ASTM F2089 - Practice for Language Interpreting
- ILR Skill Level Descriptions for Interpreting - Reference scale for skill assignment
These related standards collectively form a robust framework for language proficiency testing, interpreter qualification, and performance assessment across industries.
Keywords: interpreting performance, interpreter testing, language proficiency, ILR scale, language assessment, ASTM F3516-22, interpreting skills, interpreter screening, test development, quality assurance in interpreting tests
Frequently Asked Questions
ASTM F3516-22 is a guide published by ASTM International. Its full title is "Standard Guide for Testing Interpreting Performance". This standard covers the design and evaluation of screening and interpreting tests, referencing the Interagency Language Roundtable (ILR) scale.
ASTM F3516-22 is classified under the following ICS (International Classification for Standards) categories: 01.020 - Terminology (principles and coordination). The ICS classification helps identify the subject area and facilitates finding related standards.
ASTM F3516-22 has the following relationships with other standards: it is linked to ASTM F2889-11(2020), ASTM F2089-15, ASTM F2889-11, ASTM F2089-01(2007), and ASTM F2089-01. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: F3516 − 22
Standard Guide for Testing Interpreting Performance
This standard is issued under the fixed designation F3516; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 Purpose:
1.1.1 This guide describes factors to be considered for the development and use of language interpreting performance tests, referencing the Interagency Language Roundtable (ILR) scale. It is intended to help people commission, develop, or select assessment tools for the evaluation of interpreting skills.
1.1.2 The purpose of any test developed following this guide is to rate a candidate’s interpreting skills according to the Interagency Language Roundtable Skill Level Descriptions for Interpreting Performance (ILR SLDs for Interpreting). Any pass/fail rating assigned should reference the specific ILR level at which the candidate has tested.
1.1.3 The objectives for all tests should be clearly defined and convincing evidence presented to justify any claims, inferences, and decisions.
1.1.4 This guide focuses on two types of assessment; one is for screening candidates, and the other is for evaluating actual interpreting skills. It also outlines the appropriate characteristics and uses of each.
1.1.5 When evaluating actual interpreting skills, it should be noted that according to ILR, it is at the Professional Performance Level 3 that all necessary skills align to enable a reasonably accurate, reliable, and trustworthy interpretation.
1.2 Limitations:
1.2.1 This guide is not intended to address test development for use in the following areas:
1.2.1.1 Translation,
1.2.1.2 Audio Translation,
1.2.1.3 Transcription/Translation,
1.2.1.4 Diagnostic Assessments,
1.2.1.5 Less-commonly tested languages, and
1.2.1.6 Other job-specific language performance tests.
1.2.2 This guide also does not purport to prescribe definitive descriptions of every possible approach for testing interpreting performance, nor does it prescribe the exact parameters that must be used in a valid and reliable test of interpreting skills. It does, however, suggest approaches to help test designers and users determine whether the use of a test is appropriate and justifiable.
1.2.3 This guide is not intended to address ancillary processes and procedures governing how organizations provide interpreting services.
1.3 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
F2089 Practice for Language Interpreting
F2889 Practice for Assessing Language Proficiency
3. Terminology
3.1 Definitions:
3.1.1 adaptive tests, n—tests in which the selection of the next item depends upon the rating assigned to previously taken items.
3.1.1.1 Discussion—In computer-adaptive tests, for example, candidates who do not show mastery at one level may not be asked to respond to higher-level prompts, but may be given lower ones to determine their ability. In human-delivered adaptive tests (such as Oral Proficiency interviews), testers select the next prompt based upon how well or badly they believe the candidate handles a previous prompt.
3.1.2 analytic scoring, n—results in the assignment of particular values to each individual element of the candidate’s performance, which may or may not result in a final overall rating; analytic rating breaks down performance into discrete features and assigns separate ratings or values to each.
3.1.3 decision-tree guidelines, n—describe the paths a candidate may take through an adaptive test; they suggest which items should be delivered next, based on measurements of current performance.
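The adaptive-test and decision-tree definitions above (3.1.1 and 3.1.3) can be illustrated with a very small sketch. It assumes a simple staircase rule: move one level up after a prompt on which the candidate shows mastery, one level down otherwise. The level list, function names, and the rule itself are assumptions made for illustration; the guide does not specify any particular decision tree.

```python
from typing import List

# Hypothetical item-bank levels, lowest to highest.
LEVELS = ["1", "1+", "2", "2+", "3", "3+", "4"]


def next_level_index(current: int, showed_mastery: bool) -> int:
    """Pick the level of the next prompt from the rating of the last one."""
    step = 1 if showed_mastery else -1
    # Clamp so the path never leaves the available levels.
    return min(max(current + step, 0), len(LEVELS) - 1)


def run_adaptive_path(start: int, mastery_by_item: List[bool]) -> List[str]:
    """Return the sequence of prompt levels a candidate would be given."""
    path = [LEVELS[start]]
    index = start
    for showed_mastery in mastery_by_item:
        index = next_level_index(index, showed_mastery)
        path.append(LEVELS[index])
    return path


# A candidate starting at level 2 who handles two prompts well, then one badly.
example_path = run_adaptive_path(2, [True, True, False])
```

A real decision tree would usually also define a stopping rule and how the final rating is derived from the path; both are left out of this sketch.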
This guide is under the jurisdiction of ASTM Committee F43 on Language Services and Products and is the direct responsibility of Subcommittee F43.04 on Language Testing. Current edition approved April 1, 2022. Published April 2022. DOI: 10.1520/F3516-22.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.1.4 holistic scoring, n—requires raters to assign a rating based on the overall quality of the candidate’s performance, based upon a set of criteria describing typical performance at a particular level; it results in a final rating which does not provide individualized feedback on the discrete elements of performance; contrast with analytic scoring.
3.1.5 interpreting, n—the process of first fully understanding, analyzing, and processing a spoken or signed message and then faithfully rendering it into another spoken or signed language.
3.1.5.1 Discussion—Interpreting is different from translation which results in the creation of a written target text.
3.1.6 language proficiency, n—the degree of skill with which a person can use a language for communicative purposes.
3.1.6.1 Discussion—Language proficiency encompasses a person’s ability to read, write, speak, or understand a language.
3.1.7 performance, n—the ability of candidates to perform particular tasks, usually associated with job or study requirements.
3.1.8 quality assurance, n—the process of ensuring that the test planning and development phases are executed properly and satisfy the needs of all stakeholders.
3.1.8.1 Discussion—Quality Assurance (QA) applies when (1) a new test is being created, (2) an existing test is being repurposed or revised, or (3) new personnel is being trained to develop or administer a test, the latter in accordance with uniformly acceptable standards.
3.1.9 quality control, n—the system of post-development evaluations used at the point of product acceptance and following product use to determine whether the test and testing practices implemented by an organization continue to meet and adhere to all established standards and relevant testing policies; Quality Control (QC) is part of the test maintenance process.
3.1.9.1 Discussion—Quality Control (QC) is used at the point of product acceptance and any time after product use. QC verifies the continued validity and reliability of the test and demonstrates that the test is being used in an appropriate manner on an ongoing basis.
3.1.10 reliability, n—the consistency with which a test measures a skill or activity throughout the life of the test or the degree to which it does so without deviation each time it is used (repeatability).
3.1.10.1 Discussion—Consistency is the essential idea in classical reliability. Reliability is defined as the extent to which separate measurements (items, scales, test administrations, and interviews) yield comparable results under the same or similar conditions. Test items measuring the same construct should yield similar results when administered to the same group of test-takers under comparable testing situations. Simply put, reliability is the extent to which an item, scale, procedure, or test will yield the same value when administered under similar or dissimilar conditions.
3.1.11 validity, n—the degree to which a test measures what it is intended to measure or can be used to successfully achieve its ultimate purpose.
3.1.11.1 Discussion—Validity is a judgment of the degree to which the evidence (arguments) supports the conclusions, interpretations, uses, and inferences of test scores. A validity argument demonstrates the appropriateness and defensibility of a test’s conclusions, interpretations, and inferences for a specific use in a given situation. The validity argument is based on the fact that a test is developed for specific uses and users and includes, but is not limited to, a description and justification of test uses, effects, audiences, and content. Different statistical procedures can be applied to estimate the validity of a test. Such procedures generally seek to determine what the test measures, and how well it does so. The rigor and strength of the validity argument should increase as the stakes associated with the test (consequences for the individual or organization, or both) increase.
4. Significance and Use
4.1 Intended Use:
4.1.1 This guide is intended to assist in the design or evaluation of screening and interpreting tests, or both.
4.1.2 This guide also satisfies the need for testing interpreting performance identified in other relevant ASTM standards (see Practice F2889 and Practice F2089).
4.2 Compliance with the Guide:
4.2.1 Compliance requires the user to identify which sections of this guide apply to their specific use and circumstances. The decision to not adhere to any sections should be fully explained.
5. Overarching Considerations
5.1 This guide combines expertise from the fields of language testing and interpreting and describes best practices from each.
5.2 Test Purpose:
5.2.1 An interpreting performance test developed in accordance with this guide should place candidates within the range of interpreting performance described in the ILR Skill Level Descriptions for Interpreting Performance.
5.3 Reliability (See also 3.1.10):
5.3.1 Without measurement consistency, decisions based on test scores or ratings may be incorrect. Any assessment developed should include an explanation of how reliability will be ensured.
5.4 Validity (See also 3.1.11):
5.4.1 A test is considered valid to the extent that it measures what it is intended to measure. A screening test should measure whether a candidate possesses some or all of the prerequisite abilities required of interpreters. It is considered valid if it effectively excludes candidates who do not possess the interpreting skills required for the interpreting assessment.
5.4.2 An interpreting test is considered valid if it measures the interpreting ability of a candidate accurately. It can be developed for use in a specific area of interpreting or it can be intended for more general use.
5.4.3 It is important that the test be used in a manner consistent with what it actually measures. For example, it may not be valid to use a test designed to assess interpreting ability in the medical domain to infer ability in the legal domain. Any validity argument should be rigorous enough to justify the decisions made on the basis of test ratings and the potential consequences of those decisions.
5.5 Practicality:
5.5.1 The development of valid, reliable tests requires that resources be allocated for the development, administration, and periodic evaluation and improvement of the assessment. Necessary resources may include the following:
5.5.1.1 Personnel to develop, administer, rate/score and report results, ensure security, and provide ongoing improvement;
5.5.1.2 Funding for assessment development, the compensation, training, and maintenance of raters and administrators, ongoing improvements, and operations and security management; and
5.5.1.3 Sufficient time to plan and execute test development and maintenance processes.
5.5.2 During the test design and development phases, it is often necessary to make tradeoffs between the validity and reliability of a test, and the practical constraints of time, money and other resources. In such cases, it is important to recognize the extent to which validity or reliability, or both, may be compromised.
5.6 Technical Documentation:
5.6.1 Technical documentation covering the entire test lifecycle includes, but is not limited to, the following:
5.6.1.1 Needs Analysis,
5.6.1.2 Test Specifications,
(1) Test use,
(2) Test design, and
(3) Test scoring/rating.
5.6.1.3 Test Validation,
5.6.1.4 Test Administration,
5.6.1.5 Test Security, and
5.6.1.6 Test Refreshment.
5.6.2 Documentation should serve to assure interested parties of the applicability and rigor of the approach, processes, methodologies, findings, decisions, and deliverables at each stage of the lifecycle.
5.7 Ethics:
5.7.1 This guide addresses the ethical considerations that must be part of any assessment of interpreting performance in keeping with good testing practice. Several organizations have created ethical codes of practice designed to safeguard the rights of test takers by focusing on professional test development, administration, and rating practices. These instruments can also serve as guides to ethical behavior in interpreting performance testing.
5.7.2 The development and use of an interpreting test entail ethical responsibilities for contracting agencies, testing organizations, test developers, and test users who must consider the implications of their own actions as well as those of others during all phases of testing.
6. Test Planning
6.1 Test Types:
6.1.1 Based on the results of the Needs Analysis, tests can be used to measure general language or be domain-specific.
6.1.2 Screening tests are easier to administer and may prove cost-effective by eliminating candidates with little or no chance of attaining the desired level on the ILR scale for interpreting performance.
6.1.3 The purpose of the screening tests is to identify individuals who are unlikely to perform well on interpreting performance tests. The tests can be used to assess the source or the target languages, or both. While language proficiency is a prerequisite, it is not enough to ensure a successful interpreting performance.
6.1.4 Regardless of the nature of the screening test, it is critical that empirical evidence be provided demonstrating that the screening test is an effective indicator of how well a candidate will perform on an Interpreting Performance Assessment.
6.1.5 While the method of test delivery is largely irrelevant as long as it does not affect test validity, a written test format for the screening test can be justified for practical considerations.
6.1.6 An Interpreting Performance Test should require that candidates demonstrate that they can interpret effectively in the interpreting mode required.
6.2 Screening Assessments:
6.2.1 A screening test measures whether or not a candidate possesses some of the prerequisite skills required of interpreters. Ideally, it may test language proficiency using the ILR Skill Level Descriptions for Proficiency in the following:
6.2.1.1 Speaking,
6.2.1.2 Listening comprehension,
6.2.1.3 Reading comprehension,
6.2.1.4 Writing,
6.2.1.5 American Sign Language (ASL) comprehension, and
6.2.1.6 American Sign Language (ASL) production.
6.2.2 There are additional elements which may be assessed:
6.2.2.1 Written translations,
6.2.2.2 Grammar and vocabulary,
6.2.2.3 Specialized terminology,
6.2.2.4 Interpreting protocols,
6.2.2.5 Ethics, and
6.2.2.6 Situational decision-making.
6.3 Interpreting Assessments:
6.3.1 An interpreting assessment measures the candidate’s integrated ability to interpret, conveying meaning and exhibiting the conduct appropriate to the level(s) being tested. Depending on the interpreting mode being assessed, the test should evaluate both receptive (listening, reading, or ASL comprehension) and productive skills (speaking or signing).
6.3.2 One or multiple modes of interpreting (simultaneous interpreting, consecutive interpreting, and sight translation) may be tested, either unidirectionally or bidirectionally.
6.4 Test Planning Requirements:
6.4.1 Prior to test development, a series of planning steps should be considered to produce a document which will be used to guide the development and use of an assessment. It would include the following elements:
6.4.1.1 Needs Analysis,
6.4.1.2 Test Specifications,
(1) Test use,
(2) Test design, and
(3) Test scoring/rating.
6.4.1.3 Test Validation,
6.4.1.4 Test Administration,
6.4.1.5 Test Security, and
6.4.1.6 Test Refreshment.
6.5 Needs Analysis:
6.5.1 The development, commissioning, or selection of an interpreting test should be based on the needs of the organization commissioning or selecting the test. To ensure that the test is appropriate for its intended use, the organization should perform a Needs Analysis
...
(1) Scoring specifications explaining in detail how both raw and scaled scores are generated (as applicable), and how cut scores are set and interpreted;
(2) Partial credit scoring models and criteria for evaluating and rating constructed responses by human raters should be described in detail (as applicable);
(3) Rating specifications should include explanations of how raters are trained and the rating scale being used for rating;
(4) Any key used to assist in the generation of scores or ratings should be described in detail; and
(5) Individual testing and reporting of each modality.
6.7 Test Validation:
6.7.1 A test is valid if it tests what it purports to test. Accordingly, care should be taken that the test asks candidates to perform authentic tasks which closely mimic the types of interpretations they will have to perform in the real world.
6.7.2 Ideally, test validation is performed by an independent
...



