Educational Methods & Psychometrics (EMP)

ISSN: 2943-873X

Svend Kreiner, Marianne Mülle, & Tine Nielsen

Keywords: Criterion-referenced interpretation, scale anchoring, IRT and Rasch models, Log-linear Rasch models, PIRLS

ABSTRACT

Measurement by IRT and Rasch models is quantitative and interval-scaled, which is useful for statistical applications that study associations between educational test results and other variables. Quantitative measurement is also useful for ranking students, but it rarely provides information that is useful in formative classroom testing, because quantitative test results are difficult to interpret in terms of what students can and cannot do. It has been suggested that information of that kind requires so-called criterion-referenced tests, with items that differ from the items of conventional educational tests. This paper disagrees. It argues that formative classroom testing needs interpretation of results from conventional tests, and it describes how to provide and validate criterion-referenced interpretation by analyzing estimates of so-called scale-anchored probabilities of responses to items. Many studies, including the Progress in International Reading Literacy Study (PIRLS), interpret test results by scale anchoring, but these interpretations are not criterion-referenced. This paper describes the assumptions and requirements of criterion-referenced interpretation and illustrates them with data from PIRLS.
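
As a minimal illustration of what a scale-anchored probability could look like, the sketch below evaluates the probability of a correct response under a dichotomous Rasch model at a few anchor points on the latent scale. It is not the estimation or validation procedure of the paper. The item difficulties, the anchor points, and the function names are hypothetical and chosen only for illustration; none of the values are estimates from PIRLS.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def anchored_probabilities(anchors, difficulties):
    """Map each anchor point to the response probabilities of all items."""
    return {
        anchor: [rasch_probability(anchor, b) for b in difficulties]
        for anchor in anchors
    }


# Hypothetical item difficulties and anchor points (logit scale), for illustration only.
ITEM_DIFFICULTIES = [-1.0, 0.0, 1.5]
ANCHOR_POINTS = [-1.0, 0.0, 1.0]

table = anchored_probabilities(ANCHOR_POINTS, ITEM_DIFFICULTIES)
for anchor, probs in table.items():
    print(anchor, [round(p, 2) for p in probs])
```

Read row by row, such a table indicates which items a student located at a given anchor point is likely to answer correctly, which is the kind of statement about what students can and cannot do that a criterion-referenced interpretation aims for.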

PUBLISHED

30-05-2026

ISSUE

Vol. 4, 2026

SECTION

Research Article