ABSTRACT
Intelligence tests have a long tradition across different fields, yet little attention has been paid to how their measurement properties may change over time and how such changes affect selection outcomes and observed intelligence trends. This study illustrates this gap and its consequences, using a synonym test from the Swedish Enlistment Battery (SEB) as an example. The test aims to measure lexical ability, a narrow verbal ability. First, the measurement properties of the synonym test were evaluated across years (2014, 2018, and 2022) and between subgroups using Rasch analysis. Second, the implications of changes in the synonym test's measurement properties were evaluated for selection to basic military training in Sweden. Misfitting items and invariance issues were identified. Removing the misfitting items improved the measurement properties but did not significantly affect the ability to identify group differences. When all original items were used, however, the lexical ability of some test takers was likely overestimated at the lower end of the scale and underestimated at the upper end. These findings underscore the importance of ensuring unbiased measurement properties and invariance over time, and they highlight the need for regular evaluation and refinement of intelligence tests to maintain fairness and accuracy.