Adaptive Testing

One potential use for IRT in QoL research is the development of "tailored" or adaptive tests. If a group of patients is known to be severely limited in physical ability, it may be unnecessary to ask them many questions about difficult or strenuous tasks. Conversely, for patients who are fit and healthy, detailed questions about easy tasks are less relevant. Therefore, specific variants of a questionnaire may be used for different subgroups of patients. Logistic IRT modelling provides the means by which these variants can be standardised, so that they all relate to different segments of a single ability scale.

One extension of this approach that has been adopted in some fields of research is to adapt the questions dynamically in the light of respondents' previous replies. Nunnally and Bernstein (1994) describe the use of computer-assisted questionnaires, in which questions of appropriate difficulty are selected on the basis of earlier responses. This can result in more precise grading of ability, whilst at the same time reducing the number of questions each person needs to answer.
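The selection of questions on the basis of earlier responses can be sketched as follows. This is only a minimal illustration, not the procedure described by Nunnally and Bernstein: it assumes a dichotomous Rasch model with item difficulties already calibrated in logits, a standard normal prior on ability, and a simple grid-based posterior mean as the running ability estimate. The function names and the item bank are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a positive response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Ability grid from -4 to +4 logits, used for a crude numerical integration.
GRID = [i / 10.0 for i in range(-40, 41)]

def posterior_mean_ability(answered):
    """Posterior mean of ability given (difficulty, response) pairs.

    Assumes a standard normal prior; responses are coded 1 or 0.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for b, x in answered:
            p = rasch_prob(theta, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

def next_item(theta, bank, used):
    """Choose the unused item whose difficulty is closest to theta.

    Under the Rasch model this is the most informative remaining item.
    """
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta))

def run_adaptive_test(bank, answer, n_items):
    """Administer up to n_items questions; answer(i) returns 1 or 0."""
    used, answered = set(), []
    theta = 0.0  # start at the centre of the scale
    for _ in range(min(n_items, len(bank))):
        i = next_item(theta, bank, used)
        used.add(i)
        answered.append((bank[i], answer(i)))
        theta = posterior_mean_ability(answered)
    return theta, sorted(used)

# Hypothetical item bank (difficulties in logits) and a respondent who
# manages every task easier than 1.5 logits.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta, items = run_adaptive_test(bank, lambda i: 1 if bank[i] < 1.5 else 0, 3)
```

In this sketch the respondent is never asked the two easiest questions: each positive response raises the ability estimate, so the next question chosen is a harder one.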

Example from the literature

Fisher (1993) illustrated an Assessment of Motor and Process Skills (AMPS) instrument with more than 50 tasks. This assesses the ability of persons to perform activities of daily living. Rasch models were used to calibrate the tasks in terms of their relative difficulty with respect to the motor and process skills.

This enabled future assessments to be made using only a few tasks. The person being evaluated chooses and performs two or three familiar tasks from the total set of more than 50, and is then rated on 15 motor skills and 20 process skills. Since the task difficulties are known, the person-ability measures can be adjusted to account for the differing challenges of the tasks.
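The adjustment for task difficulty can be illustrated with a deliberately simplified sketch. It assumes a dichotomous Rasch model with known difficulties, which is much simpler than the rating-scale analysis used for the AMPS, and the difficulty values are invented for illustration. The ability estimate is the maximum-likelihood value, found by Newton-Raphson iteration.

```python
import math

def rasch_prob(theta, b):
    """Probability of success under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_ability(difficulties, responses, iterations=50):
    """Maximum-likelihood ability (logits) given known item difficulties.

    The estimate is finite only for a mixed pattern of 0s and 1s.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("no finite ML estimate for all-0 or all-1 patterns")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)              # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < 1e-8:
            break
    return theta

# Hypothetical difficulties: the second set is 2 logits harder throughout.
easy_tasks = [-1.5, -1.0, -0.5, 0.0]
hard_tasks = [0.5, 1.0, 1.5, 2.0]
responses = [1, 1, 0, 0]  # the same raw score of 2 on either set

ability_on_easy = ml_ability(easy_tasks, responses)
ability_on_hard = ml_ability(hard_tasks, responses)
```

The same raw score yields an ability estimate that is 2 logits higher on the harder set, which is the sense in which known task difficulties allow people who performed different tasks to be placed on a common scale.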

Another application for tailored tests relates to DIF. In some scales it may be difficult to avoid item bias, and the investigators may even decide deliberately to include DIF items. As a rather contrived example, suppose an instrument is required for use in a clinical trial that will enter patients aged from 10 to 70. It might be desired to obtain a single indicator of physical function, even though it can be argued that "good physical functioning" means something different for children than for adults. In such a situation the investigator might have one question for adults about going to work, a different question for children about going to school, and possibly other questions aimed at other subgroups of patients such as the retired. Each question would then be relevant only for its own target subgroup, and would function differently for other patients. The results might be analysed by converting the individually targeted questions into the equivalent of a single compound question. Of course, in such a simple example one could in principle ask a compound question instead; for example: "Do you have trouble going to school/to work/doing housework/performing retirement activities?" However, such a question could be confusing and easily misunderstood.
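At its simplest, converting targeted questions into the equivalent of one compound item amounts to taking, for each respondent, the answer to the question aimed at their subgroup. The sketch below shows only that data-handling step; the subgroup labels and item names are hypothetical, and it assumes the targeted questions have already been shown (for example by IRT calibration) to be comparable in difficulty.

```python
# Hypothetical mapping from subgroup to the question targeted at it.
TARGETED_ITEM = {
    "child": "trouble_school",
    "working_adult": "trouble_work",
    "retired": "trouble_retirement_activities",
}

def compound_response(record):
    """Return the response to the one question aimed at this respondent.

    Returns None if the targeted question was not answered.
    """
    item = TARGETED_ITEM[record["subgroup"]]
    return record.get(item)

# Illustrative records: 1 = has trouble, 0 = no trouble.
patients = [
    {"subgroup": "child", "trouble_school": 1},
    {"subgroup": "working_adult", "trouble_work": 0},
    {"subgroup": "retired"},  # targeted question left unanswered
]
compound = [compound_response(p) for p in patients]
```

If the targeted questions differ in difficulty, a direct substitution of this kind would not be adequate, and the responses would need to be placed on the common scale through the fitted model.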

We are not aware of examples from QoL literature where instruments target several subgroups of patients using questions that are specific to individual subgroups. However, paediatric assessment is one area in which this would be relevant, with different sets of questions applicable to different age groups and a standardised score providing a measure of QoL irrespective of age.

