Jennifer Petrillo, Stefan J. Cano, Lori D. McLeod, Cheryl D. Coon
Value in Health. 2015.18(1):25–34.
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development—classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)—in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25).
Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ-25 at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison.
Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories.
Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results.
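As a rough illustration of the kind of CTT item- and scale-level checks described above, the sketch below computes Cronbach's alpha and corrected item-total correlations. It is not the authors' analysis: the function names and the toy response matrix are hypothetical, and the data are invented solely to show how a poorly behaving item stands out.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # total score excluding item i
        result.append(pearson(item, rest))
    return result


# Hypothetical toy data: 6 respondents x 4 items scored 1-5.
# The fourth item is deliberately inconsistent with the other three.
toy_rows = [
    [1, 1, 2, 4],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [4, 4, 3, 2],
    [5, 4, 5, 4],
    [5, 5, 5, 1],
]

if __name__ == "__main__":
    print("alpha, all items:", round(cronbach_alpha(toy_rows), 3))
    print("alpha, first three items:",
          round(cronbach_alpha([r[:3] for r in toy_rows]), 3))
    print("corrected item-total r:",
          [round(v, 3) for v in corrected_item_total(toy_rows)])
```

In this toy example the fourth item correlates negatively with the rest of the scale, and alpha rises when it is dropped. IRT and RMT evaluations (item fit, local dependence, targeting, category ordering) need dedicated model-fitting software and are not sketched here.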
Ethan Basch, John Spertus, R. Adams Dudley, Albert Wu, Cynthia Chuahan, Perry Cohen, Mary Lou Smith, Nick Black
Value in Health. 2015.18(4):493–504.
To recommend methods for assessing quality of care via patient-reported outcome-based performance measures (PRO-PMs) of symptoms, functional status, and quality of life.
A Technical Expert Panel was assembled by the American Medical Association–convened Physician Consortium for Performance Improvement. An environmental scan and structured literature review were conducted to identify quality programs that integrate PRO-PMs. Key methodological considerations in the design, implementation, and analysis of these PRO-PM data were systematically identified. Recommended methods for addressing each identified consideration were developed on the basis of published patient-reported outcome (PRO) standards and refined through public comment. Literature review focused on programs using PROs to assess performance and on PRO guidance documents.
Thirteen PRO programs and 10 guidance documents were identified. Nine best practices were developed, including the following: provide a rationale for measuring the outcome and for using a PRO-PM; describe the context of use; select a measure that is meaningful to patients with adequate psychometric properties; provide evidence of the measure's sensitivity to differences in care; address missing data and risk adjustment; and provide a framework for implementation, interpretation, dissemination, and continuous refinement.
Methods for integrating PROs into performance measurement are available.
Margaret Rothman, Ari Gnanaskathy, Paul Wicks, Elektra J. Papadopoulos
Value in Health. 2015.18(1):1–4.
We report a panel designed to open a dialog between pharmaceutical sponsors, regulatory reviewers, and other stakeholders regarding the use of social media to collect data to support the content validity of patient-reported outcome instruments in the context of medical product labeling. Multiple stakeholder perspectives were brought together to better understand the issues encountered in pursuing social media as a form of data collection to support content validity. Presenters represented a pharmaceutical sponsor of clinical trials, a regulatory reviewer from the Food and Drug Administration, and an online data platform provider. Each presenter shared their perspective on the advantages and disadvantages of using social media to collect this type of information. There was consensus that there is great potential for using social media for this purpose. There remain, however, unanswered questions that need to be addressed, such as identifying which type of social media is most appropriate for data collection and ensuring that participants are representative of the target population while maintaining the advantages of anonymity provided by online platforms. The use of social media to collect evidence of content validity holds much promise. Clarification of issues that need to be addressed and accumulation of empirical evidence to address these questions are essential to moving forward.
Astrid Janssens, Jo Thompson Coon, Morwenna Rogers, Karen Allen, Colin Green, Crispin Jenkinson, Alan Tennant, Stuart Logan
Value in Health. 2015.18(2):315–333.
To identify generic, multidimensional patient-reported outcome measures (PROMs) for children up to 18 years old and describe their characteristics and content assessed using the International Classification of Functioning, Disability and Health Children and Youth version (ICF-CY).
The search strategy, developed by an information specialist, included four groups of terms related to "measure," "health," "children and young people," and "psychometric performance." The search was limited to publications from 1992. Five electronic databases and two online-specific PROM databases were searched. Two groups of reviewers independently screened all abstracts for eligible PROMs. Descriptive characteristics of the eligible PROMs were collected, and items and domains of each questionnaire were mapped onto the ICF-CY chapters.
We identified 35 PROMs, of which 29 were generic PROMs and 6 were preference-based measures. Many PROMs cover a range of aspects of health; however, social functioning is represented most often. Content covered differs both in which aspects of health are assessed and whether individual questions focus on functioning (what the subject can or does do) and/or well-being (how the subject feels about a certain aspect of his or her health).
A broad variety of PROMs is available to assess children's health. Nevertheless, only a few PROMs can be used across all age ranges to 18 years. When their content is mapped onto the ICF-CY, most PROMs appear to exclude at least one major domain, and all conflate aspects of functioning and well-being in the scales.
Astrid Janssens, Morwenna Rogers, Jo Thompson Coon, Karen Allen, Colin Green, Crispin Jenkinson, Alan Tennant, Stuart Logan, Christopher Morris
Value in Health. 2015.18(2):334–345.
The objectives of this systematic review were 1) to identify studies that assess the psychometric performance of the English-language version of 35 generic multidimensional patient-reported outcome measures (PROMs) for children and young people in general populations and evaluate their quality and 2) to summarize the psychometric properties of each PROM.
MEDLINE, EMBASE, and PsycINFO were searched. The methodological quality of the articles was assessed using the COnsensus-based Standards for selection of health Measurement INstruments (COSMIN) checklist. For each PROM, extracted evidence of content validity, construct validity, internal consistency, test-retest reliability, proxy reliability, responsiveness, and precision was judged against standardized reference criteria.
We found no evidence for 14 PROMs. For the remaining 21 PROMs, 90 studies were identified. The methodological quality of most studies was fair. Quality was generally rated higher in more recent studies. Not reporting how missing data were handled was the most common reason for downgrading the quality. None of the 21 PROMs has had all psychometric properties evaluated; data on construct validity and internal consistency were most frequently reported.
Overall, consistent positive findings for at least five psychometric properties were found for Child Health and Illness Profile, Healthy Pathways, KIDSCREEN, and Multi-dimensional Student Life Satisfaction Scale. None of the PROMs had been evaluated for responsiveness to detect change in general populations. Further well-designed studies with transparent reporting of methods and results are required.
Robert L. Askew, Karon F. Cook, Francis J. Keefe, Cindy J. Nowinski, David Cella, Dennis A. Revicki, Esi M. Morgan DeWitt, Kaleb Michaud
Value in Health. 2016.19(5):623–630.
Neuropathic pain (NP) is a consequence of many chronic conditions. This study aimed to develop a unidimensional NP scale with scores that represent levels of NP and distinguish between individuals with NP and non-NP conditions.
A candidate item pool of 42 pain quality descriptors was administered to participants with osteoarthritis, rheumatoid arthritis, diabetic neuropathy, and cancer chemotherapy-induced peripheral neuropathy. A subset of pain quality descriptors (items) that best distinguished between participants with and those without NP conditions was identified. Dimensionality of pain descriptors was evaluated in a development sample and cross-validated in a holdout sample. Item responses were calibrated using an item response theory model, and scores were generated on a T-score metric. NP scale scores were evaluated in terms of reliability, validity, and ability to distinguish between participants with and without conditions typically associated with NP.
Of the 42 initial items, 5 were identified for the Patient-Reported Outcome Measurement Information System (PROMIS) Neuropathic Pain Quality Scale. T scores exhibited good discriminatory ability on the basis of receiver operating characteristic analysis. Score thresholds that optimize sensitivity and specificity were identified. Construct, criterion, and discriminant validity, and reliability of scale scores were supported.
The five-item Patient-Reported Outcome Measurement Information System (PROMIS) Neuropathic Pain Quality Scale (PROMIS PQ-Neuro) is a short and practical measure that can be used to identify patients more likely to have NP and to distinguish levels of NP. The data collected will support future research that targets other unidimensional pain quality domains (e.g., nociceptive pain).
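To make the scoring and threshold steps above concrete, the following sketch converts latent trait estimates to T scores and searches for a cutoff that balances sensitivity and specificity. Several things here are assumptions rather than details from the abstract: the conventional T-score scaling (mean 50, SD 10 in the reference population), the use of Youden's J (sensitivity + specificity - 1) as the optimization criterion, the function names, and the toy data, which are invented.

```python
def t_score(theta):
    """Convert a standardized latent trait estimate to the T-score metric."""
    return 50.0 + 10.0 * theta


def sens_spec(scores, has_np, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as NP."""
    tp = sum(1 for s, y in zip(scores, has_np) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, has_np) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, has_np) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, has_np) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_np):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    Candidate cutoffs are the observed scores; ties keep the lowest cutoff.
    """
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, has_np, c)
        j = se + sp - 1.0
        if best is None or j > best[1]:
            best = (c, j, se, sp)
    return best


# Hypothetical toy data: latent trait estimates and NP condition status.
toy_theta = [-1.2, -0.5, 0.0, 0.3, 0.4, 0.9, 1.1, 1.6]
toy_has_np = [False, False, True, False, True, True, True, True]

if __name__ == "__main__":
    toy_scores = [t_score(t) for t in toy_theta]
    cutoff, j, se, sp = best_cutoff(toy_scores, toy_has_np)
    print("cutoff:", cutoff, "J:", round(j, 3),
          "sensitivity:", round(se, 3), "specificity:", round(sp, 3))
```

Other criteria, such as fixing a minimum sensitivity and maximizing specificity, would yield different cutoffs. The abstract does not state which criterion the authors applied.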