USING MIXED MODES AND/OR METHODS TO COLLECT PATIENT-REPORTED OUTCOMES DATA IN CLINICAL TRIALS
- Introduction
The incorporation of the patient perspective in the evaluation of medical products (i.e., drugs, biologicals, devices) has been increasingly viewed as important, if not essential. Medical products aimed at relieving patients’ symptoms and/or improving levels of self-reported functioning will require patient-reported outcomes (PROs) as endpoints in clinical trials. As stated in the US Food and Drug Administration (FDA) Guidance for Industry titled Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims, “Use of a PRO instrument is advised when measuring a concept best known by the patient or best measured from the patient perspective” (FDA, 2009).
There is no doubt that the release of this “PRO Guidance” by the FDA has focused increased attention on the scientifically sound measurement of PRO endpoints in clinical trials. Contemporaneous with the increasing attention on the appropriate use of PRO measures as efficacy endpoints, the use of technology in clinical trials has expanded substantially. There is growing recognition of the many advantages of electronic technologies for PRO measures (i.e., ePROs), including reduced subject burden, avoidance of secondary data entry errors, easier implementation of skip patterns, date and time stamping, and more accurate and complete data (Stone et al., 2002). The migration from paper to electronic data collection is one of the most significant developments in the PRO measurement field.
Alongside the emergence of many technology-based ways of capturing PRO data comes the need to ensure measurement equivalence across and among these methods and modes of administration (Coons et al., 2009). This becomes especially important if multiple methods or modes are used within a single trial. For the purpose of this paper, we have adopted the distinction made in the PRO Guidance between PRO instrument administration modes and data collection methods. Administration mode refers to self- vs. interviewer-administration, while method refers to the tool used for capturing the data, such as paper-based questionnaires, web-based data entry, interactive voice response systems (IVRS), or any of the other ePRO devices. Regardless of whether different modes or methods are being considered, the comparability of the data obtained via the original and alternative data collection approaches must be ensured. If a new or different data collection approach introduces response bias, the appropriateness of pooling data within a trial or comparing data across trials is called into question.
The methodological purists among us recommend that PRO data capture modes and/or methods not be varied within a single clinical trial or between trials that seek to provide comparable data. In general, anything that has the potential to introduce measurement error into a trial should be avoided (Streiner and Norman, 2006). Measurement error is, in essence, noise (error variance) that reduces statistical power and attenuates the ability of the trial to detect real change (i.e., a treatment effect) in the trial endpoint. Clinical trial designs should therefore eliminate as many sources of error variance as possible. In the context of this paper, error variance can be introduced when the data collection modes or methods used within a trial do not provide comparable data (i.e., the methods and/or modes lack sufficient measurement equivalence).
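The attenuation described above can be illustrated with the classical test theory model, in which an observed score is the true score plus independent random error. Under that assumption the mean difference between arms is unchanged. The observed standard deviation, however, grows to the true SD divided by the square root of the reliability, so the observed standardized effect shrinks to d × √(reliability). The minimal sketch below assumes a hypothetical true effect of d = 0.5. It models only added noise, not systematic (biasing) mode effects.

```python
import math


def attenuated_effect_size(true_d: float, reliability: float) -> float:
    """Observed standardized effect when independent random error is added to the outcome.

    Assumes classical test theory: observed = true + error, so observed SD =
    true SD / sqrt(reliability) while the between-arm mean difference is unchanged.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return true_d * math.sqrt(reliability)


# Hypothetical true effect of 0.5 SD, observed under decreasing reliability
for rel in (1.0, 0.9, 0.8, 0.7):
    print(f"reliability = {rel:.1f}  observed d = {attenuated_effect_size(0.5, rel):.3f}")
```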
However, although it may not be optimal, mixing of PRO data collection methods and modes within trials does occur and has to be addressed pragmatically. It is clear from the FDA’s PRO Guidance that this situation is anticipated to occur within clinical trials. Specifically, the Guidance states that “We intend to review the comparability of data obtained when using multiple data collection methods or administration modes within a single clinical trial to determine whether the treatment effect varies by methods or modes” (FDA, 2009). The Guidance does not, however, discuss how clinical trials can be designed to ensure the comparability of the data when mixed methods or modes are used.
A prior ISPOR task force report addressed the evidence necessary to demonstrate measurement equivalence between electronic and paper-based PRO measures (Coons et al., 2009). Although the task force recommendations focused on the migration of paper-based PRO measures to electronic platforms, the same principles apply to the assessment of measurement equivalence across and among all PRO data collection methods and modes. However, here again, the use of mixed modes and/or methods of data capture within clinical trials was not substantively addressed.
The purpose of this paper is to provide recommendations regarding good research practices for studies in which PRO data collection methods and/or administration modes are mixed within a single trial, or across trials that are intended for direct comparison. The objective is to address issues that must be considered to avoid sources of measurement error that materially impact the measurement properties of the instrument being used to capture PRO endpoints in clinical trials. Our intent is to provide practical means of optimizing data integrity when mixed modes and/or methods are unavoidable or when the benefit of mixed modes or methods is perceived to justify the risk.
Research settings: While mixed methods and modes may occur in all research settings, this article focuses on their use in clinical trials in particular, because the stakes and risks for sponsors and investigators are highest in this setting. In clinical trials, mixed methods and modes may occur within and between the following levels:
- Drug development programs
- Clinical trials within a program
- Countries within a clinical trial
- Sites within a trial, or within a country in a multinational trial
- Patients within a site
Process: To address this complex issue, ISPOR has initiated a working group to develop good research practice recommendations for the implementation of mixed methods and modes in clinical trials and for the analysis of the resulting data.
- What are the issues with using mixed methods and modes in clinical trials? (Suggestion: list the types of modes and methods and the potential issues with each)
- Mixed modes of administration (i.e., self vs. interviewer)
- Brief review of the psychology of self-report to help the reader understand the sources of mode effects (e.g., question comprehension, recall, sensitivity of questions, comfort)
- Mixed methods of data collection
- Paper – Handheld device
- Paper – Tablet
- Paper – Web
- Paper – IVR
- IVR – Web
- Handheld device – Tablet – Web
- Impact of mixed method/mode effects
- How method/mode effects manifest themselves: differences in means, variances, or reliability; alternatively, differential item functioning
- Reported magnitude of the various method/mode effects, to put the appropriate analytic approaches into context
- How likely is it that method/mode effects have led to incorrect conclusions in clinical trials? For example, if the efficacy estimates in both treatment arms carry the same additive upward bias, that bias cancels out of any between-arm difference. A bias that differs between arms (e.g., because the arms differ in the proportion of patients using the alternative method or mode) does not cancel.
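The arithmetic behind this example can be sketched as follows, assuming a purely additive bias and hypothetical change scores. A bias shared equally by both arms drops out of the difference in means. A bias that differs between arms shifts the estimated treatment effect. A multiplicative bias, or an analysis based on a fixed cutpoint, would not cancel in the same way.

```python
from statistics import mean


def observed_difference(treated, control, bias_treated=0.0, bias_control=0.0):
    """Between-arm difference in mean scores when each arm's scores carry an additive bias."""
    return mean(x + bias_treated for x in treated) - mean(x + bias_control for x in control)


# Hypothetical change scores
treated = [12.0, 15.0, 11.0, 14.0]
control = [9.0, 10.0, 8.0, 11.0]

unbiased = observed_difference(treated, control)
# Same +2 bias in both arms: cancels in the difference
shared = observed_difference(treated, control, bias_treated=2.0, bias_control=2.0)
# Bias differs between arms (e.g., unequal use of the alternative method): does not cancel
differential = observed_difference(treated, control, bias_treated=2.0, bias_control=0.5)

print(unbiased, shared, differential)
```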
- Given the risks, why use mixed methods or modes in clinical trials?
- Minimize missing data due to device loss or failure, or to a patient not using the preferred method or mode
- Accommodate technology-challenged countries, sites or patients
- Improve effect estimates by using triangulation of results across sources.
- Literature review, accounting for study population (pediatric, adult, geriatric), disease area (chronic, acute, oncology, etc.), type of PRO endpoint, etc. The mixed methods/modes issue should be addressed in well-defined contexts to answer the question: Does the context make a difference?
- Mixed modes of administration studies
- Mixed methods of data collection studies (within self-report mode)
- Review of FDA Guidance and other regulatory documents
- Strategies for appropriate use of mixed modalities in clinical trials
- Non-statistical approaches to minimizing method or mode effects
- Staff training
- Respondent training and instructions
- Efforts to minimize differences in questionnaire completion rates
- Sound survey design practices (Dillman and Salant)
- Considerations by level, starting with drug development programs
- Older studies most likely used paper; newer studies are more likely to use ePRO
- Fewer issues between programs for the same compound
- Clinical trials within a program
- Between Phases
- Changing methods from Phase II to Phase III within a program
- Changing methods between Phase III and Phase IV
- Within Phase
- Issues specific to Phase II or III trials
- Using different methods in trials of the same phase, especially Phase III
- Within a trial:
- Baseline in one method (at clinic) and follow up in another (at home)
- Countries within a clinical trial
- Between countries
- Mixing methods between countries may be unavoidable in ePRO studies where technology may not be suitable for specific countries
- Recommendation: mixing methods between countries is acceptable, provided the analysis controls for the nested effects of language and method
- Within a country (from this level down, mixing is not recommended in clinical trials except in emergencies)
- Between sites
- Within a site
- Between patients
- Within a patient (backup data collection methods specified in the protocol in case a device is lost or fails, or the patient cannot access the IVR system)
- Within instrument and administration (should a patient be able to start in one method and finish in another in the same administration?)
- Considerations when mixing modes as well as methods in a trial
- Analytical approaches for evaluating mixed methods/modes data using specific examples
- Background
- Brief literature review of statistical approaches to the analysis of data from mixed methods/modes studies. This can draw largely on methods used to examine order effects and alternative formatting (e.g., Grunberg, et al. QOLR). There are also general methods to avoid, or assess the amount of, cascading errors. These methods are (or are not) different from what is typically done to analyze PRO data.
- Instrument development time point: prior to use in a trial, look at different methods/modes and then at how to address them in a subsequent trial (this should be mentioned earlier as well; indicate that both item and scale development are involved)
- A comprehensive statistical analysis plan outline for trial data
- Discussion of the type of mixing observed and its implications. Pick up the point in the main paper about different methods/modes within a patient, as opposed to different methods/modes across sites/countries that are nevertheless consistent within patients.
- Discuss the effect of the administration interval and how it would affect effect size. Solutions here would be case-specific; there is no blanket recommendation that we can make.
- Diagnostics
- Variance-stabilizing and interpretability transformations (small section on visual analogue scales)
- advantages and disadvantages of both approaches (brief section)
- T-scores (use examples of the SF tools and PROMIS)
- 0-100 comparable scaling (use example from cancer control studies with multiple measures of fatigue and QOL)
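Both rescaling approaches can be sketched minimally as follows. The reference mean and SD used for the T-score, and the scale bounds used for the 0-100 transformation, are hypothetical values chosen for illustration. Both are linear transformations. They put scores on a common, interpretable metric, but they do not by themselves remove a method or mode effect.

```python
def t_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Norm-based T-score: mean 50 and SD 10 in the (hypothetical) reference population."""
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


def rescale_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linear transformation of a raw score from [lowest, highest] onto 0-100."""
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical scale scored 10-50, with reference mean 25 and SD 5
print(t_score(30, ref_mean=25, ref_sd=5))        # half an SD above the reference mean
print(rescale_0_100(30, lowest=10, highest=50))  # midpoint of the possible range
print(rescale_0_100(4, lowest=1, highest=5))     # a single 1-5 item
```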
- Comparing scales (indicate how to carry out the diagnostic analysis and discuss the implications)
- Equivalence (mean, variance, Bland-Altman, regression model)
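The Bland-Altman part of an equivalence assessment can be sketched as follows. The sketch assumes a paired design in which the same respondents complete the instrument by both methods; the paper and ePRO scores are hypothetical. The bias is the mean paired difference. The 95% limits of agreement are bias ± 1.96 × SD of the differences, which presumes the differences are roughly normal and do not depend on score level. In practice the differences would also be plotted against the pair means.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement for paired scores.

    bias is the mean of (method_a - method_b); the limits are bias +/- z * SD of
    the paired differences (z = 1.96 gives approximate 95% limits).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired observations of equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)
    return bias, bias - z * sd_diff, bias + z * sd_diff


# Hypothetical scores from the same five respondents on paper and on an ePRO device
paper = [20.0, 25.0, 30.0, 35.0, 40.0]
epro = [21.0, 24.0, 32.0, 35.0, 43.0]

bias, lower, upper = bland_altman(paper, epro)
print(f"bias = {bias:.2f}, 95% limits of agreement = ({lower:.2f}, {upper:.2f})")
```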
- What level of evidence should a sponsor/investigator consider providing about equivalence between methods or modes in a pivotal trial, vis-à-vis the FDA PRO Guidance and the recommendations of the first task force?
- Recommendations for standards of equivalence for levels described above (i.e., highest standard for within patient mixing, lower for site, country, etc.)
- Reiterate recommendations from the previous document. Indicate that mixing modes/methods might add to the error variance and hence reduce reliability. Demonstrate, using power calculation examples, that the sample size will need to be increased if mixed methods/modes are assumed to increase variability.
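The sample-size point can be illustrated with the standard normal-approximation formula for comparing two means: n per arm = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ². The sketch assumes, hypothetically, that mixing methods/modes adds independent variance (SD 4) on top of a baseline SD of 10. It also assumes a target difference of 5 points, two-sided α = 0.05, and 80% power.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for comparing two means (normal approximation, two-sided test)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


base_sd = 10.0
extra_mode_sd = 4.0  # hypothetical extra SD attributable to mixing methods/modes
inflated_sd = math.sqrt(base_sd**2 + extra_mode_sd**2)  # independent variances add

print(n_per_arm(delta=5.0, sd=base_sd))
print(n_per_arm(delta=5.0, sd=inflated_sd))
```

With these inputs, the required sample rises from 63 to 73 patients per arm. This is an increase proportional to the variance ratio (116/100).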
- Baseline in one mode/method, follow up in another
- implications for responder analyses: indicate the importance of defining and defending cutpoints
- implications if eligibility criteria are defined based on PROs (there are examples of mixed modes/methods in this respect in the literature)
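The responder-analysis concern can be illustrated with a hypothetical constant shift introduced when follow-up is collected by a different method than baseline. Patients whose true change sits just below the cutpoint are reclassified as responders. Higher change scores are assumed to indicate improvement, and the data, shift, and cutpoint are all hypothetical.

```python
def responder_rate(changes, threshold):
    """Proportion of patients whose improvement meets or exceeds the responder cutpoint."""
    return sum(1 for c in changes if c >= threshold) / len(changes)


true_changes = [2.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 12.0]  # hypothetical improvements
mode_offset = 1.5  # hypothetical systematic shift when follow-up uses another method
observed_changes = [c + mode_offset for c in true_changes]
threshold = 6.0  # hypothetical responder cutpoint

print(responder_rate(true_changes, threshold))
print(responder_rate(observed_changes, threshold))
```

With these values the responder rate rises from 0.625 to 0.75, even though no patient's true change differs.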
- Sensitivity analysis with and without data from the alternative mode or method
- to determine the impact on effect size
- effect of methods or modes on responsiveness, reliability, validity
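A minimal sketch of the sensitivity analysis computes the standardized effect size (Cohen's d with a pooled SD) twice: once with all data and once excluding scores collected by the alternative method. The record layout and values are hypothetical. The comparison is informative only if the patients using the alternative method are otherwise comparable. If use of that method is related to outcome, the reduced analysis may itself be biased.

```python
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled within-group SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5


# Hypothetical records: (arm, data collection method, score)
records = [
    ("treated", "epro", 14.0), ("treated", "epro", 16.0), ("treated", "epro", 15.0),
    ("treated", "paper", 19.0), ("control", "epro", 10.0), ("control", "epro", 12.0),
    ("control", "epro", 11.0), ("control", "paper", 9.0),
]


def effect_size(records, exclude_method=None):
    """Cohen's d after optionally dropping records from one data collection method."""
    kept = [r for r in records if r[1] != exclude_method]
    treated = [score for arm, _, score in kept if arm == "treated"]
    control = [score for arm, _, score in kept if arm == "control"]
    return cohens_d(treated, control)


print(f"all data:        d = {effect_size(records):.2f}")
print(f"excluding paper: d = {effect_size(records, exclude_method='paper'):.2f}")
```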
- Latent variable modeling/factor analysis
- use as a diagnostic for impact of mixed modes/methods
- Differential item functioning (DIF) analysis
- describe how it can be used to assess impact of mixed modes/methods
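One common DIF approach for a dichotomous item is the Mantel-Haenszel procedure. Respondents are stratified on total score, and within each stratum a 2 x 2 table compares item endorsement between a reference method (e.g., paper) and a focal method (e.g., ePRO). The common odds ratio α_MH = Σ(a·d/n) / Σ(b·c/n) is near 1 when the item behaves the same under both methods. It is often re-expressed on the ETS delta scale as -2.35 ln(α_MH). The counts below are hypothetical. Polytomous items would need an extension, such as a generalized Mantel-Haenszel statistic or ordinal logistic regression.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across strata of 2x2 tables.

    Each stratum is (a, b, c, d):
      a = reference group endorsing the item, b = reference group not endorsing,
      c = focal group endorsing,              d = focal group not endorsing.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant information in any stratum")
    return num / den


# Hypothetical counts, stratified by total scale score (the matching variable)
strata = [
    (10, 30, 8, 32),   # low total score
    (25, 25, 20, 30),  # middle total score
    (35, 5, 30, 10),   # high total score
]
alpha_mh = mantel_haenszel_odds_ratio(strata)
delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta-scale expression of the same quantity
print(f"MH odds ratio = {alpha_mh:.2f}, MH D-DIF = {delta_mh:.2f}")
```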
- Combining data
- simply combine the data as if it came from the same source if diagnostics so indicate
- if diagnostics indicate non-equivalence
- add a factor for the source of the data
- deal with it as we would with any other source of variation
- Analogous to a treatment-by-center interaction, for example
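For a simple 2 x 2 layout (treatment arm by data collection method), the treatment-by-method interaction can be illustrated as the difference in treatment effects between methods. With 0/1 coding, this equals the interaction coefficient of a saturated linear model with arm, method, and arm-by-method terms. The cell data and the "orig"/"alt" labels are hypothetical. A formal test would require a model with an error term, which this sketch omits.

```python
from statistics import mean


def treatment_by_mode_interaction(cells):
    """Difference-in-differences of cell means for a 2 x 2 (arm x method) layout.

    In a saturated linear model y ~ arm + method + arm:method with 0/1 coding,
    the fitted values equal the cell means, so this difference equals the
    estimated interaction coefficient.
    """
    effect_orig = mean(cells[("treated", "orig")]) - mean(cells[("control", "orig")])
    effect_alt = mean(cells[("treated", "alt")]) - mean(cells[("control", "alt")])
    return effect_orig, effect_alt, effect_alt - effect_orig


# Hypothetical scores by (arm, method); "orig" = original method, "alt" = alternative
cells = {
    ("treated", "orig"): [14.0, 16.0],
    ("control", "orig"): [10.0, 12.0],
    ("treated", "alt"): [15.0, 17.0],
    ("control", "alt"): [12.0, 13.0],
}
effect_orig, effect_alt, interaction = treatment_by_mode_interaction(cells)
print(effect_orig, effect_alt, interaction)
```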
- Missing data considerations (small section giving refs)
- Indicate imputation approaches. A particular advantage of mixed methods or modes may be that they provide different sources for the same information; alternatively, simply note that the approaches are the same as for a single method or mode (which, for most applications, they would be).
- Discussion of reliability and the increased number of observations. Illustrate the increase in precision, perhaps with a table, if values were missing at random (MAR). Indicate that the most common situation may be equal or higher reliability and an increased number of observations. However, the observations recovered by the alternative method or mode (i.e., those that would otherwise be missing, which were suggested to be missing not at random, MNAR) might be more problematic than MAR observations if a systematic bias is introduced by mixing modes/methods.
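The precision gain from recovering otherwise missing assessments can be tabulated with the standard error of a mean, SE = SD/√n. The sketch assumes, hypothetically, an outcome SD of 10 and 160 assessments on the primary method, with varying numbers recovered by a backup method. It holds only if the recovered values are MAR and the backup method is equivalent. A systematic mode bias would add bias that a smaller SE does not reveal.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean for n independent observations."""
    return sd / math.sqrt(n)


sd = 10.0        # hypothetical outcome SD
n_primary = 160  # hypothetical assessments completed on the primary method
for n_backup in (0, 20, 40):  # hypothetical assessments recovered via a backup method
    n = n_primary + n_backup
    print(f"n = {n:3d}  SE = {standard_error(sd, n):.3f}")
```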
- A template for presenting results: summarize i-iii from (b) in an example report (supplemental appendix)
- use the example of a clinical trial comparing the effects of a fictitious agent with placebo to drive the points home. Nextamethosone was an example used in this manner for ASCO (Stockler et al., ASCO 2006). In this way we can move between the general procedure and the specific example (pain, anxiety, mood, treatment schedule), which provides contextual help for readers.
- Discussion of analytics section
- mixed methods and modes will occur, so best-practice methods are needed
- discuss the balance between statistical and theoretical purity and empirical practicality
- put into context the order of magnitude of the likely effect and size of the “problem”. Does the evidence suggest that the problems induced by mixed methods/modes are trivial, prohibitive, or somewhere in between? (it depends)
We discussed setting the tone of the conclusions by analogy with comparable situations, such as:
a. mixing modes of blood pressure measurement (person versus machine)
b. measuring glucose by machine versus by quick tests
- Conclusions of analytics section
- the size of the error is (or is not) a major concern
- methods are (or are not) readily available to correct for such bias
- investigators and sponsors are encouraged to use (or warned off using) mixed methods/modes
V. Conclusion
VI. References