Regulatory Guidance Issues


Quality of Life Promotional and Label Claim Issues for Drugs, Biologicals, and Devices


Issue 1: Terminology - Do we need to use the term ‘quality of life’ for labeling purposes? While it is in the vernacular today, most measures stem from health status assessments, and this may be a more accurate description of the measures.

Summary: Terminology

Various terms have been used to describe a range of functional and health status outcome measures. These terms vary in specificity, and without additional information about the effects being described, it is often unclear what such terms mean. A broad and expansive term, such as ‘Quality of Life’, can connote a wide array of outcomes. Narrower terminology, such as ‘Health-Related Quality of Life’, denotes that the outcomes fall within a range of ‘health-related domains’.

The use of various terms in labeling and advertising should be considered on a case-by-case basis. Terms describing the outcomes measured should be descriptive, narrowly tailored, and non-misleading. In general, the term ‘Health-Related Quality of Life’ should be retained because it meaningfully communicates an important aspect of pharmaceutical outcome measurement. To assure that the audience is not misled, however, this term must be qualified by describing the specific domains covered by the entire set of measure(s). Unqualified use of the term in marketing and promotion should be avoided unless the sponsor provides evidence of positive HRQL outcomes on a wide variety of HRQL measures covering both general and disease-specific outcomes.

  • Responder 1
    The term 'quality of life' is neither necessary for labeling nor even appropriate. Because of its simultaneous use in the lay press as a general, emotionally laden term, it is probably a poor choice to embrace specific measurements based on explicit methodology and metrics. Health status assessment may more adequately describe the quantitative measurement of individual perceptions of health and disease, particularly if health is defined in the encompassing sense of the 1947 WHO definition. What is important to remember is that this field is built around an effort to describe value based on the individual's perceptions using items that are important to him. It is not clear that this sense is more adequately captured with the term 'quality of life' than with 'health status assessment'. Perhaps a different term altogether is needed, as 'health status assessment' could include assessment by a clinician or other observer.

  • Responder 2
    There are problems with both ‘quality of life’ and ‘health status’ as terms for this type of data. If asked to choose one of them for labeling purposes, I would recommend that ‘quality of life’ be the term. One pragmatic reason for this choice is that ‘quality of life’ is the more appealing term for marketing purposes. In addition, I believe that researchers in this area are going beyond what the public would perceive as ‘health status.’ We are attempting to measure not only self-reported health status but also the impact of health status on functioning and well-being in important health-related domains of life. My primary concern is that the term ‘quality of life’ will be used too liberally, particularly if the changes reported or promoted are only in one or two specific domains of health-related quality of life (HRQoL). Unless a global HRQoL measure was being used, authors of claims should not be able to use the term ‘quality of life’ unconditionally. Statements should be made that reflect the specific areas of improvement (e.g., ‘Quality of life was enhanced in the areas of mental health and physical functioning …’).

  • Responder 3
    It seems that the QOL world is moving to a more standard terminology of HRQOL as opposed to simply QOL. Functional Status crops up at times as well, and it may be useful to define what those terms mean, but any labeling/wording is likely to be discussed for a specific product, so while general guidance ‘use the term HRQOL instead of QOL’ would be useful, it is not essential.

  • Responder 4
    It is absolutely essential that a clear distinction is drawn between the widely used (and abused) term ‘quality of life’ and the more precise term ‘health status’. There is simply no justification for using QoL as some kind of shorthand description for the domains in which health care interventions have an impact. A preferred reference to health-related quality of life (HrQoL) might be tolerated, as having broad equivalence to health status.

  • Responder 5
    While most of the empirical "quality-of-life" research currently reported in professional literature, whether clinical trials, epidemiologic investigations, or other types of studies, incorporates measures that more appropriately might be called health status measures, new approaches to instrument design have been tested and are gradually being applied in various settings. Among the features of these new measures that set them apart from those developed earlier and that lead us to think of them as quality of life rather than health status measures are the following: direct solicitation from the respondent (or patient) of domains and subdomains that he or she identifies as important to his or her quality of life; evaluative response options that indicate satisfaction with performance or feelings or that indicate deviation between expected and actual performance or feelings; and development of operational definitions of HRQoL that recognize this as a measurable, single construct, rather than as an aggregate of many domains to be combined using scaling methods derived from either psychometric or utility theory.

    Thus, the field of HRQoL now encompasses diverse strategies, generally referred to as instruments, to distinctly measure the concepts of health status and health-related quality of life. Further, as usage of these measures expands, additional types of instruments can be expected thereby making the distinction more important. At this point, developers and users of these instruments need to specify what is being measured, whether health status or health-related quality of life.

    The tendency has been to overuse the term health-related quality of life because it has had more cachet. Researchers and those responsible for developing labelling statements need to be encouraged to accurately represent the studies that support labelling claims. That is, measuring health status can be just as valid as measuring health-related quality of life, depending on the nature of the condition being treated and the treatment itself.

  • Responder 6
    It is an advantage to use quality of life in labeling: the connotations associated with health status do not emphasize that the assessment is subjective, nor do they refer to the multidimensionality inherent in the quality of life concept.

    The term needs to be defined precisely by FDA, with something like ‘a health-related quality of life change as measured by subjective questionnaire’, and with a caveat that QoL changes are health effects deemed important to a large group of persons with the disease in question, but may not apply to every individual with that disease.


Issue 2: Stand-alone claims - Can a study that is powered solely to detect QoL be enough for labeling/promotion when there is not enough power to detect a (secondary) clinical endpoint?

Summary: HRQL as an Endpoint

Most often, HRQL measures are included in clinical trials as secondary endpoints. It is conceivable, and in some cases desirable, to engage in trials where HRQL measures are primary endpoints. Generally, there should be sufficient power in the analyses to assure that actual differences among groups can be detected, regardless of whether the analysis includes primary or secondary endpoints.

To make advertising and labeling claims, it is imperative that HRQL claims be fully understood and characterized in terms of the natural history of the disease. This often requires data analyzing HRQL effects in light of clinical outcomes. Thus, in most cases, it is desirable to obtain outcomes data on both HRQL and clinical outcomes.

  • Responder 1
    Statistical power (and statistical significance) is based on the relationship between an estimate of central tendency and an estimate of variability. If a study is adequately powered for QoL and not for a clinical endpoint, it indicates that there is greater variation (in relation to changes expected) for the clinical endpoint than for the QoL endpoint. A high level of variability in relation to observed changes brings into question the value of any measurement. Thus, if there is power to detect a QoL endpoint and not a clinical endpoint, the more robust QoL endpoint is probably the more meaningful. Alternatively, it may be that the action of the therapy in question may only alter perceptions, which should or should not be the basis for a labeling claim depending on the therapy and the disease. If so, should there first be a trial elsewhere providing that clinical evidence? If there is a change that is perceptible to the individual and this change has been measured by a valid and reliable questionnaire in a randomized blinded trial, it may indicate a need to look further for a better (more reliable) clinical measurement. If a claim of improvement in patient's perceived health is all that is requested, then a trial to provide clinical evidence is probably not necessary. If there is a desire to make a claim regarding a change in the natural history of the disease, then additional evidence based on more objective measures should be required.

  • Responder 2
    It is possible that there will be cases where HRQoL is the primary endpoint in the pivotal trials. Hence, the ‘clinical’ data issue may be moot. However, this issue must be referring to the situation in which data is being collected for an HRQoL claim for a product whose primary endpoint in the pivotal trials is a clinical indicator. The optimal situation would be to collect the data simultaneously in a trial that is powered for both the primary and secondary endpoints. If that is not possible or was not done, an appropriately designed (including adequately powered) study with HRQoL as the primary endpoint could be sufficient for a claim. However, before this would be acceptable, there must be adequate existing evidence of the clinical safety and efficacy of the product.

  • Responder 3
    It is not inconceivable that an HRQOL type of measure would be used as a primary efficacy endpoint. Consider treatment of depression, arthritis, some cancers, irritable bowel: anything where you have to collect information directly from the patient for management of symptoms, not necessarily a cure. In those cases, the disease-specific HRQOL measure may be the efficacy measure. In that case though, I would suspect the agency would want to see preliminary data (e.g. from phase II trials) that supports the endpoint. In other cases, such as migraine, where HRQOL is collected in addition to efficacy measures, the sponsor is likely to have adequate efficacy data such that they may measure efficacy in the study, but it doesn’t need to be a primary endpoint, just supporting.

  • Responder 4
    There is a serious technical problem that needs to be addressed here - namely the powering of studies that use HrQoL (NOT QoL) as a primary endpoint. This problem relates to the technical design defects of profile measures, the high degree of variance observed in HrQoL measurement, and the confusion of ‘significant’ differences as differentially defined by the researcher, clinician and patient. However, notwithstanding these problems, suppose that a study were to be powered on the back of a HrQoL measure, then it seems likely that the power calculations would tend towards an overspecification of the sample size, hence ensuring the status of any clinical (secondary) observations that were included. Where the power was insufficient, then clinical results would have to be treated with a corresponding degree of caution.

    I would not, as a matter of routine, require a separate trial to establish independent clinical evidence prior to an HrQoL-led study. When it comes to reporting any of this to a lay public, I would suggest that some cautionary words might be added to patient information, indicating that the presence of HrQoL effects is no guarantee of positive clinical effect, but since the information is based (mainly) on patient self-report it does serve as an indicator of subjective, patient-assessed benefit.

  • Responder 6
    In most cases the sample size required for the assessment of quality of life is larger than for clinical parameters. Whatever the case is, a study should be powered to accommodate a sufficient number of patients for the primary and secondary end-points. Unless the drug under study is not a quality of life promoting drug, evidence of the clinical efficacy is priority number one. In conditions lacking an objective marker of disease activity, quality of life could be identified as a primary end-point provided several quality of life domains are identified in the protocol as end-points.

    A study powered to detect QoL changes will have a larger number of participants than a study of clinical endpoints. The claim study must complement studies used to prove clinical efficacy. There should always be an efficacy study, preferably conducted prior to the QoL claim study. Efficacy studies and QoL claim studies can be run simultaneously or separately. QoL claims can be submitted with an NDA filing or separately subsequent to the filing.
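
As an illustration of the power arguments made by several responders above, the following minimal sketch shows how the required sample size grows as variability increases relative to the expected change. It uses the standard normal-approximation formula for a two-sided comparison of two group means. The function name, the default alpha and power, and all endpoint values are hypothetical and are not taken from any responder or study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(expected_change, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a difference
    in means between two groups (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    effect_size = expected_change / sd   # standardized difference
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical endpoints: same expected change, different variability.
clinical_n = n_per_group(expected_change=5.0, sd=10.0)  # effect size 0.50
hrqol_n = n_per_group(expected_change=5.0, sd=20.0)     # effect size 0.25

print(clinical_n, hrqol_n)
```

Under these assumed values, halving the standardized effect size roughly quadruples the number of participants per arm. A study sized for the more variable endpoint is therefore also adequately sized, under this approximation, for the less variable one.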


Issue 3: Safety

Issue 3a: Should QoL be a substitute for adverse events (AE) reports?

Summary: Safety — HRQL as an Adverse Event

HRQL should never be substituted for or included in the reporting of adverse events. Adverse event reporting should be considered a separate and essential component of drug testing. HRQL measures subjective perceptions of a person’s health, which may be influenced by a variety of factors, including the disease and its treatment. Adverse event reporting solicits a specific set of drug-related outcomes. These are not equivalent measures.

  • Responder 1
    No. Current measurements of QoL are based on the individual's perception and, to a greater or lesser degree depending on the questionnaire used, involve some integration of multiple symptoms and feelings into a score or scores. The purpose of AE reports is to provide a signal for potential safety problems. Using only the perceptions of the patient and allowing integration of some improvement and possibly some decrement in health in a single response will obscure many signals.

  • Responder 4
    These are not isomorphs. Side-effects / ADRs will impact on health status in different patients to different degrees. I have developed a 0-1 scaling system based on the 65 most commonly reported ADRs logged by WHO, so that we can ‘weight’ side effect profiles independently of HrQoL. We should maintain a clear separation of these two distinct phenomena.

  • Responder 5
    As the above suggests, health-related quality of life, especially as operationalized in the generation of instruments currently being developed, is a different concept than is that of an adverse event. This distinction may be unclear from clinical trials, usually designed and implemented some years ago, that include symptom checklists as part of a battery of health-related quality-of-life questionnaires. While instruments used to collect adverse event and health-related quality-of-life data may appear to be similar, the nature of the data differs in terms of question phrasing and response options. Thus, the two types of information are not direct substitutes.

  • Responder 6
    Quality of life should never be a substitute for AEs. Most symptom questionnaires (i.e. tapping one quality of life domain) tend to trigger a higher incidence of symptom reporting than traditional AE reporting or active questioning. This has been nicely demonstrated in some publications. One needs to consider why quality of life is being measured: in many cases this is to ensure that a slight increase in AEs does not compromise the quality of life in terms of well-being and functioning. Therefore, I do not think that one should drop a quality of life question; one should keep quality of life separate from AEs. The only instance when a question in a quality of life questionnaire might trigger an adverse event reporting is if the patient jokingly volunteers extra information in order to explain why he or she feels bad (a note that hospitalization has occurred needs to be cross-checked with the AE reporting).

Issue 3b: When does a question inside a QoL instrument trigger AE reporting?

Summary: HRQL Questions and AE Events

Prior to using any HRQL form, questions should be reviewed and appropriate decision rules adopted for the reporting of adverse events. If there are concerns that responses to specific HRQL questions might be considered an AE, there should be alternative measures that more directly measure the outcome. If for any reason a sponsor believes that a scale response signals the occurrence of an AE, there should be an a priori method established for reporting.

Removing individual scale items to avoid AE reporting is not acceptable. Deleting individual scale items can negatively influence the psychometric properties of a scale. Scales with deleted items may need to be revalidated prior to use to assure that the ‘new scale’ sufficiently measures the HRQL concept.

  • Responder 1
    Because QoL questions are designed to be broad and explore the individual's perception of how these symptoms and changes affect their functioning and roles, responses to QoL questions are likely to be affected by many events beyond those related to the clinical trial and the therapies involved. There is probably no reason to report an individual response or change on a QoL question as an AE. However, negative changes on a total score or a domain score for the treatment as a whole warrant further investigation and consideration as an AE.

  • Responder 4
    If a HrQoL instrument contains items (or combinations of items) that are suggestive of an adverse event, then such items should have been identified prior to the study, and appropriate decision rules for patient management ought to be in existence. If no such prior decision rules have been formulated, then I believe that the ‘accidental’ suggestion of an AE secondary to inspection of the HrQoL data (itself likely to be a rare event) lies outside reporting requirements. There is no legitimate reason for censoring HrQoL measures.

Issue 3c: Can one ‘drop’ a QoL question if it would be the type to trigger AE reporting?

  • Responder 1
    Questionnaires should be used in their entirety. Questions should not be dropped for any reason once a questionnaire is completed and determined to be 'psychometrically valid'. Specifically, to drop a question because it may point to an adverse event due to a drug is equivalent to saying that there is no need to probe for specific symptoms as they may trigger adverse events.

  • Responder 2
    Even though I am not sufficiently informed about AE reporting, I would have to say no. It would be very difficult to justify dropping an item from an instrument for that reason, especially since the basis for the instrument’s selection for the trial is likely to be psychometric evidence with that item included.

  • Responder 3
    The agency seems to feel comfortable with how safety gets reported currently. Regardless of whether HRQOL is included in a study, there are mechanisms to collect and report adverse events. To that end, I don’t think safety is a general issue, although it may come up on individual cases. Companies should have SOPs in place to manage them. QOL is clearly not a substitute for AEs, so a QOL questionnaire does not require additional reporting, and from what I learned in school, one should not be ‘cherry picking questions’ from specific instruments unless the appropriate work has been done to verify that the item reduction is warranted.

  • Responder 5
    From a technical perspective, modifying a health status or health-related quality-of-life instrument, for example, by removing one or more questions that might trigger the reporting of an adverse event, can be expected to have negative impacts on the instrument's measurement properties. For example, removing items will almost always result in a lower level of internal consistency. If the observed reliability level is near the minimally acceptable level of 0.70, then removing one or more items may result in an unacceptably low level of reliability. Similarly, removing items will lead to lower estimates of validity.

    Removing items that are considered to result in findings that are unfavorable to the investigator's perspective raises ethical concerns as well.
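
The reliability concern raised above can be illustrated with a minimal sketch based on the Spearman-Brown prophecy formula, which projects the reliability of a shortened scale. This is not a method named by the responder. It rests on the simplifying assumption that the removed items are interchangeable (parallel) with the retained ones, and the function name and example values are hypothetical.

```python
def projected_reliability(reliability, n_items_old, n_items_new):
    """Spearman-Brown projection of reliability after changing scale length.

    Assumes the removed (or added) items are parallel to the others,
    which real instruments only approximate.
    """
    k = n_items_new / n_items_old  # ratio of new length to old length
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical 10-item scale with reliability just above the 0.70 threshold.
original = 0.72
shortened = projected_reliability(original, n_items_old=10, n_items_new=8)

print(round(shortened, 3))
```

In this assumed case, dropping two of ten items moves the projected reliability from 0.72 to below 0.70. An actual revalidation would estimate reliability from data collected with the shortened scale rather than rely on this projection.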


Issue 4: Domains

Issue 4a: Three domains are useful, but is there concurrence that these three are it (i.e. physical, psychological, social)?

Summary: Domains

The inherent multidimensionality of HRQL measurement often results in HRQL scores for a variety of domains. Although there are often at least two-to-three broad, summary domains measured in general HRQL scales (e.g., physical, mental, and social functioning), there may be additional domains that are subcategories of these summary domains or separate domains, not included in these summary measures. For disease-specific HRQL scales, there are often a variety of domains covered by each instrument.

The number of domains covered by any HRQL instrument should be determined empirically, theoretically, or both. Importantly, each domain should be represented by a sufficient sample of items that validly measures that domain. In selecting names for each domain, care should be taken to accurately communicate the meaning of the measurement items for that domain.

  • Responder 1
    There are probably two underlying domains for health-related quality of life: physical health and mental health. Both interact with social functioning, as does the individual's social setting. In turn, social situation can have major bearing on an individual's perception of physical and mental health. However, there are many other possible domains which may be worth measuring because of their effects on physical health, mental health and social functioning (for example, pain, sleep, energy, fears, body image, sexual functioning). Discrimination of questions into specific domains is often data-driven and may also be somewhat dependent on the developer's own views as well. What is most important is that a rigorous effort be made to identify as many items as possible that are important to patients. These items will usually go through some item reduction exercise in which the end result should be to capture the most important items in a much smaller number of questions. Whether the remaining questions segregate into three domains or ten is probably less relevant than the fact that all important effects of the disease (and therapy) are captured in the questionnaire.

  • Responder 2
    These are generally accepted to be the primary domains for generic HRQoL assessment. I would consider them necessary but not sufficient in most clinical trials. It is likely that condition or treatment targeted domains (e.g., disease burden, sexual functioning, sleep) will be critical in assessing changes that may be unique to the condition and/or its treatment.

  • Responder 3
    It appears that the agency has recognized general and disease-specific HRQOL instruments. As long as disease-specific instruments are accepted, the issue of domains is likely to be more condition-specific. Domains such as sleep, sexual function, etc. don’t necessarily fall under the three bigger buckets listed, but in some cases they are going to be important and will likely be considered as long as appropriately developed.

  • Responder 4
    The HrQoL universe is made up of an unknown (but probably finite) set of domains over which a scientific/research consensus is lacking. Whilst disagreeing profoundly with the ‘within the skin’ approach adopted elsewhere, I nevertheless recognise its legitimacy within its defined territory. One way out of the quagmire of indecision in this area would be to simply adopt, pro tem, the elements of the WHO definition (mental, physical and social). Just as there is no technical upper bound to the domain choice, there should be no lower bound. However, where observations are made on a single domain only, this should not be described as HrQoL measurement.

  • Responder 5
    Health-related quality of life is a complex concept that may be measured by using a single, global item or by using a multi-item instrument developed to tap function or perceptions related to two or more domains. For example, the two domains of health perceptions and activity limitation have been shown to be adequate for monitoring the health of the U.S. general population over time. For measuring health outcomes in medical care settings, however, more domains are generally needed. The Medical Outcomes Study 36-item short form, an instrument that is widely used in clinical research and clinical practice settings, assesses eight domains. To date, no recommendations exist for determining the number and nature of the domains to be contained in a given instrument for use in a particular setting. Rather, the included domains have been selected to reflect the purpose of measurement.

  • Responder 6
    Several domains are inherent in the multidimensionality of the quality of life concept. Whether physical, social and psychological are the core domains can be discussed. In the elderly, social functioning tends to be the least affected area (friends have died, patients adapt to their disease, with aging there are fewer expectations to be able to go out socializing etc.). Satisfaction is certainly related to quality of life. If the quality of life is improved, satisfaction can be expected to improve. But that may be the case even if the quality of life remains the same.

Issue 4b: How does satisfaction play in - or is that not a part of QoL?

Summary: The Relationship Between HRQL and Satisfaction

Satisfaction, like quality of life, is an expansive concept that includes a large variety of measures. Satisfaction is often conceived as assessing the difference between an individual’s expectations and an evaluation of actual performance or outcomes. The application of a satisfaction measurement can vary considerably depending on what is being evaluated. Although there may be some overlap in the questions or the domains covered by an HRQL and a satisfaction instrument, these two types of instrument should be considered as measuring different types of outcomes.

  • Responder 1
    Satisfaction, like quality of life, is also an expression of an individual's perception. Reported satisfaction with a particular therapy, health care, or provider may have an impact on quality of life, but this satisfaction also may not make much difference in an individual's perception of their QoL. Thus, until proven otherwise, it should not be assumed that such satisfaction questions measure QoL. Other satisfaction questions could ask about physical health, ability to perform activities of daily living, freedom from pain, etc. These may be valid expressions of quality of life. However, this needs to be demonstrated using accepted psychometric principles if there is a desire to claim quality of life benefits from these questions.

  • Responder 2
    Satisfaction with health status could be included as a part of HRQoL assessment and is likely to be ‘factored in’ when a person is providing a self-report of their overall well-being. However, I firmly believe that satisfaction with the provision of medical care or other aspects specific to the health care delivery system is a separate issue.

  • Responder 3
    I would place satisfaction as a separate issue from HRQOL. The literature certainly seems to indicate the issues are separate. Although satisfaction is important and can be seen as an outcome of health care in and of itself, I think you muddy the waters if you include it with HRQOL.

  • Responder 4
    This is not in itself integral to the measurement of HrQoL, but could be a co-variate. It may relate to patient expectation and hence have a confounding effect on the measurement of outcomes consequential to treatment. I would suggest that satisfaction measures (if recorded) ought to be presented independent of any HrQoL or clinical data. Suppose that 80% of patients were happy with a product that yielded gains for only 20% of those treated. How would that be set against an alternative in which 20% of patients were ‘satisfied’ but clinical or HrQoL measures indicated that 80% of patients benefited significantly?

  • Responder 5
    Satisfaction with health or lack of good health, rather than with the delivery of health care, is generally considered to be part of health-related quality of life. This concept can be included as one of the domains in a multidimensional measure. The Satisfaction with Illness Scale illustrates how this concept can be operationalized. Alternatively, satisfaction with health can be reflected in the response options that are used. For example, respondents might be asked to rate their ability to do certain physical tasks in terms of satisfaction with performance on each of the tasks.

  • Responder 6
    Satisfaction to me reflects issues involving delivery of care and should not be part of QoL assessment in clinical trials. In fact, such issues should be excluded as they introduce bias. Satisfaction specific to the drug effect may be part of QoL, though, but should not be referred to as a satisfaction issue.

Issue 5: Side-effects

Summary: Side Effects and HRQL Assessment

HRQL seeks to assess the impact of a treatment on an individual’s functioning. The impact of the treatment’s side effects should be included in HRQL measurement, especially for claims that a drug ‘maintains’ HRQL. This may be accomplished through the use of additional scale items added to the battery, the use of a generic quality of life scale or the inclusion of HRQL scales to measure the impact of drug side effects.

Issue 5a: Are side-effects part of QoL assessment or separate?

  • Responder 1
    Side-effects (and disease symptoms) tend to be measured more by presence and absence. QoL assessment tends to be a measurement of the impact of these effects and symptoms on an individual's perception of their health status and functioning. Side-effects (and symptoms) could (and perhaps should in a broad manner) be included in QoL so that we not only learn that a side-effect has occurred but also what impact this side-effect has had on the individual.

  • Responder 2
    Side-effects may manifest themselves in ways that impact HRQoL, but they are separate.

  • Responder 4
    See comments on AE. HrQoL - at least in my definition - relates to a generic construct. The inclusion of items dealing with particular drug effects suggests the use of a condition-specific measure.

  • Responder 5
    Health-related quality of life is the impact of disease, injury, treatment or policy on length of life as well as impairments, function status, perception, and opportunity. Thus, to the extent that treatment side effects influence the domains of health-related quality of life, the impacts of these effects are important to include in a comprehensive operational definition of health. For example, if a treatment is known to cause drowsiness, an instrument might try to determine the extent to which this impacted on patients' ability to perform their usual social roles. Similarly, impacts of treatment effects can be included as well.

  • Responder 5
    Side effects should be separate from the quality of life assessment. How drugs affect the quality of life can be asked if relevant; cancer is a good example. But there are other cases where it is less relevant.

    I would interpret a side effect to be a treatment effect that does not involve the primary clinical endpoint. This is difficult to answer because a QoL domain change could be defined as a side effect. Medical side effects like toxicity are not part of QoL assessment but can affect QoL score changes.

Issue 5b: Should a QoL questionnaire ask about how drugs affect QoL - or should this be disease-specific only?

  • Responder 1
    Generally, QoL questionnaires ask about health status, functioning and, in disease-specific questionnaires, how a specific symptom may affect QoL. It is probably best to continue to measure the effect of drugs on QoL by indirect questions that ask about current status, changes over a past period of time or changes in effects due to changes in symptoms. To explicitly ask about how a drug affects QoL seems leading and unnecessarily specific, and yields a rather limited and non-generalizable result.

  • Responder 2
    I don’t believe that asking specifically if the drug led to improvements or decrements in HRQoL is particularly informative. Attribution of the cause of changes is a complex process and it may add unnecessary subject burden to the HRQoL assessment process. The effects of the drug and the amelioration of the condition it is treating should be reflected in the data from a responsive instrument. I would need to hear a clearly articulated justification for asking specifically about how the drug affects QoL as opposed to just asking how QoL has changed since the initiation of treatment.

  • Responder 3
    I would collapse this with ‘safety’. Side effects or adverse events are the reason why there are safety assessments. They are kept separate. Not completely clear what issue 5b is about. I recall some disease-specific questionnaires do include some drug-related questions, but it is clearly not required. This would also be rather condition-specific. Additionally, if the agency is trying to write guidance for drugs, devices and biologicals, drug-specific queries, while important, may not fall under the guidance as they would be too specific.

Issue 6: Clinical trial administration

Issue 6a: Should there always be a generic and disease-specific questionnaire used in clinical trials?

Summary: Use of General and Disease-Specific HRQL Instruments

It is often desirable to include both general and disease-specific HRQL instruments in clinical trials. This can be accomplished with ‘modular’ instruments that include a set of generic questions supplemented by a set of specific disease-related questions that vary with the disease under study. The purpose of the disease-specific questions is to determine, in a sensitive fashion, differences among study arms. The generic instrument can be used to assess overall HRQL, measuring overall impacts of the disease and its treatment, and to help provide a fuller context for understanding disease-specific HRQL results.

Unfortunately, the ability to include a variety of HRQL measurements in clinical trials can place an increased logistic and financial burden on sponsors. If individual disease-specific or generic HRQL scales can validly and sufficiently assess the impact of a condition and its treatment and can provide interpretable data, then single instruments are desirable.

  • Responder 1
    If it is desired to measure QoL in a clinical trial, there should always be a generic questionnaire. A disease-specific questionnaire allows for measuring much smaller changes. If a generic questionnaire is a yardstick by which different diseases and different interventions are compared, then a disease-specific questionnaire is a micro-calibration that allows measurement of changes that can only be seen with a magnifying glass. Unfortunately, these micro changes cannot currently be compared from disease to disease.

  • Responder 2
    If HRQoL is being assessed in a clinical trial, there should always be an assessment of general HRQoL. It doesn’t matter if this is done through the use of two separate instruments (a generic and a disease-specific) or with one instrument in which the generic core domains are embedded.

  • Responder 3
    If general and disease-specific instruments were to be required in all CTs in which HRQOL was to be measured, this would pose a standard that is not supported by current data. If previous studies demonstrate change via a disease-specific instrument and not with a general instrument, it would seem reasonable to not include the general instrument in future studies. Including additional pages is expensive, adds patient burden, and creates additional statistical issues, none of which is justified if there is no intended use for the data. Mandating instrument inclusion in trials is an important issue if it were to be included in guidance. The guidance might suggest that both be considered for use initially, but it is clearly not useful to continually repeat the measure for no purpose.

  • Responder 5
    In the clinical literature related to health-related quality of life, disease-specific measures are commonly used. Some of these include items that ask about the impact of treatment on function or perception. Treatment-specific measures, such as the Treatment Impact Questionnaire, have also been developed, although far fewer of these instruments exist than do the disease-specific measures.

  • Responder 6
    There should in most cases be a disease-specific questionnaire. Depending on the population and the study question, a generic questionnaire could be included as well. In case of premature discontinuation, the patient should be asked to complete a final quality of life questionnaire. If discontinuation is due to death, frequent assessments of quality of life should be performed and the last available measurement used in a last value carried forward analysis.

    It depends on the indication. There will be indications where generic instruments will have too many floor effects (terminal cancer etc) to be of any use. Generic instruments, like summary scores, should be used primarily for validation purposes. Usually the disease specific instrument should be used for QoL assessment because it will be more sensitive to change. Again, a validation issue if disease specific shows change and generic does not.
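
The floor effects mentioned above can be checked before an instrument is chosen. The following is a minimal sketch, assuming a hypothetical 0-100 scale and invented baseline scores; the function name and data are illustrative and do not come from any particular instrument. It reports the share of respondents who score at the lowest or highest possible value.

```python
from typing import Sequence, Tuple


def floor_ceiling_share(
    scores: Sequence[float], scale_min: float, scale_max: float
) -> Tuple[float, float]:
    """Return the fraction of scores at the scale minimum and maximum."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == scale_min) / n
    ceiling_share = sum(1 for s in scores if s == scale_max) / n
    return floor_share, ceiling_share


# Hypothetical baseline scores on an assumed 0-100 generic scale.
baseline = [0, 0, 0, 10, 0, 25, 0, 5]
floor_share, ceiling_share = floor_ceiling_share(baseline, 0, 100)
print(f"floor: {floor_share:.3f}, ceiling: {ceiling_share:.3f}")
```

A large floor share at baseline would suggest that the instrument has little room to register further decline in that population.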

Issue 6b: How does one ensure follow-up on treatment protocol violators/drop-outs for QoL measurement?

  • Responder 1
    The same way one ensures follow-up on other outcome measures for protocol violators and drop-outs. The protocol can specify that these individuals are to be asked to complete a questionnaire at time of discontinuation or disenrollment and investigators can be encouraged to obtain this information whenever possible, but if the patient refuses to respond or does not comply there is little that can be done.

  • Responder 2
    Clinical trial site coordinators and investigators must be convinced that this data is just as critical as all the other clinical trial data they are collecting. Therefore, attempts to prevent drop-out or loss of data should be the same for the QoL measurement as they are for the other measures. If the subject is lost to follow-up for all aspects of the trial, then there is nothing additional that can be recommended.

  • Responder 3
    Handling of drop outs/early terminators is an issue that may be important, although the standard way of handling is probably sufficient. The issue of censoring due to death, drop-outs due to AEs etc. has been raised, and there is no easy way to deal with it, other than full disclosure in the label. This may be a drug specific issue. To specify a precise way of handling this across all studies may not be appropriate, but as with 6a, it becomes very important if included in the guidance.

  • Responder 4
    This is a general problem that is really independent of HrQoL measurement.

  • Responder 5
    There are several ways of dealing with this issue such as last value carried forward analysis and proxy administration. Methods need to be tailored to the patient population.
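
As an illustration of the last value carried forward approach mentioned by the responders, the following is a minimal sketch, assuming each patient's scores are stored in visit order with `None` marking a missed assessment. The function name and the example scores are hypothetical.

```python
from typing import List, Optional


def last_value_carried_forward(
    scores: List[Optional[float]],
) -> List[Optional[float]]:
    """Fill missing visits with the most recent observed score.

    Visits before the first observed score stay None, because there is
    nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


# Hypothetical patient who dropped out after the third of five visits.
visits = [62.0, 58.0, 55.0, None, None]
print(last_value_carried_forward(visits))  # [62.0, 58.0, 55.0, 55.0, 55.0]
```

The sketch shows only the mechanics. Whether carrying the last value forward is appropriate depends on the patient population and the reason for drop-out, as the responders note.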

Issue 7: Timing / administration of measurement

Summary: What is the Appropriate Timing for Data Collection?

The timing of HRQL assessments should be related to the expected timing of clinical changes. This should be based on an analysis of the natural course of the disease and consideration of treatment effects. In some instances, HRQL outcomes may lag clinical effects, while in other cases, they may precede clinical effects.

It is often advisable to obtain baseline data on HRQL to measure changes.

Regardless of the timing of the test administration, HRQL tests should always be included in the same location within a test battery. Some people advocate showing respondents their answers to previous questionnaires. This procedure may decrease variability and lead to more consistent responding. However, there has been little critical investigation of possibly biasing effects. Showing respondents their prior responses is not recommended unless there is a supporting rationale for this procedure in the individual protocol.

Issue 7a: What is the appropriate time frame for the measures to be collected? Should it correspond to when clinical measures are obtained - more frequently or less - if it’s a chronic drug, is 6 weeks long enough, etc?

  • Responder 1
    Time frame and frequency of measures should be related to time frame of expected changes due to the disease, the therapy and the interaction between the disease and the therapy. The questionnaire may also ask about changes over a specific time frame, in which case more frequent data collection seems unreasonable (however, patients are probably not very good at integrating their condition over an extended period of time, so a question about health over the past two weeks will probably result in answers that are more likely to reflect health over the last couple of days.) Clinical measures and QoL measures may not go hand-in-hand. Patients may respond quicker on clinical measures; for example pulmonary function may improve as measured by pulmonary function tests sooner than the patient perceives a QoL benefit. In fact a sustained clinical response may be necessary before the patient notices a QoL benefit. Alternatively, a patient may perceive a QoL benefit sooner than a clinical response is noticed, for example, analgesic effect following therapy for arthritis before clinical benefits as measured by joint swelling are noted.

  • Responder 2
    This is driven by the treatment and condition being investigated. The clinical trial should be designed to collect HRQoL at the time points where significant HRQoL and clinical impact may occur. For example, if you are assessing the HRQoL impact of a nicotine delivery system for smoking cessation, assessment should occur during the acute phase of the nicotine withdrawal process. HRQoL data collection should correspond with the primary clinical data collection points; however, additional or fewer HRQoL data collection points may be justified depending on the trial.

  • Responder 3
    The literature suggests multiple options for timing of HRQOL data collection. The appropriate time frame may vary from study to study. It may not be necessary to always have administration timed with clinical measures. At a minimum, guidance should suggest a baseline and an end of study measure would be important. At a minimum, the timing should be stated in the study protocol.

  • Responder 4
    Observation of HrQoL should relate to the time frame for the specification of other measures. The focus should be on point estimates of HrQoL, rather than averaged recall over the past x weeks. Frequency of observation should be determined by frequency of patient contact and/or considerations of expected time to respond.

  • Responder 5
    Responses to health-related quality of life measures are as susceptible to the influence of extraneous factors as are responses to other measures used in clinical trial settings. Thus, to obtain useful results, the data collection process should be standardized and the same process followed throughout a study. For example, if the questionnaires are initially administered at the beginning of the clinic visit, this practice should be continued through the end of the study. Analysis of data from the National Health Interview Survey indicated that asking about health status before asking about the presence of various diseases created an upward bias in reported health. That is, there was a 15 percentage point increase in the percent of persons reporting themselves to be in good health when this information was collected before that about disease and dysfunction compared with when the health status data was collected after the disease and dysfunction data. Giving the questionnaires always in the same place relative to other tests will avoid this type of bias in clinical trial settings.

    Questionnaire responses can also vary according to the location in which the data are collected and to the persons who are nearby. Noisy locations may provide distractions; if a noisy location is chosen for all administrations throughout a trial, then the effect may be to increase random variability rather than to introduce systematic bias. On the other hand, locations that are quiet for one administration and noisy for another introduce bias, leading the results to be hard to interpret.

  • Responder 6
    Time frame as for clinical assessments. Enable correlation of changes in clinical and quality of life parameters. Depending on the patient population and the drug, quality of life may be assessed at less frequent intervals than the clinical parameters. In chronic disease a longer follow-up is required, otherwise patients will not have the appropriate time to really perceive the true treatment benefits. If the action of the drug is fast, and the treatment effects rapidly observed, a shorter follow-up period is required. Quality of life questionnaires should preferably be administered before other examinations in order to avoid bias due to the fact that the patient has learned of the results of the procedure. Patients should not see their previous responses.

    Depends on the indication and study type. QoL assessments are appropriate for phase 2b, 3 as well as 4 trials. The time frame needs to be adequate to detect change. Whether it should correspond to when clinical measures are obtained, more frequently or less, again depends on the indication, but for efficiency and cost considerations, usually yes. Whether 6 weeks is long enough for a chronic drug once again depends on the indication.

Issue 7b: Should the questionnaires be administered at the beginning of clinic visits or after the patients have already had other tests done?

  • Responder 1
    Questionnaires should be administered before performing other tests, discussion of the patient's condition and / or status, or probing for side-effects.

  • Responder 2
    Intuitively I would assume that it would be best to have someone complete potentially cognitively demanding questionnaires when they are ‘fresher’ and not influenced by other preceding aspects of the visit. Either way, the study must be internally consistent. The assessment should always be done at the same point in the clinic visit.

  • Responder 4
    HrQoL measurement should occur at the start of clinic visits, and certainly prior to any other tests.

  • Responder 6
    Always.

Issue 7c: Should the respondents see their prior answers?

  • Responder 1
    There is some evidence that allowing respondents to see their prior answers results in less variability. There is probably no strong reason why this should or should not be done, but whatever decision is made, it should be consistently applied. All respondents should see their prior answers or no respondent should. Respondents should always see their prior answer(s) or they should never see their prior answer(s) during the entire trial period.

  • Responder 2
    Under most circumstances, respondents should not see their prior answers. However, a case could be made to do so depending on the aim of the measurement process.

  • Responder 3
    Within a study visit, timing of administration may not have to be specified in a guidance, but stating that the timing should be consistent for all study participants is important and that should be specified in the protocol. Work by Guyatt and Juniper (I think) suggests that it doesn’t matter too much if people see their previous scores or not. The same issue arises from patient reported efficacy measures...should you show them a previous pain score? It doesn’t seem to matter too much for group analyses. For individual patients it might be quite useful. But this issue is not for individual patient management. Consistency is what is important. A similar issue arises from various routes of administration, which could be a larger issue. For example, if the patient completes the first questionnaire in the office and additional questionnaires from home or by mail, the results might vary.

  • Responder 4
    This is an issue which requires empirical investigation. However, I would personally opt to collect HrQoL information on a blinded basis, and allow the patient to see their prior observations afterwards. If s/he wishes to modify their current responses having seen t-1 data, then this change could be noted separately.

  • Responder 6
    Generally no unless for some reason it is required as justified in the study design.

Issue 8: Blinding - Under what circumstances could results from an unblinded evaluation ever be used for marketing or promotion?

Summary: Blinding

Following good clinical trials practice, blinding is highly desirable. As HRQL measures subjective responses, the lack of blinding could pose a serious threat to internal validity. However, in many cases it is impractical or impossible to blind study respondents.

  • Responder 1
    When blinding is difficult and perhaps not relevant, for example in the comparison of two formulations, one oral and one iv, in which the formulation itself may have impact on QoL. Trials would have to be randomized and any additional sources of bias or indications of favoring one therapy over the other avoided.

  • Responder 2
    With the increased sophistication of HRQoL measurement techniques and the increasing interest in including HRQoL as an endpoint in clinical trials, unblinded evaluations should be unnecessary.

  • Responder 3
    This is not so much a HRQOL issue as a general study design issue. There are other areas such as medication compliance, use of devices, where unblinded studies are the only means of capturing the impact of the intervention. So while it is important, it is a bigger issue than HRQOL which is why I listed it as secondary. I would also expect that any initial guidance is likely to be conservative and therefore standard study designs are likely to be the first out of the box.

  • Responder 4
    Perhaps not at all for patient information purposes, but such information could be used to supplement other forms of data. I guess that the concern here is about the link between patient self-assessment (in reporting HrQoL) and their knowledge of treatment. There has to be data out there for studies where patients have gone to open-label (the Abbott ritonavir study certainly has this), and we might look for possible response shifts due to patient awareness. But this could be part of a wider research agenda that was identified as a product of the ISPOR-FDA summit.

  • Responder 6
    If there is no way to blind the patient or investigator, the study results can still be used. There are such cases. Otherwise, a control group and blinding are desirable.

    In many studies blinding is not possible due to side effects, etc. But the trial should always attempt to blind even though in reality patients often know the treatment they are on. Studies that do not even attempt to blind should be discouraged. This is the philosophy behind ITT analysis - which applies to both blinding and randomization to treatment.


Issue 9: Development

Issue 9a: Can an instrument be truly valid if it is comprised of a battery of scales that an investigator put together without patient input?

Summary: Battery of Scales Approach

In order to measure HRQL, investigators sometimes select several scales that ostensibly measure a variety of HRQL outcomes. In theory, the use of a series of valid HRQL measures should result in a valid HRQL battery. However, in practice, the integration of HRQL instruments may reduce the validity of HRQL measurement. Patient input may be necessary to assure that the questions posed in a battery of scales are not biased by prior questions. The psychometric properties of the battery of scales may need to be described, especially if there are any new scales included in the battery.

  • Responder 1
    Very unlikely. A cornerstone of QoL assessment is to try to ensure that those items of most importance and impact to the patient are included and that during item reduction and validation, items are grouped with similar items to form a domain and that no important subset of questions is lost and no construct is represented by multiple domains. A battery of scales approach allows the investigator to select domains to be included and perhaps to overweight specific areas by including several domains (scales) that measure similar constructs and omit other important domains or at least play down their value in the total battery. To obtain a psychometrically valid questionnaire from a battery of scales would involve essentially the same amount of effort as development and validation of a new questionnaire. The investigator would have to appropriately group and weight questions by multiple administrations of the entire battery. And there would still be no guarantee that the new questionnaire included all the appropriate domains (or scales).

  • Responder 2
    It could be by chance. It also could be valid if each of the battery’s elements had been originally developed and tested in the target population. However, I would find it hard to believe that the battery would comprehensively capture all of the HRQoL issues that are salient to the target population. The investigator would be well-advised to test it in focus groups or pilot test the battery in a subset of the target population.

  • Responder 3
    There are two issues here, batteries and patient input for validation. Specific guidance on minimal requirements of validity would be useful. The guidance should be broad enough to apply to any instrument regardless of the type. It is possible to create an instrument without patient input, but it is risky. If a sponsor chooses to take that risk, that would be their decision. In the same manner, to imply that an instrument is valid simply because patient input is received would also be inappropriate. If the guidance references the published literature on HRQOL instrument development, that would seem to be sufficient. Guidance on steps to demonstrate validity could be very important, it would seem that some minimal level of validation would be desirable.

  • Responder 4
    Answer (a) validity or ‘true’ validity is an area of presumed certainty - it is fraught with subjectivity.

  • Responder 5
    The use of a battery of questionnaires may be a valid measurement strategy, depending on the purpose of measurement and the characteristics of the study population. Batteries, however, are more likely to measure health status than health-related quality of life given the nature of available instruments. Use of a battery may be the preferred measurement strategy when no disease-specific measure exists, or when a set of existent measures is fully compatible with the nature of the indication.

  • Responder 6
    A battery can certainly be valid, many of the questionnaires in the battery have probably been developed on the basis of patient input, and the investigator is bound to have considered the relevance of the selected methods prior to starting the study. Validation should be conducted in the standard way for batteries as for single questionnaires.

Issue 9b: What could be the rationale or the validation associated with a battery-of-scales approach?

  • Responder 1
    There is no rationale from an objective standpoint. The battery of scales approach allows too much discretion from the investigator to be appropriate in a clinical trial. Validation would have to begin with patient interviews and focus groups to understand items of importance and still proceed with item reduction and creation of domains by additional testing. Finally the psychometric properties of the 'new' questionnaire would need to be described. The investigators should be required to demonstrate that they have not oversampled certain items and undersampled or excluded other items of importance.

  • Responder 2
    It may be appropriate to combine existing instruments if it meets the needs of the trial. The rationale would be that, based on the evidence, the investigators believe that the instruments they are combining comprehensively cover the domains of importance in the target population. The validity of the battery will depend on the evidence in the literature supporting the validity of the individual instruments being combined.

  • Responder 3
    The rationale for validation of a battery of scales would be no different than that for validating one scale. You want to know if you are measuring what you think you are. The process would likely be more laborious simply because there is more to validate. The whole issue of using a battery to measure HRQOL has a whole host of challenges in terms of what scales might be considered primary, what secondary, how to interpret results if all scales don’t move in the same direction, etc. As familiarity with HRQOL instruments grows, it may be that the use of a battery of instruments, with the goal of having the battery reflect HRQOL falls by the wayside. A series of questionnaires may be important to describe the impact of a drug, but it may not be HRQOL, it may be specific aspects of functioning, or something else. This is not inconsistent with some clinical measures where multiple measures are used to study an intervention.

  • Responder 4
    Answer (b) assembling a battery of ‘valid’ measures is not the problem - it is the prior selection of the primary outcome measure. Answer (c) since when did patients get to vote on the selection of outcome measures in any clinical trial - in any case, most HrQoL measures incorporate no measure of patient preference, weights are often arbitrary and lack any true arithmetic properties.

  • Responder 6
    In the battery approach, the items and scales should all still be validated in a similar but not necessarily identical disease population. Subspecialty batteries (GI, Cardiovascular, Oncology, etc.) that have been validated in these general populations may be valid to a broader audience. They would fit somewhere in between disease specific and generic and may be applicable to measuring QoL in HMO and other settings, but less useful in therapeutic evaluation. In addition, the battery approach provides a framework for the evolution of instruments by allowing the evaluation of test items as part of the QoL assessment. This parallels the educational testing model where new items are constantly being evaluated.

Issue 10: Evidence of proper development and validation

Issue 10a: What is needed to be submitted with CT data to FDA?

Summary: Evidence Submitted to FDA

Information related to questionnaire development and validation should be submitted to FDA. This includes information related to: procedures for item generation (e.g., patient input), methods of item reduction, formation of domains and item inclusion in that domain, theoretical or construct validity rationale for question selection, response features for individual questions (e.g., measurement across a range of disease severity), as well as information related to reliability of scale measurement.

In addition, there should be an analysis plan submitted that is incorporated into the protocol or precedes data analysis. The data analysis plan should specify any validity research to be incorporated in the study as well as primary, secondary, descriptive or hypothesis generating analyses.
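
To make the reliability information mentioned above more concrete, the following is a minimal sketch of one commonly reported internal-consistency statistic, Cronbach's alpha. The choice of this statistic is an assumption made for illustration; the summary does not name a specific reliability measure, and the item responses below are invented.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a scale.

    `responses` holds one row per respondent and one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same two or more items")
    item_columns: List[tuple] = list(zip(*responses))
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, three items in one domain.
domain_items = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
print(round(cronbach_alpha(domain_items), 3))
```

A sponsor would typically report such a statistic per domain, alongside other evidence such as test-retest results, rather than as a single figure for the whole questionnaire.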

  • Responder 1
    Information on questionnaire development and validation should include description of the following:
    • item generation with patient input
    • methods of item reduction
    • formation of domains from questions
    • proof that individual items belong in assigned domain
    • evidence that questionnaire is measuring quality of life (construct validity)
    • relationship between score on questionnaire and disease severity
    • linearity of response with disease changes at different levels of disease severity

    In addition, an analysis plan describing which domain(s) will be analyzed for QoL endpoint and which domains are expected to be unaffected by therapy would need to be submitted. The plan should also indicate methods of analyses and what constitutes a clinically significant difference.

  • Responder 2
    There should be published data and other supporting documentation that reviews the instrument’s development process, describes the measurement properties of the instrument (e.g., evidence of validity, reliability) in the target population, and provides adequate justification for the use of the instrument in the clinical trial.

  • Responder 3
    This is an important issue that goes hand in hand with development. These areas could potentially be collapsed. Issue 10a might also be included with issue 6, trial administration, since this has to do with what goes in the protocol and ultimately with the results submitted to the agency. While the guidance may recommend the use of instruments that have demonstrated reliability and validity, I would not recommend going so far as to say that an instrument MUST have demonstrated reliability and validity in the intended population before starting the study. It is possible to run concurrent studies, or to use the trial itself as the source for validity information. In some cases, where the study population is very different from the previous uses, this is a risk the sponsor has to decide if they want to take. In other cases, the risk may be very minimal and it would not make sense to delay a trial to conduct some validation work to submit to the Agency with a protocol, when they won’t really want to see it until after the study is completed. Order is less important here. Although one would traditionally validate before using, as timelines get squeezed, more creative but still legitimate approaches should not immediately be rejected.

  • Responder 4
    A standardised history file ought to be prepared in respect to all HrQoL measures submitted to the FDA. This could be achieved on an industry basis - or alternatively it might be a project to be coordinated through ISPOR/ISOQOL subject to public funding. Where innovative HrQoL measures are proposed, then a corresponding history file would be required, which would conform to the standard template. This would be the responsibility of the company responsible for the clinical study, and would require third-party endorsement. Variations to the logged history file would require independent verification initiated by FDA. It would be desirable to avoid the use of existing self-declared reference groups, but rather to consider the establishment of a new public body with relevant expertise.

  • Responder 5
    Instrument validation is a key issue in health-related quality-of-life assessment whether an instrument is to be used in a battery or as the sole measure in a clinical study. The classical test theory approach to validation, e.g., construct, concurrent and content validity, has been widely used. It is becoming clear, however, that additional methods of validation will be needed to deal with issues relevant to the use of health-related quality of life measures in clinical trials, to assure that measures continue to be valid across time, and to interpret findings. An important component of these new approaches to validation will be an understanding of when each health-related quality-of-life measure was developed, whether the included domains are time dependent, and how the measure has been used and in which study populations. In addition, the comprehensive compilation of mean scores and, when possible, their change over time for each of the study populations will be important for determining the performance of each instrument.

    The availability of this comprehensive information for existing measures is an important consideration when selecting a measure of health-related quality of life. This availability may also be an important determinant of whether or not a battery is used, especially if the alternative is to develop a new measure with restricted resources.

  • Responder 6
    Documentation of reliability and validity according to standard procedures. The correlation is usually low between clinical and subjective parameters. That is the reason why they supplement each other. It is of course desirable that there is a correlation albeit low.

    One large or two smaller studies, adequately powered to detect change using validated instruments. Instrument validation characteristics can be cited in the literature or presented as part of the CT data to the FDA.

Issue 10b: How much correlation should there be between QoL measures and clinical measures?

  • Responder 1
    It depends. If none, then one must question the validity of either the proposed QoL measure or the clinical measure. If perfect, then why include both? The relationship will obviously be better when the clinical measure is one that is more relevant to the patient and when the QoL measure includes those domains most likely to change with changes in the clinical measure.

  • Responder 2
    The point needs to remain very clear that if all we want to do is mirror clinical measures, then why measure HRQoL? We know that in certain circumstances HRQoL may not correlate highly with clinical measures. The primary reason we measure HRQoL is because we believe that there is something uniquely important about the subject’s assessment of treatment outcomes that may not be reflected in more clinical or biologically-based measures. High correlation is important when the developers of an instrument support the validity of it through hypothesized strong relationships with clinical measures. Other than that, attempting to state an absolute level of correlation (e.g., 0.70) between HRQoL and clinical measures is naïve and counterproductive. It is very situational and specific to the condition and treatment being assessed.

  • Responder 3
    The degree of correlation between clinical and HRQOL measures has been discussed in the literature. In many cases we know it is not very high. If you are truly measuring something different than a clinical measure, one would not expect to see high correlations. However, if the clinical measure is the HRQOL measure, then obviously that relationship should be 1.0. If the issue is one of validation as opposed to correlation with clinical measures, the area for the guidance to address might be that for validation purposes, a clinical measure might be an appropriate choice.

  • Responder 4
    There is danger of seeing a circular relationship between HrQoL and clinical measures. The real question is not ‘how much’ but rather ‘is there, or should there be a correlation ..’. This ought to be established before the study (if possible), but a near-zero correlation ought not to be regarded as a disaster. Why should there not be a disconnect between clinical observations and patient-assessed HrQoL? Asymptomatic hypertension might co-exist with high levels of HrQoL, as seems to be the case too with HIV/AIDS, where diagnosis sometimes appears to improve HrQoL. If we had perfect correspondence between HrQoL and clinical measures, then why would we need (either)? There are no hard and fast rules in respect to the correlation between HrQoL and clinical measures, and it would be dangerous (and unprofitable) to invent guidelines. Let the data shine through.

  • Responder 6
    It depends on the condition. They may be quite uncorrelated (e.g. viral load surrogate endpoints for HIV infection and QoL endpoints).

Issue 11: Responsiveness - Should responsiveness be proven prior to a clinical trial being undertaken — or, if discovered in the trial, is this good enough?

Summary: Responsiveness

Responsiveness to change can be considered one aspect of validation. Generally, validity should be demonstrated prior to the conduct of a full RCT, preferably in a population similar to the one used for the RCT. In some instances, where other validity indices are known, an analysis of responsiveness could add additional information about the measurement instrument.

  • Responder 1
    Why would someone wish to conduct a trial with an endpoint that has not been demonstrated as responsive to change? However, if there has not been a prior treatment that improved disease, estimates of responsiveness may be based solely on relationship with severity of disease. In this case, showing that one group improved or experienced less decrement in QoL over a comparator would be acceptable without having demonstrated prior responsiveness to therapy.

  • Responder 2
    It is unlikely that the responsiveness of an instrument will ever be ‘proven’ absolutely. However, there should be evidence of responsiveness in the target population before an instrument is used in a clinical trial. In addition, an investigative team would be foolhardy to use an instrument that had not been shown to be responsive to change (unless they were counting on the lack of responsiveness to be an advantage, which would be fraudulent). In the unlikely circumstance that there has been no opportunity to obtain evidence of responsiveness prior to the clinical trial, demonstrating responsiveness (i.e., the instrument detected change) in a trial would be acceptable. However, if responsiveness is not demonstrated (i.e., the instrument did not detect change) in the trial, it does not indicate with certainty that no HRQoL change occurred.

  • Responder 3
    Responsiveness could almost fall under the issue of validation, perhaps as a sub heading. Responsiveness needs to be defined, and certainly for the first uses of any instrument in a specific therapeutic area, it would seem prudent to address issues of responsiveness. But what is it? A statistically significant change over some period of time that is seen in an intervention group but not in a control group? However, it does not seem essential to dictate WHEN responsiveness has to be ‘proved’. If the sponsor wants to take the risk, that is really up to them. It might be useful to state what happens when the instrument is not responsive (i.e. the odds of a claim or an indication are not good).

  • Responder 4
    Discovery in trial settings is perfectly acceptable. This may well be the case where a new HrQoL measure is being fielded, or where one measure within a battery lacks relevant prior exposure to the particular patient/treatment group. The risk in using an ‘unproven’ HrQoL is a matter for the trial designers.

  • Responder 6
    Good enough to be proven within a trial. If evaluated during the trial this is good enough. However, the investigators should realize they are taking risks here. A good option is to conduct small pilot studies of reliability prior to CT if no previously proven instruments are available. The danger of underpowering a study due to non-responsive instruments is very high in my opinion in QoL assessments. Statistical advances, including IRT methods, will hopefully increase the responsiveness of QoL instruments in the future.

Issue 12: Clinical significance

Issue 12a: How is clinical significance determined?

Summary: Clinical Significance

Determining the ‘clinical significance’ of an HRQL finding is fraught with theoretical and practical difficulties. Measures of ‘minimally important difference’ or ‘minimal clinically important difference’ seek to measure the smallest change in a domain score that patients perceive as beneficial. There are no commonly accepted or satisfactory methods for determining clinical significance.

Because FDA requires that labeling and promotional claims be based upon meaningful differences (i.e., clinically as well as statistically significant), it may be necessary to calibrate HRQL domain scales. However, there may be a variety of approaches that could be used to perform this validation. Experimentation and evaluation are needed. At this point, it is not desirable to dictate what type of evidence of clinical significance should be compiled.

  • Responder 1
    There is no commonly accepted or satisfactory practice today. Most commonly cited methods are calibration to (1) incremental changes on a single global health-related QoL question or (2) ratio of average change to a measure of variability. Both are flawed. The first because different global questions are used (ie, disease, overall health, overall quality of life, etc) and different increments are possible (ie, 5-point scale, 7-point scale, etc). Also it is not clear why one incremental change on a global question should be clinically significant or why a change of less than one increment should not be clinically significant. The second approach (effect size) is just a restatement of statistical significance. In practice, clinical significance of any endpoint is usually the result of familiarity with the measure and relationship of changes with that measure to other measures and especially outcomes.

  • Responder 2
    Although it is often used, I disagree with the use of the term ‘clinical significance.’ It is imprecise and implies something that it may not (or need not) be. In my estimation, we should be talking about the incremental difference in an HRQoL scale score or value that has meaning to members of the target population. A term such as ‘minimal important difference’ more appropriately reflects the actual issue being addressed. Although they use the term ‘minimal clinically important difference,’ Jaeschke and colleagues (Controlled Clinical Trials 1989;10:407-415) describe the concept as the smallest difference in a score in a domain of interest that patients perceive as beneficial. However, this is not the way the term ‘clinical significance’ is usually used. I am very concerned about the potential for an overemphasis on what is being termed ‘clinical significance.’ There is no consensus in the literature as to what ‘clinical significance’ means or to whom the outcome should be ‘clinically significant.’ There have been many approaches to its determination, including: mapping a change on an HRQoL scale to a life event; comparing HRQoL scale scores of samples or individuals to well population or disease population norms; relating a change in an HRQoL scale to future medical events or health expenditures.

    These and most other approaches suggest that a change in an HRQoL score must relate to something external, implying that HRQoL itself has no intrinsic meaning. Although our measurement of HRQoL can and must continue to improve, it must be recognized by regulators, clinicians, researchers, etc. that HRQoL is important in and of itself and that it may not be tied directly to clinical change.

  • Responder 3
    This is one of the biggest issues that could have major impact on how HRQOL is used in clinical trials. It is also an issue (or series of issues) where there has not been enough work done to set forth any requirements as to how clinical significance should be measured and interpreted. Effect size, statistical significance, 20% response rates, relationship with global measures, are all approaches that have been taken. They all have strengths and weaknesses. I don’t think any guidance can be too prescriptive here in terms of what one needs to do to demonstrate the meaningfulness of change. However, there is a great opportunity here to get some work done such that for commonly used instruments we might see some consistent estimates forthcoming across various methodological approaches. To the extent that the estimates are all in the same ballpark (whatever that means!), the science would really be advanced.

  • Responder 4
    Clinical significance (rather like beauty and happiness) is often in the eye of the beholder - just witness the debate between physicians arguing over whether an X-ray film shows evidence of an enlarged heart! This is presumed to relate to the clinical significance of health status (or change in health status) as detected by an HrQoL measure. This needs to be established on a ‘within-study’ basis. Reference to pre-existing data on effect size ought to be avoided. If clinical significance is to be claimed for observed HrQoL change, then this needs to be on the basis of evidence that is collected concurrently.

  • Responder 6
    Effect sizes, using the overall treatment evaluation. Ideally, a study should be powered to detect clinical significance upfront. If statistical significance has been established without ‘overpowering’ the study, it is fair enough to file a claim, especially since there are so many ways of looking into what a clinically significant finding is, and that applies to clinical parameters as well. Effect size compared to overall treatment evaluations. Hypotheses must be stated a priori along with the statistical approach for evaluating multiple endpoints.
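
As an illustration of the ‘ratio of average change to a measure of variability’ that several responders refer to, the following Python sketch computes two common variants. The function names, the sample scores, and the choice of denominators (baseline standard deviation versus standard deviation of the change scores) are assumptions made for this example; the responders do not prescribe a particular formula.

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

# Hypothetical domain scores (0-100 scale) for six patients.
baseline = [40, 50, 60, 45, 55, 50]
follow_up = [50, 55, 70, 50, 65, 60]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"effect size = {es:.2f}, standardized response mean = {srm:.2f}")
```

Both ratios are unitless. As Responder 1 points out, neither one by itself shows that patients perceive the change as important.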

Issue 12b: Should a trial be powered to detect clinical significance up-front?

  • Responder 1
    Yes if QoL is the primary endpoint. If clinical endpoint is primary, statistical significance on QoL should be adequate.

  • Responder 2
    If ‘clinical significance’ has been adequately established (which is rarely the case), then the trial should be powered for both clinical and statistical significance.

  • Responder 3
    This is an important issue as well, but also a moving target. However, most power calculations are based on something other than a statistical change, for example a certain number of mm Hg or a percentage change along a scale, so to expect such for HRQOL is not out of line. The bigger issue may be, how do you power a study when multiple measures are desired for labeling, clinical measures and HRQOL? It would not seem misleading or inappropriate to include both results, even if the study was powered for clinical change, if the hypothesis of measures moving in the same direction holds.

  • Responder 4
    This is a virtually impossible task.

  • Responder 6
    One would hope so. Under most circumstances, QoL studies will require larger sample sizes due to inherent subjective variability.
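
To make the link between variability and sample size concrete, here is a minimal Python sketch of a standard normal-approximation sample size calculation for comparing two group means. The function name, the 10-point difference, and the standard deviations are hypothetical values chosen for illustration, not figures taken from the responders.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(mid, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample comparison of means
    (normal approximation), to detect a true difference of `mid`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / mid) ** 2)

# Hypothetical: a 10-point difference on a 0-100 scale is judged important.
n_small_sd = n_per_arm(mid=10, sd=20)
n_large_sd = n_per_arm(mid=10, sd=30)
print(n_small_sd, n_large_sd)
```

Under these assumptions, raising the standard deviation from 20 to 30 points more than doubles the required number of patients per arm, which is consistent with Responder 6's remark about larger sample sizes.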

Issue 12c: If a difference is found to be statistically significant but it is not ‘clinically significant’ - can a claim be filed?

  • Responder 1
    QoL endpoints should be no different from other endpoints. If a 'clinically significant' change is required for blood pressure, FEV1, cholesterol, bone density etc, then a clinically significant change should be required for QoL. Judgement is required for all endpoints, clinical and QoL, as to what is 'clinically significant'. Usually, there is less experience with QoL measures than the more commonly used clinical measures so there may be less information on which to base a decision as to what constitutes a clinically significant change. There should be some effort to establish consensus as to what constitutes a clinically significant change including perhaps input from the developers, the trial investigators, the FDA, other experts and patients.

  • Responder 2
    Statistical significance is one of the fundamental underpinnings of the determination of quantitative differences. I do not believe that we should be unjustifiably or inappropriately demanding in HRQoL research. There are many accepted measures of clinical or biological variables for which minimally important differences are not known yet statistical significance is recognized as indicating difference. I think we should resist any temptation to require ‘clinical significance’ as the only valid determination of difference in HRQoL measurement. Not enough is known about determining clinically significant differences.

    However, if an increment of change or difference that is ‘clinically significant’ has been adequately determined, then clinical significance should trigger the claim, not statistical significance. Nevertheless, until we know more about the determination of clinical significance and come to a consensus as to what it means, I am not sure we can have complete confidence in its measurement. In most cases, I would rather rely on the testing of statistical significance than on very subjective assessments of what ‘clinical significance’ is. The increase in the likelihood of a type II error may be unacceptable.

  • Responder 3
    The issue of statistical significance vs. clinical significance is also one that bears mention in a guidance. If the goal is clinical, that should be stated up front and all sponsors held to the same standard. If the goal is statistical, then the same should apply and the issue of clinical significance should go away for the Agency. There are other issues embedded here. Should you have to have both clinical AND statistical significance? If so, that must be stated in the guidance, although it sets a clear double standard, and I suspect you will see more ‘functional status measures’ being used as clinical measures instead of HRQOL measures. If the studies demonstrate statistical significance but not clinical significance, could they still report out the data in a promotional claim? That is a most interesting question, and should be addressed in the guidance. If a 10 point change is what is needed to be ‘clinically meaningful’ and the difference is 9.8, how would the Agency respond?

  • Responder 4
    Absolutely. However, attention should be drawn to the non-uniform relationship between clinical and HrQoL observations, as suggested in section 2.

  • Responder 6
    Yes, but the evaluation of the claim would be more difficult. Also, negative findings in underpowered studies should not be allowed as a claim of equal QoL effect.

Issue 13: Statistical analysis

Issue 13a: What is the role of summary scores across domains vs domain specific?

Summary: Summary or Domain-Specific Scores

Whether summary or domain scores are used is determined by the purpose of the study. Summary scores are often used in econometric studies. Domain-specific scores are often considered necessary to measure the multidimensionality of HRQL. As HRQL outcomes may be related to only certain domains in an HRQL instrument, it is important to identify in the analysis plan which domains are of primary interest and how multidimensionality will be treated. Results from all summary scores and measured domains should be presented.

  • Responder 1
    Summary scores attempt to provide an overall measure of QoL. However as QoL is multidimensional and different individuals have different relative values for different dimensions of health, the overall score for most questionnaires has tended to be unsatisfactory even though desirable from a quantitative standpoint. Many therapies only affect one or two domains, so often it is desirable to have domain score(s) as the primary endpoint(s) with the additional qualification of having no demonstrated decrease on the other domains or total score (safety surrogates).

  • Responder 2
    Both summary scores and individual profile scores can be useful in clinical trials. It depends on the purpose of the measurement. If the trial is designed to assess the differential impact of two comparator products on specific HRQoL domains, then the domain specific information is critical. I believe that if profile measures are used that also provide summary measures, the individual domain scores must be reported. The summary measure or measures would be an adjunct to the more domain-specific information. If an a priori decision has been made in the design of the trial that an overall/summary measure of HRQoL is the preferred outcome, then a summary measure alone may be appropriate. However, that decision would have to be sufficiently justified.

  • Responder 3
    Statistical issues are important to address in a guidance in terms of providing some direction as to how HRQOL measures should be analyzed. Some of the sub-issues identified could be listed elsewhere; some could be resolved on a case-by-case basis. Some instruments provide summary scores, but not all do, so it is difficult to give a thumbs up or down to their use across the board. Summary scores may present a means of addressing the statistical issues of multiple measures and as such could be a convenient way of ‘opening the chest’ for all domain scores. However, how a summary score is interpreted can also be an issue. They could hide information that might be useful to a practitioner; they could also hide information that might be relevant to a particular intervention. At a minimum it would appear that summary scores could play a useful role in data analysis, but they shouldn’t be required.

  • Responder 4
    A summary score - or more precisely a single aggregate index - is a precondition to economic analysis. If it is intended that a cost-effectiveness or cost-utility analysis (they are NOT the same) is to be performed, then a single index is an absolute requirement. Descriptive analysis, or other forms of data manipulation, can be carried out on individual domain totals.

  • Responder 6
    Summary scores tend to dilute the treatment response compared to domain scores. Quality of life should be analyzed using ITT and PP analysis as well if quality of life is the primary end-point (needs to be specified in the protocol). How to handle missing data should be specified in the protocol. P-values need to be adjusted for mass significance. How many trials are needed should be considered in relation to the scope and design of the study. A large and well-designed study should be sufficient. The only reason to treat quality of life end-points differently is if there are problems relating to mass significance, but again that applies to clinical parameters as well. It may also be useful to use the baseline quality of life value as a covariate in the analysis rather than to correct for center differences. Equivalence claims should have the same basis as for clinical parameters. In general, the a priori hypothesis should involve the specific domains and not the summary score. The summary score is used to look at the internal consistency of the domains.

Issue 13b: Should QoL be analyzed with intention-to-treat and per-protocol, like all other endpoints?

Summary: Analyses

HRQL analyses should follow accepted statistical procedures for clinical trials. This applies to the analysis design, missing data, protocol violations, dropouts, and p values. All intended analyses should be specified in a data analysis plan. Any special analytic features should be identified pre hoc (e.g., some scales specify that missing data is to be handled in a way that may be different than routine clinical research practice).

  • Responder 1
    Yes.

  • Responder 2
    An intent-to-treat analysis may not be the best approach for HRQoL research. As is often the case, it depends on the disease and the treatment. For clinical trials in which there is a differential dropout rate due to progression of the disease (e.g., death) or negative effects of the treatment, that has to be taken into consideration.

  • Responder 3
    The study population to be analyzed for HRQOL, if any different from a traditional clinical population, would need to be clarified in a guidance. The Agency needs to consider how similarly it wants to handle HRQOL relative to traditional clinical measures. This is also an area ripe for research: how do results vary if different populations are used? The problem with using different populations is that if you are using clinical endpoints to validate, you would want to use the same population.

  • Responder 4
    Yes.

  • Responder 6
    Yes.

Issue 13c: Should missing data be handled the same way it is handled in symptom scores?

  • Responder 1
    This should be included in the analysis plan. The analysis plan should address whether a missing value for a single question may be imputed, how the imputed value should be derived, and how many missing values can be allowed without invalidating the score. In addition, there needs to be discussion as to whether the last score should be carried forward in the case of drop-outs or discontinuers. Caution may be needed in trials in which there is differential survival. If one therapy improves survival or duration in trial, the QoL scores may appear paradoxically worse, as a healthier subset of patients will remain in the arm with the poorest therapy. If informative censoring is expected, the analysis plan should specify up-front how to demonstrate and adjust for this effect.

  • Responder 2
    I do not know how symptom scores are handled. However, there has to be an a priori decision made regarding how missing data due to death or dropout will be handled in the trial. It is not sufficient to use only observed data. There are a number of ways of addressing the problem, including assigning the worst possible HRQoL scale values, using the last observed value, or imputing missing values at data collection points in the trial subsequent to drop out. As a sort of sensitivity analysis, a combination of two or more of these approaches could be used. However, whatever is done should be transparent and fully reported.

  • Responder 3
    If it is to be recommended that missing data be handled differently than is traditional with clinical data, that should be clarified in the guidance. Some instruments specify how to handle missing data, when you can extrapolate or interpolate, and when you should not use a score because too many data points are missing. It would seem useful to follow the developers’ recommendations in this case.

  • Responder 4
    There is really no scope for handling missing data in HrQoL measurement. The practice ought to be discouraged. The ‘scoring’ of symptoms is a separate disaster area which should be avoided like the plague. Evidence of missing data rates ought to be recorded in the standard history file (see 10a above) and where this is known to be above a notional threshold, then this might constitute strong reasons to make alternative choices in study design. Ex post data manipulation in this field of investigation should be avoided at all costs.

  • Responder 6
    Not necessarily but I will not go into the specifics of the methods.
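
The sensitivity analysis Responder 2 describes, applying two or more imputation approaches and reporting them all, might look like the following Python sketch. The function names, the visit schedule, and the assumption that 0 is the worst value on a 0-100 scale are hypothetical; an actual analysis plan would define these a priori and would follow the instrument developers' scoring rules.

```python
def locf(scores):
    """Carry the last observed value forward over missing (None) visits.
    Visits before the first observed value stay None."""
    filled, last = [], None
    for s in scores:
        if s is not None:
            last = s
        filled.append(last)
    return filled

def worst_case(scores, worst=0):
    """Replace missing (None) visits with the worst possible scale value."""
    return [worst if s is None else s for s in scores]

# Hypothetical patient who dropped out after the second of four visits.
observed = [60, 55, None, None]

sensitivity = {
    "locf": locf(observed),
    "worst_case": worst_case(observed, worst=0),
}
for method, values in sensitivity.items():
    print(method, values)
```

Reporting the results under each rule side by side keeps the handling of missing data transparent, whichever rule the protocol designates as primary.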

Issue 13d: Is there any circumstance where a different p-value could be used (e.g., 0.10)?

  • Responder 1
    If the disease in question is relatively rare, it may not be possible to enrol enough patients to power at the 0.05 level. Alternatively, if QoL is a secondary endpoint and enrolment is limited by the primary endpoint (requires lengthy trial, invasive measures), it may not be possible to power at the usual 0.05 level. If there is a desire to evaluate a concomitant QoL benefit, a less rigorous standard could be considered.

  • Responder 2
    There may be investigators who, with supporting documentation, can make a case to use a less stringent p-value. I am hesitant to suggest that HRQoL data should be treated less stringently than other clinical trial data. However, I do think that in certain situations, committing a type II error could be more problematic than committing a type I error. The potential for committing a type II error is particularly enhanced when we adjust the alpha for multiple comparisons (e.g., Bonferroni correction).

  • Responder 3
    It may be better to phrase the issue as to what level of confidence should we have in the data? When can we stop looking at p values? If an overall F test (MANOVA) is used to analyze the data, and you ‘win’ at that level, should you be required to live or die by p values after that? I have heard some statisticians recommend such an approach, but it would be better to get stats input here than from me!

  • Responder 4
    Absolutely. Worship at the altar of the .01 level is a convention that needs challenging anyway, but in the area of HrQoL investigation the whole concept of statistical significance has yet to be properly confronted. I have a paper dealing with this subject in preparation.

  • Responder 6
    Yes, as long as it is justified. In general though I support the reporting of actual p values as opposed to p<0.05 with a clear analysis of the clinical significance of the findings.
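
Responder 2's concern that adjusting alpha for multiple comparisons raises the chance of a type II error can be illustrated numerically. The Python sketch below uses a normal approximation; the function name, the assumed true effect of 2.8 standard errors, and the six comparisons are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def power_two_sided(effect_in_se, alpha):
    """Approximate power of a two-sided z-test when the true difference
    equals `effect_in_se` standard errors (the far tail is ignored)."""
    z = NormalDist()
    return z.cdf(effect_in_se - z.inv_cdf(1 - alpha / 2))

# Same hypothetical study, analyzed with and without a Bonferroni
# correction for six domain comparisons.
unadjusted = power_two_sided(2.8, alpha=0.05)
bonferroni = power_two_sided(2.8, alpha=0.05 / 6)
print(f"power at 0.05: {unadjusted:.2f}, power at 0.05/6: {bonferroni:.2f}")
```

Under these assumptions, a study with roughly 80% power at the 0.05 level has only a little better than even odds of detecting the same effect once alpha is divided by six.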

Issue 13e: How many clinical trials to support a label claim - one vs two?

  • Responder 1
    In most cases, two.

  • Responder 2
    A single, well-designed, adequately powered, controlled randomized trial should be sufficient.

  • Responder 3
    Number of trials needed to support a claim, if they will differ from the standard guidance should be specified. It is unlikely that one wouldn’t need to replicate results to make a HRQOL claim.

  • Responder 6
    One large or 2 small, depends on the situation. In general there is less risk associated with QoL claims (e.g. the risk of FDA approving a false QoL claim has no health consequences attached to it), so one study may be adequate.

Issue 13f: Why should QoL endpoints be statistically treated differently than clinical endpoints (i.e. multiplicity)?

  • Responder 1
    They shouldn't. The analysis plan should state up-front which domain(s) and / or total score are to be analyzed as the efficacy endpoint. The overall type 1 error rate should be adjusted when more than one score or more than one timepoint is analyzed as an efficacy endpoint.

  • Responder 2
    Due to the nature of the data, HRQoL endpoints may need to be treated differently. However, although the data is being treated differently, it doesn’t mean the statistical analysis is any less rigorous or valid. Appropriate adjustments will need to be made for multiple comparisons (i.e., alpha slippage).

  • Responder 3
    The issue of multiplicity has been addressed elsewhere. It is important in that it can have an impact on how studies are designed, how endpoints are prioritized and how/if studies would be done. I don’t think the agency is consistent across reviewing divisions in how they handle multiplicity. And sometimes there are good reasons as to why it might be ignored. However to create a level playing field, the issue should be considered. If it is stated that multiplicity must be addressed, how it should be handled will be a difficult process.

  • Responder 4
    HrQoL endpoints are likely to display substantial covariance. This may lead to over-estimation of treatment effects. This could be yet another reason for using a measure that yields a single index aggregate total score, or developing techniques for refining the multiple-domain instruments that are currently offered as profiles.

  • Responder 6
    Yes, there are techniques for doing this. For example, the adjustment for multiple comparisons can be made using the Hochberg procedure. The procedure stops when the first significant result is obtained. For six domains, the Hochberg procedure deems each domain significant if the p-values associated with the six comparisons are all less than or equal to 0.05. If the largest of the six p-values is greater than 0.05, the other five comparisons are deemed statistically significant if their associated p-values are all less than or equal to 0.025. If the largest of the five remaining p-values is greater than 0.025, the other four comparisons are deemed statistically significant if their associated p-values are all less than or equal to 0.0167. The procedure proceeds in this way until a set of comparisons is deemed statistically significant.
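
The step-up logic described by Responder 6 can be sketched in a few lines of Python. The function name and the six p-values are hypothetical and serve only to illustrate the procedure; this is not a validated implementation for regulatory use.

```python
def hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a comparison is deemed
    statistically significant under the Hochberg step-up procedure."""
    m = len(p_values)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    significant = [False] * m
    for step, idx in enumerate(order, start=1):
        # The k-th largest p-value is compared with alpha / k.
        if p_values[idx] <= alpha / step:
            # This comparison and all with smaller p-values are significant.
            for j in order[step - 1:]:
                significant[j] = True
            break
    return significant

# Hypothetical p-values for six domains.
p = [0.060, 0.030, 0.012, 0.010, 0.004, 0.001]
result = hochberg(p)
print(result)
```

With these values the two largest p-values fail their thresholds (0.05 and 0.025), the third (0.012) meets 0.0167, and the procedure stops with four domains deemed significant.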

Issue 13g: Under what circumstances can a claim be filed for equivalence?

Summary: Equivalence Claims

Accepted statistical practice must be adapted to support claims of equivalence in HRQL outcomes. A demonstration of ‘equivalence’ requires the specification of a range of scores that are deemed ‘not meaningfully different,’ sufficient power in the analysis, and a sensitive instrument. A more difficult issue is the degree to which equivalence must be demonstrated in multiple domains to substantiate an equivalence claim.
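
One common way to operationalize a pre-specified range of ‘not meaningfully different’ scores is a confidence-interval check equivalent to two one-sided tests. The Python sketch below assumes that approach, which the summary itself does not name; the function name, the margins, and the data are hypothetical, and the normal approximation is suitable only for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def equivalent(scores_a, scores_b, margin, alpha=0.05):
    """Conclude equivalence if the (1 - 2*alpha) confidence interval for the
    difference in means lies entirely within (-margin, +margin)."""
    diff = mean(scores_a) - mean(scores_b)
    se = sqrt(stdev(scores_a) ** 2 / len(scores_a)
              + stdev(scores_b) ** 2 / len(scores_b))
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    return (-margin < lower and upper < margin), (lower, upper)

# Hypothetical domain scores for 100 patients per arm.
therapy_a = [70, 72, 68, 71, 69] * 20
therapy_b = [69, 71, 67, 70, 68] * 20

wide_ok, wide_ci = equivalent(therapy_a, therapy_b, margin=5)
narrow_ok, narrow_ci = equivalent(therapy_a, therapy_b, margin=1)
print(wide_ok, narrow_ok)
```

The same data support equivalence within a 5-point margin but not within a 1-point margin, which is why the margin has to be justified and fixed before the trial.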

  • Responder 1
    There are really two different questions here: (1) no difference on a single (specified) domain and (2) no difference on any domain. The first can probably be addressed using current FDA guidelines for demonstrating equivalence. There may be a problem with the choice of instrument, as some questionnaires are likely to be more sensitive than others. Defending the claim that one has chosen the most sensitive instrument may be difficult. Perhaps the strongest claim that could be made is that therapy A is equivalent to therapy B in terms of domain X (eg, social functioning) as measured by instrument C. In addition, it may be desirable to require equivalence in QoL in conjunction with equivalence on a clinical measure.

    The second question is much more complex than the first, which is itself not easy. As QoL is multidimensional, including many domains would make it very difficult not to show a statistically significant difference in at least one domain by chance alone. Even if all domains tested do not show any statistically (or clinically) significant differences, it could always be argued that there was a domain not tested in which the two therapies would show a difference. Again, different questionnaires may have different sensitivities, and it could also be argued that if a different questionnaire had been chosen, the two therapies would not have been seen to be equivalent. Probably there is no way to say that therapy A and therapy B are equivalent in their effects on QoL. Statements would probably have to be limited to equivalency in specific domains as measured by specific instruments. Perhaps the FDA may wish to avoid statements of equivalency on QoL and assume all therapies in a class are equivalent unless demonstrated otherwise.
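The point about chance findings across many domains can be illustrated with a short calculation. The sketch below assumes the domain-level tests are independent and each is run at the 0.05 level; both are simplifying assumptions (HRQL domains are typically correlated), and the function name is hypothetical.

```python
def chance_of_any_false_difference(n_domains, alpha=0.05):
    """Probability of at least one 'significant' difference by chance alone.

    Assumes independent tests, each at level `alpha`, when the two
    therapies truly do not differ on any domain.
    """
    return 1 - (1 - alpha) ** n_domains


for k in (1, 6, 10):
    print(k, round(chance_of_any_false_difference(k), 4))
```

Under these assumptions, testing six domains gives roughly a 26% chance of at least one spurious difference even when the therapies are truly equivalent on every domain.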

  • Responder 2
    In a situation where a controlled randomized trial has been conducted in which the HRQoL endpoints are specified, the HRQoL instrument/battery has been shown to be sensitive/responsive to change in those endpoints, and the trial was adequately powered for detecting change in those endpoints.

  • Responder 3
    This is also a key point and warrants discussion in a guidance. This could be a function of the study hypothesis and the answer to it (can you reject the null?). It is obviously also a statistical power issue that goes beyond HRQOL measures.

  • Responder 4
    As a minimum:
    1. Both studies must incorporate identical HrQoL measures.
    2. Performance parameters (response, completion, and missing-data rates, etc.) should be comparable.
    3. Study populations ought not to differ at baseline in terms of their HrQoL.

  • Responder 6
    Again, if it is adequately powered to do so and the hypothesis is clearly stated.


©2008 International Society for Pharmacoeconomics and Outcomes Research.
All rights reserved under International and Pan-American Copyright Conventions.
 