The Official News & Technical Journal Of The International Society For Pharmacoeconomics And Outcomes Research


Mira Pavlovic MD, Scientific Advice Unit, Agence Française de Sécurité Sanitaire des Produits de Santé, Direction des Médicaments et des Produits Biologiques, Saint-Denis Cedex, France

Propensity Score Matching With Multi-Level Categories: An Application

The following was presented during the Second Plenary Session, “Patient Reported Outcomes: A European Perspective,” at the ISPOR 10th Annual European Congress, 22 October 2007, Dublin, Ireland.

Several working parties work on behalf of the Committee for Medicinal Products for Human Use (CHMP) at the European Medicines Agency (EMEA). The role of the Efficacy Working Party (EWP) is to elaborate guidelines for the clinical development of medicinal products in different fields of medicine.

In some of the EMEA/EWP guidelines [1], patient-reported outcomes (PRO) and health-related quality of life (HRQL) have been mentioned as part of the drug development programme, either as efficacy variables (primary, secondary, supportive), safety variables, or measures useful for benefit-risk assessment. However, these guidelines neither define HRQL or PRO in general nor recommend how to assess HRQL/PRO claims in marketing authorisation applications. The impression of both regulators and pharmaceutical companies has been that HRQL/PRO claims have been granted rarely and on a case-by-case basis.

For all these reasons, the EMEA EWP decided to draft a specific reflection paper on HRQL/PRO assessment in registration trials. The aim was to define the place of PRO, in particular HRQL, in the drug evaluation process and to give recommendations for their use and assessment. In addition, definitions of PRO and HRQL, terms frequently used interchangeably by both sponsors and regulators, were provided.

A patient-reported outcome (PRO) is any outcome evaluated directly by the patient himself and based on the patient's perception of a disease and its treatment(s). The term PRO is proposed as an umbrella term to cover simple patient-assessed measures (such as pain or itching), multi-item, single-concept measures (such as the Health Assessment Questionnaire, HAQ), multi-item, multi-concept measures (such as the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) or Activities of Daily Living (ADL)), as well as broad multidimensional measures (such as HRQL). In addition, PRO may also cover health status, adherence and satisfaction with treatment.

For the purpose of drug development, the PRO scope was considered to be too large. On the one hand, simple patient-assessed measures are core symptoms of a disease assessed by the patient himself; they are well-established primary and secondary efficacy endpoints in registration trials and do not need any specific regulatory requirements. On the other hand, HRQL, a specific subset of PROs, is a broad concept which can be defined as the patient's subjective perception of the impact of his disease and its treatment(s) on his daily life, physical, psychological and social functioning and well-being. The notion of multidimensionality is a key component of the definition of HRQL. A single domain, e.g., physical functioning or fatigue, is not considered to be HRQL (i.e. it cannot be the basis for a claim of a global HRQL improvement), even though it is patient-reported. In addition, HRQL should be clearly differentiated from the core symptoms of a disease which are, as stated, well-accepted primary and secondary efficacy endpoints in registration trials.

Therefore, the scope of the reflection paper was narrowed to include only HRQL as a specific type/subset of PRO, to define the place of HRQL in the context of drug approval, and to give some broad recommendations on its use in the context of already existing guidance documents. No specific recommendation has been given for the development and validation of HRQL measures in clinical trials, as this is available elsewhere [2].

The reflection paper was adopted by the CHMP in July 2005 and came into effect in January 2006 [3]. It stressed the notion of multidimensionality as a key component of the definition of HRQL.

Basic Recommendations
The basic recommendations are:

• Efficacy and safety of a medicinal product in the given condition are the basis of approval, and
• HRQL claim always goes beyond efficacy and safety assessments, is optional, and should be supported by data collected by instruments validated for use in the corresponding condition. Both generic and disease specific questionnaires may be used for a given condition. In practice, it is very important to choose the questionnaire which contains/is adapted to explore the domains relevant for the disease and its treatment(s).

HRQL claims may be global or more specific. In order to approve a global claim that a product “improves HRQL”, it would be necessary to demonstrate robust improvements in all or most of the relevant HRQL domains. Indeed, “HRQL improvement” as a claim implies that the most important and clinically relevant health-related domains of functioning that impact the patient's quality of life are known and measured. In all cases a full disclosure of complete results should be provided (section 5.1 of the SmPC).

Whatever the HRQL claim (specific or global), changes observed in all HRQL domains should always be specified.

A specific HRQL claim (e.g. the product “improves physical functioning”), based on a subset (one or two) of the HRQL domains, is acceptable if:

• The whole HRQL instrument is adequately developed and validated before the trial;
• The subset of domains of interest is pre-specified; and
• Clinical relevance (on the subset of domains) is predefined and documented.

The EMEA paper does not explicitly require that the HRQL instrument be validated to measure the subset of domains independently from the other domains prior to the trial. However, a sponsor needs to document the change on the predefined domains of interest and to provide information on the amount of change (on the subset of domains) that is considered to be clinically, and not only statistically, relevant. In case of positive/relevant results, a specific claim reflecting the domain(s) with improvement might be mentioned in the SmPC.

The claim in the SmPC with respect to HRQL (i.e. in section 5.1) will always be considered depending on the strength of the evidence and the relevance (pertinence and importance) of the finding. The strength of the evidence should be based on the rationale for HRQL assessment in the context of the disease/medicinal product, the justification of the choice of the HRQL questionnaire(s), the objectives of HRQL assessment and the hypotheses of HRQL changes, the evidence of validation (and of cultural adaptation/translation if applicable) of the HRQL questionnaire(s), the adequacy of the statistical analysis plan, and the relevance of observed changes.

Study Design for HRQL Assessment
HRQL is considered to be an endpoint similar to any other endpoint. Thus the study design should not differ from that of any other randomised, preferably double-blind, comparative trial (placebo and/or active comparator controlled). However, the study design might differ slightly depending on the stage of development of a medicinal product.

If a medicinal product has no marketing authorisation, HRQL (using a validated and appropriate questionnaire) may be studied simultaneously with the efficacy/safety of the medicinal product in pivotal (phase III) trials. HRQL may be part of a co-primary endpoint or it may be a key secondary endpoint. Whatever the design, the study should be powered for both endpoints.

If a medicinal product has obtained a marketing authorisation (or if efficacy and safety of the test drug have already been convincingly shown), the HRQL may be assessed in an active comparator trial as placebo comparison might not be feasible any more; in addition to the HRQL endpoint, efficacy endpoints should also be incorporated to ensure that efficacy is sustained.

Study Duration

Both in relapsing and remitting symptom-driven conditions and in chronic stable conditions, long-term trials (6 months or more) are recommended. HRQL assessment during very short-term trials (less than one month) is not currently encouraged, as it assesses the improvement in daily living due to effective treatment of a given condition rather than HRQL in its multidimensionality.

Statistical Analysis Plan
The methodology for analysing HRQL data is similar to the methodology used for any efficacy trial, except that by its nature, HRQL assessment (multi-item, multi-domain, repeated over time) renders such issues as multiplicity and missing data more problematic. The following points should be specifically defined:

1. Timing and number of HRQL assessments.
2. Sample size/power of the study based on the expected difference between groups: definition of the Minimal Important Difference (MID), definition of a responder (minimal important change within a patient).

Indeed, the relevance of HRQL changes should always be justified by the sponsor. This relevance should have been defined a priori in the protocol, as it constitutes the basis for generating hypotheses. The minimal important difference (MID) may be used when powering the studies. It should be kept in mind, however, that the determination of the MID should be based on a combination of statistical reasoning and clinical judgment; neither is sufficient on its own.
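As a rough illustration of how a pre-specified MID feeds into the power calculation, the sketch below applies the standard normal-approximation formula for comparing two group means, n = 2 (z₁₋α/₂ + z₁₋β)² σ² / MID² per arm. It is not taken from the reflection paper: the function name, the 10-point MID and the standard deviation of 25 points are hypothetical values chosen only for the example, and a real protocol would justify each of them clinically.

```python
import math
from statistics import NormalDist


def patients_per_group(mid, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided comparison of mean
    HRQL scores between two equally sized groups (normal approximation,
    common standard deviation assumed)."""
    if mid <= 0 or sd <= 0:
        raise ValueError("mid and sd must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # 1 - type II error
    return math.ceil(2 * ((z_alpha + z_beta) * sd / mid) ** 2)


# Hypothetical inputs: a 10-point MID and an assumed SD of 25 points.
print(patients_per_group(mid=10, sd=25))
```

With these illustrative inputs the formula gives 99 patients per arm; halving the MID roughly quadruples the requirement, which is why the MID needs to be fixed before the trial rather than after the data are seen.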

One approach for controlling multiplicity across different endpoints is hierarchical testing. The most important (efficacy) endpoint is tested first; if it is significant, then the second endpoint (HRQL) can be tested. If the first endpoint is not significant, then no further testing is undertaken. The number of patients necessary to support the change in the primary endpoint is frequently sufficient to test for the HRQL change. In some situations the trial is even overpowered and yields HRQL score differences that are statistically significant but very small and not clinically relevant. Therefore, every effort should be made to ensure that the sample size calculated for the primary endpoint is adequate for demonstrating hypotheses made a priori on the HRQL assessment.
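The gatekeeping logic of hierarchical testing can be sketched in a few lines. This is only an illustration of the fixed-sequence idea described above, assuming each endpoint is tested at the full significance level in a pre-specified order; the function name, endpoint labels and p-values are hypothetical.

```python
def hierarchical_test(ordered_p_values, alpha=0.05):
    """Fixed-sequence testing: endpoints are tested in their pre-specified
    order, each at the full alpha, and testing stops at the first
    non-significant result.

    ordered_p_values: list of (endpoint_name, p_value) pairs.
    Returns a list of (endpoint_name, verdict) pairs.
    """
    verdicts = []
    gate_open = True  # becomes False after the first failure
    for name, p in ordered_p_values:
        if not gate_open:
            verdicts.append((name, "not tested"))
        elif p < alpha:
            verdicts.append((name, "significant"))
        else:
            verdicts.append((name, "not significant"))
            gate_open = False
    return verdicts


# Hypothetical p-values: efficacy fails, so HRQL is never formally tested.
for name, verdict in hierarchical_test([("overall survival", 0.20), ("HRQL", 0.001)]):
    print(f"{name}: {verdict}")
```

In the example, the small HRQL p-value cannot support a claim because the efficacy endpoint ahead of it in the hierarchy was not significant.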

The approach to overcome multiplicity will also depend on the number of domain scores. As stated earlier, it may be of interest to pre-specify a subset of HRQL domains which will be the basis for a specific claim. Other methods may include correction of p-values, hierarchical testing among multidomain scores (if the comparison of the domain score considered as the most important is significant, then the second domain is tested) or global test procedures. Reporting only a global score across domains, although it may reduce the number of tests, is not considered adequate, as it will reduce the information on HRQL multidimensionality and may mask or overestimate HRQL treatment differences in important domains. The method for handling multiplicity should be stated a priori in the statistical analysis plan.
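The text does not say which p-value correction should be used. As one possible illustration, the sketch below applies the Holm step-down adjustment, a common choice that controls the family-wise error rate, to a set of domain-level p-values. The domain names and p-values are invented for the example, and the choice of Holm is an assumption rather than a recommendation from the reflection paper.

```python
def holm_adjust(p_by_domain):
    """Holm step-down adjustment of per-domain p-values.

    p_by_domain: dict mapping domain name -> raw p-value.
    Returns a dict mapping domain name -> adjusted p-value.
    """
    m = len(p_by_domain)
    # Smallest raw p-value first.
    ordered = sorted(p_by_domain.items(), key=lambda item: item[1])
    adjusted = {}
    running_max = 0.0
    for rank, (domain, p) in enumerate(ordered):
        candidate = min(1.0, (m - rank) * p)
        # Adjusted p-values may not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[domain] = running_max
    return adjusted


# Hypothetical raw p-values for three pre-specified HRQL domains.
raw = {"physical": 0.01, "emotional": 0.04, "social": 0.03}
for domain, p_adj in holm_adjust(raw).items():
    print(f"{domain}: adjusted p = {p_adj:.3f}")
```

A domain would then be declared significant only if its adjusted p-value falls below the pre-specified alpha, and the results for every domain would still be reported in full.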

HRQL Assessment in Cancer: Design of Chemotherapy Trials (Taxotere) [4]
As already stated, HRQL assessment is primarily of interest in chronic diseases, both non-life-threatening and severe life-threatening diseases such as cancer. A comparison of two chemotherapy regimens, where one confers better HRQL than the other, is, at least in theory, important information for patient management. The EMEA reflection paper recommends comparative randomised trials (a head-to-head comparison or an add-on design) with the overall survival rate or progression-free survival (depending on the cancer) as the primary endpoint and HRQL as a co-primary or a secondary endpoint. However, to gain a HRQL claim, the HRQL benefit must be achieved without any reduction in treatment efficacy (e.g. through reduced doses/toxicity). Open-label studies are not recommended, although achieving double-blinding may be difficult in some chemotherapy trials. Therefore, patients at least should be masked to the treatment assignment. Particular attention should be paid to the management of missing data due to patient deterioration and death. In summary, there are several problems to take into account in the analysis of results: lack of investigators' commitment, cumbersome instruments, the duration of trial needed to capture a change in HRQL (independent of survival benefit), handling of missing data, and, most importantly, the clinical significance of apparent HRQL differences.

However, in situations where two treatment regimens show no significant survival differences, an improvement in HRQL becomes an important element of decision making. The Taxotere (docetaxel) marketing authorisation file is an excellent example. QoL was assessed for the following indications:

• Breast cancer: in combination with doxorubicin (1st line);
• Non-small cell lung cancer: in combination with platinum agents (1st line);
• Hormone-refractory prostate cancer: in combination with prednisone;
• Gastric adenocarcinoma: in combination with cisplatin and 5-FU (1st line); and
• Head and neck squamous cell carcinoma: in combination with cisplatin and 5-FU (1st line).

Studies were open-label, long-term, parallel-group, comparative randomised trials, with overall survival as the primary endpoint and HRQL as a secondary endpoint. HRQL was assessed using validated instruments: FACT-P for prostate cancer, EORTC QLQ-C30 (global QoL domain) and QLQ-HN35 (4 modules) for head and neck squamous cell carcinoma, and LCSS and EQ5D for non-small cell lung cancer (TAX 326 study). The TAX 326 study compared two drug combinations, docetaxel plus cisplatin and docetaxel plus carboplatin, to vinorelbine plus cisplatin in chemotherapy-naïve patients with unresectable or metastatic non-small cell lung cancer. Even though there was no difference in overall survival and time to progression between the docetaxel + carboplatin and vinorelbine + cisplatin treatment arms, patients on docetaxel + carboplatin had better HRQL (on both the LCSS and EQ5D scales), as well as better Karnofsky status and less weight loss than patients treated with vinorelbine + cisplatin.

The HRQL improvement-related claim for the docetaxel + carboplatin treatment arm was mentioned in section 5.1 of the Taxotere SmPC, although no details of the HRQL results could be found in the EMEA reports on docetaxel, in its EPAR or in its SmPC. HRQL was also mentioned for all other indications, even in the case of non-conclusive results:

1. “QoL measured by the EORTC questionnaire was comparable and stable during treatment and follow-up” (breast cancer)
2. “QoL results consistently indicated improvement in favour of … arm” (gastric adenocarcinoma)
3. “No statistical differences were observed between treatment groups for global QoL” (prostate cancer)
4. “Patients treated with… experienced significantly less deterioration of their Global health score assessed with EORTC QLQ-C30 scale” (head and neck squamous cell carcinoma)

Other Examples
Another example of HRQL assessment in oncology was taken from the published literature: a doxorubicin and paclitaxel regimen was compared to doxorubicin and cyclophosphamide as first-line chemotherapy in patients with metastatic breast cancer [5]. This was a short-term (3-month), open-label, randomised controlled trial with overall survival as the primary endpoint and validated HRQL questionnaires (EORTC QLQ-C30 and the disease-specific QLQ-BR23 breast module) as secondary endpoints. Five scales from these two questionnaires were pre-selected for primary analysis and the expected minimal important difference (MID) was pre-defined. The sample size was based on the primary endpoint, but the study was also powered to detect a 10-point shift on the overall HRQL scale. All HRQL results were displayed, both for pre-specified and non-pre-specified HRQL domains.

If we compare the conduct of this trial to the recommendations given by regulatory authorities, several remarks may be made: the full disclosure of results is valuable, as is the powering of the study on both overall survival and a 10-point improvement on the HRQL scale. The open-label character of the trial is understandable; however, it was not clear whether patients were blinded to treatment assignment. In addition, it was not clear whether the five predefined scales had previously been validated for separate use. Finally, the short trial duration (3 months) was questioned, as the observed HRQL changes may have reflected the efficacy/safety of the treatments more than their long-term impact on HRQL.

The last example comes from a recent marketing authorisation file for an epoetin similar to an already approved reference product (biosimilar epoetin). These biological medicinal products, administered to improve anaemia in patients with renal insufficiency and cancer, are also believed to improve HRQL. The file was analysed to see if and how HRQL was assessed. The pivotal trial was a randomised, double-blind, 6-month equivalence trial in haemodialysis patients comparing the new epoetin to a reference product for the maintenance treatment of anaemia. The primary endpoint was haemoglobin-based; the secondary endpoints were “QoL assessment” of “energy level”, “ability to work” and “overall QoL”. However, all these items were assessed by a simple question, “Rate your energy level/ability to work/overall QoL during the past week”, using linear analogue self-assessment scales. There was no real HRQL assessment in any of the epoetin files or for any of the granted indications.

With the recent release of the EMEA reflection paper on HRQL and of the FDA guidance on patient-reported outcomes (PRO), these patient-based measures have gained acknowledgment of their value in the drug evaluation process. The two papers issued by regulatory authorities set out the principles for the assessment of PRO and HRQL in clinical trials and the requirements for gaining a specific claim based on these measures. However, it might still be too early to know whether these recommendations have been put into practice by sponsors, on the one hand, and whether they have already had an impact on the assessment of medicinal products and PRO/HRQL-related claims by regulators, on the other. The examples given show a rather disparate picture. The review of new trials and marketing authorisation requests will test whether the EMEA reflection paper on HRQL is a useful tool for sponsors seeking HRQL claims in the drug development process.

1. Szende A, Leidy NK, Revicki D. Value Health 2005;8:534-548.
2. FDA Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labelling Claims. 2006
3. CHMP reflection paper on the regulatory guidance for the use of health-related quality of life (HRQL) measures in the evaluation of medicinal products. EMEA/CHMP/EWP/139391/2004
4. EPAR Taxotere.
5. Bottomley A, Biganzoli L, Cufer T et al. J Clin Oncol 2004;22:2576-2586.

