Mira Pavlovic MD, Scientific Advice Unit, Agence Française de Securité Sanitaire des Produits de Santé, Direction
des Médicaments et des Produits Biologiques, Saint-Denis Cedex, France
The following was presented during the Second Plenary Session, “Patient Reported Outcomes: A European Perspective,” at the ISPOR 10th Annual European Congress, 22 October 2007, Dublin, Ireland.
Introduction
Several working parties act on behalf of the Committee for Medicinal Products for Human Use (CHMP) at the European Medicines Agency (EMEA). The role of the Efficacy Working Party (EWP) is to draw up guidelines for the clinical development of medicinal products in different fields of medicine.
In some of the EMEA/EWP guidelines [1], patient-reported
outcomes (PRO) and health-related quality
of life (HRQL) are mentioned as part
of the drug development programme, either as efficacy
variables (primary, secondary, supportive),
safety variables, or measures useful for benefit-risk
assessment. However, these guidelines neither define HRQL
or PRO in general nor recommend how to assess
HRQL/PRO claims in marketing authorisation
applications. The impression of both regulators
and pharmaceutical companies has been that
HRQL/PRO claims have been granted rarely and
on a case-by-case basis.
For all these reasons, the EMEA EWP decided
to draft a specific reflection paper on HRQL/PRO
assessment in registration trials. The aim was to
define the place of PRO, in particular HRQL, in the
drug evaluation process and to give recommendations
for their use and assessment. In addition,
definitions were provided for PRO and HRQL, terms
frequently used interchangeably by both sponsors
and regulators.
A patient-reported outcome (PRO) is any outcome
evaluated directly by the patient
and based on the patient's perception of a disease
and its treatment(s). The term PRO is proposed
as an umbrella term to cover simple patient-assessed
measures (such as pain or itching),
multi-item, single-concept measures (such as the
Health Assessment Questionnaire, HAQ), multi-item,
multi-concept measures (such as the
Western Ontario and McMaster Universities
Osteoarthritis Index (WOMAC) or Activities of
Daily Living (ADL)), as well as broad multidimensional
measures (such as HRQL). In addition,
PRO may also cover health status, adherence and
satisfaction with treatment.
For the purpose of drug development, the PRO
scope was considered to be too broad. On the one
hand, simple patient-assessed measures capture the
core symptoms of a disease as assessed by the patient;
they are well-established primary and secondary
efficacy endpoints in registration trials and
do not need any specific regulatory requirements.
On the other hand, HRQL, a specific subset of
PROs, is a broad concept which can be defined
as the patient's subjective perception of the
impact of the disease and its treatment(s) on
daily life, physical, psychological and social functioning
and well-being. The notion of multidimensionality
is a key component of the definition of
HRQL. A single domain, e.g. physical functioning
or fatigue, is not considered to be HRQL (i.e. it
cannot be the basis for a claim of global HRQL
improvement), even though it is patient-reported.
In addition, HRQL should be clearly differentiated
from the core symptoms of a disease which
are, as stated, well-accepted primary and secondary
efficacy endpoints in registration trials.
Therefore, the scope of the reflection paper was
narrowed to HRQL as a specific
type/subset of PRO: to define the place of
HRQL in the context of drug approval and to give
some broad recommendations on its use in the
context of existing guidance documents.
No specific recommendation was given for the
development and validation of HRQL measures in
clinical trials, as this is available elsewhere [2].
The reflection paper was adopted by the CHMP in
July 2005 and came into effect in January 2006
[3]. It stressed the notion of multidimensionality
as a key component of the definition of HRQL.
Basic Recommendations
The basic recommendations are:
• Efficacy and safety of a medicinal product in
the given condition are the basis of approval, and
• An HRQL claim always goes beyond efficacy and
safety assessments, is optional, and should be
supported by data collected by instruments validated
for use in the corresponding condition.
Both generic and disease specific questionnaires
may be used for a given condition. In practice, it
is very important to choose the questionnaire
which contains/is adapted to explore the domains
relevant for the disease and its treatment(s).
HRQL claims may be global or more specific. In
order to approve a global claim that a product
“improves HRQL”, it would be necessary to
demonstrate robust improvements in all or most
of these domains. Indeed, “HRQL improvement”
as a claim implies that the most important and
clinically relevant health-related domains of functioning
that impact patient's quality of life are
known and measured. In all cases a full disclosure
of complete results should be provided (section
5.1 of the SmPC).
Whatever the HRQL claim (specific or global),
changes observed in all HRQL domains should
always be specified.
A specific HRQL claim (e.g. the product “improves physical
functioning”), based on a subset (one or two)
of HRQL domains, is acceptable if:
• The whole HRQL instrument is adequately developed
and validated before the trial;
• The subset of domains of interest is pre-specified;
and
• Clinical relevance (on the subset of domains) is
predefined and documented.
The EMEA paper does not explicitly require that the HRQL
instrument be validated to measure the subset of
domains independently from the other domains
prior to the trial. However, a sponsor needs to
document the change in the predefined domains
of interest and to provide information on the
amount of change (in the subset of domains)
that is considered to be clinically, and not only
statistically, relevant. In case of positive/relevant
results, a specific claim reflecting the domain(s) showing
improvement might be mentioned in the SmPC.
A claim in the SmPC with respect to HRQL
(i.e. in section 5.1) will always be considered
in light of the strength of the evidence and the
relevance (pertinence and importance) of the
finding. The strength of the evidence should be
based on the rationale for HRQL assessment in the context of the disease/medicinal product, the
justification of the choice of the HRQL questionnaire(s),
the objectives of HRQL assessment and
the hypotheses of HRQL changes, the evidence
of validation (and of cultural adaptation/translation
if applicable) of the HRQL questionnaire(s),
the adequacy of the statistical analysis plan, and
the relevance of observed changes.
Study Design for HRQL Assessment
HRQL is considered to be an endpoint like
any other. Thus the study design
should not differ from that of any other randomised,
preferably double-blind, comparative
trial (placebo and/or active comparator controlled).
However, the study design might differ slightly
depending on the stage of development of the
medicinal product.
If a medicinal product has no marketing authorisation,
HRQL (using a validated and appropriate
questionnaire) may be studied simultaneously
with the efficacy/safety of the medicinal product
in pivotal (phase III) trials. HRQL may be
part of a co-primary endpoint or it may be a
key secondary endpoint. Whatever the design,
the study should be powered for both endpoints.
If a medicinal product has obtained a marketing
authorisation (or if efficacy and safety of the test
drug have already been convincingly shown),
HRQL may be assessed in an active comparator
trial, as placebo comparison might no longer be feasible;
in addition to the HRQL endpoint, efficacy
endpoints should also be incorporated to
ensure that efficacy is sustained.
Study Duration
Both in relapsing and remitting symptom-driven
conditions and in chronic stable conditions, long-term
trials (6 months or more) are recommended.
HRQL assessment during very short-term trials
(less than one month) is not currently
encouraged, as it reflects the improvement
in daily living due to effective treatment
of a given condition rather than HRQL in
its multidimensionality.
Statistical Analysis Plan
The methodology for analysing HRQL data is
similar to the methodology used for any efficacy
trial, except that, by its nature, HRQL assessment
(multi-item, multi-domain, repeated over time)
renders such issues as multiplicity and missing
data more problematic. The following points
should be specifically defined:
1. Timing and number of HRQL assessments.
2. Sample size/power of the study based on the
expected difference between groups: definition
of the Minimal Important Difference (MID) and definition
of a responder (minimal important change
within a patient).
Indeed, the relevance of HRQL changes should
always be justified by the sponsor. This relevance
should be defined a priori in the protocol,
as it constitutes the basis for generating hypotheses.
The minimal important difference (MID) may
be used when powering the studies. It should be
kept in mind, however, that the determination of the
MID should rest on a combination of statistical
reasoning and clinical judgment; neither
is sufficient on its own.
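As an illustration of how an MID can feed into a power calculation, the sketch below uses the standard normal-approximation formula for comparing mean scores between two equal-sized arms. It is not taken from the reflection paper: the function names, the standard deviation of 25 points, the 5% two-sided significance level and the 80% power are all assumptions chosen for the example. The responder rule is likewise a hypothetical reading of "minimal important change within a patient", assuming that higher scores mean better HRQL.

```python
import math
from statistics import NormalDist


def n_per_group(mid, sd, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a between-group difference of `mid`.

    Normal approximation for a two-sided, two-sample comparison of means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / mid) ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / mid) ** 2)


def is_responder(baseline, follow_up, minimal_important_change):
    """Hypothetical responder rule: within-patient improvement of at least
    the pre-specified minimal important change (higher score = better)."""
    return (follow_up - baseline) >= minimal_important_change


# Assumed inputs: MID of 10 points, SD of 25 points on a 0-100 scale.
print(n_per_group(mid=10, sd=25))   # 99 patients per arm
print(n_per_group(mid=5, sd=25))    # 393 patients per arm
print(is_responder(baseline=40, follow_up=52, minimal_important_change=10))  # True
```

Halving the MID roughly quadruples the required sample size, which is one reason the MID needs clinical as well as statistical justification before it is written into the protocol.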
One approach for controlling multiplicity across
different endpoints is hierarchical testing. The
most important (efficacy) endpoint is tested first;
if it is significant, then the second endpoint
(HRQL) can be tested. If the first endpoint is not
significant, then no further testing is undertaken.
The number of patients necessary to support the
change in the primary endpoint is frequently sufficient
to test for the HRQL change. In some situations
the trial is even overpowered and yields
statistically significant but very small and clinically irrelevant differences
in HRQL scores. Therefore, every effort
should be made to ensure that the sample size
calculated for the primary endpoint is adequate
for testing the hypotheses made a priori on
the HRQL assessment.
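The hierarchical testing of endpoints described above can be sketched as a fixed-sequence procedure: each endpoint is tested at the full significance level, in the pre-specified order, and testing stops at the first non-significant result. The function name, the endpoint labels and the p-values below are hypothetical and serve only to show the logic.

```python
def fixed_sequence_test(ordered_endpoints, alpha=0.05):
    """Test endpoints in their pre-specified order at level `alpha`.

    `ordered_endpoints` is a list of (name, p_value) pairs, most important
    first. Once one endpoint fails, no later endpoint can be declared
    significant, whatever its p-value.
    """
    results = []
    still_testing = True
    for name, p_value in ordered_endpoints:
        significant = still_testing and p_value <= alpha
        results.append((name, significant))
        if not significant:
            still_testing = False
    return results


# Hypothetical p-values: efficacy first, HRQL second.
print(fixed_sequence_test([("efficacy", 0.01), ("HRQL", 0.03)]))
# [('efficacy', True), ('HRQL', True)]
print(fixed_sequence_test([("efficacy", 0.20), ("HRQL", 0.001)]))
# [('efficacy', False), ('HRQL', False)]
```

In the second call the HRQL result cannot support a claim, however small its p-value, because the efficacy endpoint tested before it was not significant.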
The approach to multiplicity will also
depend on the number of domain scores. As
stated earlier, it may be of interest to pre-specify
a subset of HRQL domains which will be the
basis for a specific claim. Other methods may
include correction of p-values, hierarchical testing
among multidomain scores (if the comparison
of the domain score considered as the most
important is significant, then the second domain
is tested) or global test procedures. Reporting
only a global score across domains, although it
may reduce the number of tests, is not considered
adequate, as it reduces the information on
HRQL multidimensionality and may mask or
overestimate HRQL treatment differences in
important domains. The method for handling
multiplicity should be stated a priori in the statistical
analysis plan.
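One possible form of "correction of p-values" across domain scores is the Holm step-down procedure, sketched below. The reflection paper does not name a specific correction method, so the choice of Holm is an assumption made for illustration, and the domain names and p-values are invented.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction across HRQL domains.

    `p_values` maps domain name -> unadjusted p-value. The smallest p-value
    is compared with alpha/m, the next with alpha/(m-1), and so on; the
    procedure stops at the first comparison that fails.
    Returns a dict mapping domain name -> True if significant.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    decisions = {}
    stopped = False
    for rank, (domain, p_value) in enumerate(ordered):
        if not stopped and p_value <= alpha / (m - rank):
            decisions[domain] = True
        else:
            stopped = True
            decisions[domain] = False
    return decisions


# Hypothetical unadjusted p-values for five domains.
domains = {
    "physical": 0.004,
    "emotional": 0.030,
    "social": 0.020,
    "role": 0.200,
    "global": 0.012,
}
print(holm_reject(domains))
# {'physical': True, 'global': True, 'social': False, 'emotional': False, 'role': False}
```

In this example three domains have unadjusted p-values below 0.05 but do not survive correction, which illustrates why the method has to be fixed in the statistical analysis plan before the data are seen.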
HRQL Assessment in Cancer: Design of Chemotherapy Trials (Taxotere) [4]
As already stated, HRQL assessment is primarily
of interest in chronic diseases, both
non-life-threatening ones and severe life-threatening
diseases such as cancer. A comparison of two
chemotherapy regimens, where one confers better
HRQL than the other, is, at least in theory,
important information for patient management.
The EMEA reflection paper recommends comparative
randomised trials (a head-to-head comparison
or an add-on design) with the overall survival
rate or progression-free survival (depending
on the cancer) as the primary endpoint and HRQL
as a co-primary or a secondary endpoint.
However, to gain an HRQL claim, the HRQL
benefit must be achieved without any reduction in
treatment efficacy (e.g. through reduced
doses/toxicity). Open-label studies are not recommended,
although achieving double-blinding may
be difficult in some chemotherapy trials.
Therefore, at the very least, patients should be masked to
the treatment assignment. Particular attention
should be paid to the management of missing
data due to patient deterioration and death. In
summary, there are several problems to take into
account in the analysis of results: lack of investigator
commitment, cumbersome instruments,
the duration of trial needed to capture a change in HRQL
(independent of survival benefit), handling of
missing data, and, most importantly, the clinical significance
of apparent HRQL differences.
However, in situations where two treatment
regimens show no significant survival differences,
an improvement in HRQL becomes an important
element of decision making. The Taxotere (docetaxel)
marketing authorisation file is an excellent example.
QoL assessment was carried out for the following
indications:
• Breast cancer: in combination with doxorubicin
(1st line);
• Non-small cell lung cancer: in combination
with platinum agents (1st line);
• Hormone-refractory prostate cancer: in combination
with prednisone;
• Gastric adenocarcinoma: in combination with
cisplatin and 5-FU (1st line); and
• Head and neck squamous cell carcinoma: in
combination with cisplatin and 5-FU (1st line).
Studies were open-label, long-term, parallel-group,
comparative randomised trials, with overall
survival as the primary endpoint and HRQL as a
secondary endpoint. HRQL was assessed
using validated instruments: FACT-P for prostate
cancer, EORTC QLQ-C30 (global QoL domain)
and QLQ-HN35 (4 modules) for head and neck
squamous cell carcinoma, and LCSS and EQ-5D
for non-small cell lung cancer (TAX 326 study).
The TAX 326 study compared two drug combinations,
docetaxel plus cisplatin and docetaxel plus carboplatin, to
vinorelbine plus cisplatin in chemotherapy-naïve
patients with unresectable or metastatic non-small
cell lung cancer. Although there was no difference
in overall survival and time to progression
between the docetaxel + carboplatin and vinorelbine + cisplatin treatment arms, patients on docetaxel + carboplatin had better
HRQL (on both the LCSS and EQ-5D scales), as well as better Karnofsky status
and less weight loss, than patients treated with vinorelbine + cisplatin.
The HRQL improvement-related claim for the docetaxel + carboplatin treatment
arm was mentioned in section 5.1 of the Taxotere SmPC, even though no detail of
HRQL results could be found in the EMEA reports on docetaxel, in its
EPAR or in its SmPC. HRQL was also mentioned for all
other indications, even in case of non-conclusive results:
1. “QoL measured by the EORT questionnaire was comparable and stable
during treatment and follow-up” (breast cancer)
2. “QoL results consistently indicated improvement in favour of … arm”
(gastric adenocarcinoma)
3. “No statistical differences were observed between treatment groups for
global QoL” (prostate cancer)
4. “Patients treated with… experienced significantly less deterioration of their
Global health score assessed with EORTC QLQ-C30 scale” (head and neck
squamous cell carcinoma)
Other Examples
Another example of HRQL assessment in oncology was taken from the
published literature: a doxorubicin and paclitaxel regimen was compared to
doxorubicin and cyclophosphamide as first-line chemotherapy in patients
with metastatic breast cancer [5]. This was a short-term (3-month), open-label,
randomised controlled trial with overall survival as the primary endpoint
and validated HRQL questionnaires (EORTC QLQ-C30 and the disease-specific QLQ-BR23
breast module) as secondary endpoints. Five scales from
these two questionnaires were pre-selected for primary analysis and the
expected minimal important difference (MID) was pre-defined. The sample
size was based on the primary endpoint, but the study was also powered to
detect a 10-point shift on the overall HRQL scale. All HRQL results were displayed,
both for pre-specified and non-pre-specified HRQL domains.
Comparing the conduct of this trial with the recommendations given by regulatory
authorities, several remarks may be made. The
full disclosure of results is valuable, as is the powering of the study
for both overall survival and a 10-point improvement on the HRQL scale. The
open-label character of the trial is understandable; however, it was not clear whether
patients were blinded to treatment assignment. In addition, it was not
clear whether the five predefined scales had previously been validated for separate use.
Finally, the short trial duration (3 months) was questioned, as the
observed HRQL changes may have reflected the efficacy/safety of the treatments
more than their long-term impact on HRQL.
The last example comes from a recent marketing authorisation file of an
epoetin similar to an already approved reference product (biosimilar epoetin).
These biological medicinal products, administered to treat anaemia in
patients with renal insufficiency and cancer, are also believed to improve
HRQL. The file was analysed to see if and how HRQL was assessed. The
pivotal trial was a randomised, double-blind, 6-month equivalence trial in
haemodialysis patients comparing the new epoetin to a reference product for
the maintenance treatment of anaemia. The primary endpoint was haemoglobin-based;
secondary endpoints were “QoL assessment” of “energy level”,
“ability to work” and “overall QoL”. However, each of these items was assessed
by a single question, “Rate your energy level/ability to work/overall QoL during
the past week”, using linear analogue self-assessment scales. There
was no real HRQL assessment in any of the epoetin files or in any of the granted
indications.
Conclusion
With the recent release of the EMEA reflection paper on HRQL and of the FDA
guidance on patient-reported outcomes (PRO), these patient-based measures
have gained acknowledgment of their value in the drug evaluation
process. The two papers issued by regulatory authorities set out the principles of
the assessment of PRO and HRQL in clinical trials and the requirements for
gaining a specific claim based on these measures. However, it might still be
too early to know whether these recommendations have been put into practice
by sponsors and whether they have already had an impact
on the assessment of medicinal products and PRO/HRQL-related claims by
regulators. The examples given show a rather disparate picture. The review of
new trials and marketing authorisation requests will test whether the EMEA
reflection paper on HRQL is a useful tool for sponsors seeking HRQL
claims in the drug development process.
References
1. Szende A, Leidy NK, Revicki D. Value Health 2005;8:534-548.
2. FDA Guidance for Industry. Patient-Reported Outcome Measures: Use in Medical Product
Development to Support Labelling Claims. 2006
3. CHMP reflection paper on the regulatory guidance for the use of health-related quality of life
(HRQL) measures in the evaluation of medicinal products. EMEA/CHMP/EWP/139391/2004
4. EPAR Taxotere.
5. Bottomley A, Biganzoli L, Cufer T et al. J Clin Oncol 2004;22:2576-2586.