Fit-for-Purpose Real-World Data: An Integral Component of Evidence Planning

 

 
Dana Stafkey, PharmD, PhD; Tasneem Lokhandwala, PhD, Cencora, Conshohocken, PA, USA; Sumeet Bakshi, MBBS, MBA, Cencora, London, United Kingdom

 

 

Introduction

Integrated evidence planning in pharmaceutical/biotechnology companies is a holistic approach, developed and implemented to ensure that evidence generation is aligned with regulatory, clinical, and commercial objectives throughout the product lifecycle.1 In recent years, real-world evidence (RWE) has emerged as a crucial element in integrated evidence planning. Regulatory bodies are increasingly accepting findings from RWE studies to support the safety, efficacy, and value of pharmaceutical/biotech products beyond traditional clinical trials.2-4 In addition, evidence required by downstream stakeholders such as payers, policy makers, healthcare providers, and patients is being planned, and sometimes even generated, at early development stages.

At the same time, the evolution of the healthcare and technology landscape has expanded the range of diverse and novel real-world data (RWD) sources, from claims (open and closed), electronic health records (EHRs) (structured and unstructured), and registries to linked data from wearable devices, mobile health apps, genomic data, social determinants of health, and patient-reported outcomes collected via digital platforms. One example of this evolution in EHRs is the increased digitization of patient records, together with global adoption of the Fast Healthcare Interoperability Resources (FHIR) standards, which allow healthcare data to be shared across health systems to enable automated decision support and other machine-based processing.2 While emerging data sources offer richer, more granular insights into patient health and treatment effects, they also make it significantly more complex to identify fit-for-purpose RWD that are reliable, relevant, and robust enough to answer specific research or regulatory questions. This paper evaluates key RWD sources and explores critical factors involved in identifying the appropriate RWD source to generate RWE.

 

 

 

Sources of RWD

Health insurance claims

Insurance claims data have long been a cornerstone of RWE generation in the pharmaceutical and biotech industry. There are 2 types of claims data: closed and open.

Closed Claims

Closed claims are sourced directly from payers or health plans, whereas open claims are sourced from practice management systems, clearinghouses, or other information systems. Closed claims capture all healthcare interactions for a patient that are reimbursed by a specific payer. Claims derived from a closed source are fully adjudicated; they contain few or no duplicate claims, and costs represent the final approved amount. However, the adjudication process results in a lag of approximately 3 to 6 months from submission to data acquisition. These data allow patients to be followed longitudinally for as long as they remain eligible for insurance. If a patient changes insurers, that patient is lost to follow-up, which is common in the US healthcare system, where insurance is often provided by employers. The average duration of follow-up for patients in the United States is approximately 3 years.

Closed claims are beneficial when the research objective requires a patient to be evaluated over time and across healthcare settings, including for outcomes such as treatment patterns, adherence, persistence, healthcare utilization, and costs. However, longitudinal follow-up may be limited because patients are often enrolled with an insurer for short periods, and studies that require tracking for more than 3 years (including identification, index/baseline, and follow-up periods) will face shrinking sample sizes as the study period lengthens. Data lags in closed claims also mean that the most recent events are not reflected in the data, so data extraction for a study must be timed accordingly. The data may also be biased toward the enrollment characteristics of the included health plans; therefore, an assessment of representativeness and generalizability for a specific use case may be required.
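The trade-off between required follow-up and sample size can be illustrated with a minimal sketch. It assumes one continuous enrollment span per patient, a 180-day baseline period, and entirely hypothetical patient IDs and dates; real closed claims data typically contain multiple spans with gaps, and the rule for allowable gaps would need to be defined in the study protocol.

```python
from datetime import date, timedelta

# Hypothetical enrollment spans: patient_id -> (coverage_start, coverage_end).
# Assumption: a single continuous span per patient, which real data rarely guarantee.
ENROLLMENT = {
    "p1": (date(2018, 1, 1), date(2023, 12, 31)),
    "p2": (date(2019, 6, 1), date(2021, 5, 31)),
    "p3": (date(2020, 1, 1), date(2020, 12, 31)),
    "p4": (date(2017, 3, 1), date(2020, 12, 31)),
}

# Hypothetical index dates (eg, first claim for the treatment of interest).
INDEX_DATES = {
    "p1": date(2019, 1, 1),
    "p2": date(2020, 1, 1),
    "p3": date(2020, 8, 1),
    "p4": date(2018, 3, 1),
}


def continuously_enrolled(enrollment, index_dates, baseline_days, followup_days):
    """Return IDs of patients whose coverage spans the full baseline and follow-up window."""
    eligible = []
    for patient_id, index_date in index_dates.items():
        coverage_start, coverage_end = enrollment[patient_id]
        window_start = index_date - timedelta(days=baseline_days)
        window_end = index_date + timedelta(days=followup_days)
        if coverage_start <= window_start and window_end <= coverage_end:
            eligible.append(patient_id)
    return eligible


# Number of patients retained as the required follow-up grows from 1 to 3 years.
ATTRITION = {
    years: len(continuously_enrolled(ENROLLMENT, INDEX_DATES, 180, 365 * years))
    for years in (1, 2, 3)
}

for years, retained in ATTRITION.items():
    print(f"{years}-year follow-up: {retained} of {len(INDEX_DATES)} patients retained")
```

In this toy cohort, each additional year of required follow-up removes another patient, which is the pattern a feasibility assessment would aim to quantify before a closed claims source is selected.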

Open Claims

Open claims capture only those claims that are processed through the specified practice management system, which may be limited to a subset of providers or practices. Open claims are nonadjudicated and are pending processing and review or payment. These claims may contain duplicates, and final payments may be missing or misrepresented. However, open claims are available 1 to 2 days after submission. 

Due to very short time lags, open claims are valuable for evaluating early access market performance and when coverage across multiple payers is warranted. Open claims databases are also significantly larger and more nationally representative than closed claims databases. However, caution must be used because open claims are nonadjudicated, and advanced analytical expertise may be required to obtain valid insights.
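One of the analytical steps this may involve is collapsing duplicate submissions. The sketch below is illustrative only: the field names, the sample records, and the rule that two claims are duplicates when they share patient, service date, provider, and procedure code are all assumptions, and actual deduplication rules depend on the data vendor and the study protocol.

```python
from collections import namedtuple

# Hypothetical open-claim layout; real vendor layouts differ.
Claim = namedtuple(
    "Claim",
    "claim_id patient_id service_date provider_id procedure_code submitted_on",
)


def deduplicate(claims):
    """Keep the most recently submitted claim for each assumed duplicate key."""
    latest = {}
    for claim in claims:
        # Assumed duplicate key: same patient, service date, provider, and procedure.
        key = (claim.patient_id, claim.service_date, claim.provider_id, claim.procedure_code)
        # ISO 8601 date strings sort chronologically, so string comparison is sufficient here.
        if key not in latest or claim.submitted_on > latest[key].submitted_on:
            latest[key] = claim
    return sorted(latest.values(), key=lambda claim: claim.claim_id)


# Illustrative records: c2 is a resubmission of c1; c3 is a separate visit.
CLAIMS = [
    Claim("c1", "p1", "2024-03-01", "prov9", "99213", "2024-03-02"),
    Claim("c2", "p1", "2024-03-01", "prov9", "99213", "2024-03-05"),
    Claim("c3", "p1", "2024-03-10", "prov9", "99213", "2024-03-11"),
]

UNIQUE_CLAIMS = deduplicate(CLAIMS)
print([claim.claim_id for claim in UNIQUE_CLAIMS])
```

Keeping the latest submission is only one possible convention. Because open claims are nonadjudicated, the retained record still may not reflect the final payment.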

It is important to note that neither open nor closed claims contain detailed clinical information, including but not limited to lab results and reasons for treatment discontinuation. In addition, use of over-the-counter medications and of other healthcare services that are provided but not processed through insurance will not be captured. Thus, claims data may not be appropriate for addressing clinical research questions.

Electronic Health Records

Clinical research questions may best be addressed using EHRs. EHRs contain detailed clinical information recorded by healthcare providers, such as diagnoses, medications, lab results, vital signs, and clinical notes. Data in an EHR may be structured or unstructured.

Structured Data

Structured data are information that is predefined, organized, and stored in fixed fields, including diagnosis codes, procedure codes, medication lists, laboratory values, vital signs, and demographics.

Structured data from EHRs are readily available and valuable for studies where healthcare provider-reported clinical variables are required. Patients can be followed longitudinally within the practice, and outcomes such as disease progression can be assessed. However, data may be fragmented when patients receive care from providers and practices that are not included in the EHR. Harmonizing data across different EHR vendors may be challenging, and quality and completeness may vary across vendors. It is important to evaluate which variables are available, and how complete each one is, before selecting the appropriate EHR to meet the research objectives.
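A variable-level completeness check of this kind can be sketched as follows. The records, variable names, and the 70% threshold are hypothetical and chosen only for illustration; an acceptable level of completeness depends on the research question.

```python
def completeness(records, variables):
    """Return the share of records with a non-missing value for each variable."""
    total = len(records)
    if total == 0:
        return {variable: 0.0 for variable in variables}
    return {
        variable: sum(1 for record in records if record.get(variable) not in (None, "")) / total
        for variable in variables
    }


# Hypothetical structured EHR extract; None, "", or an absent key counts as missing.
RECORDS = [
    {"patient_id": "p1", "hba1c": 7.2, "bmi": 31.0, "smoking_status": "never"},
    {"patient_id": "p2", "hba1c": None, "bmi": 27.5, "smoking_status": ""},
    {"patient_id": "p3", "hba1c": 6.8, "bmi": None, "smoking_status": None},
    {"patient_id": "p4", "hba1c": 8.1, "bmi": 29.3},
]

REQUIRED_VARIABLES = ["hba1c", "bmi", "smoking_status"]
MIN_COMPLETENESS = 0.70  # Assumed threshold, not a recommended standard.

SCORES = completeness(RECORDS, REQUIRED_VARIABLES)
USABLE = [variable for variable, share in SCORES.items() if share >= MIN_COMPLETENESS]

for variable, share in SCORES.items():
    print(f"{variable}: {share:.0%} complete")
```

Running the same check against each candidate EHR source gives a like-for-like comparison of whether the variables needed for the study are sufficiently populated.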

Unstructured Data

Unstructured data (eg, physician notes, progress reports, and radiology reports) are free-text data that are not stored in predefined fields and that require natural language processing or manual review to extract insights.

Unstructured EHR data are valuable when additional clinical data are needed to supplement the data available in the structured EHR.

The specificity and sensitivity of methods used to extract clinical variables from unstructured data may vary. Manual extraction by a single data abstractor allows for consistency and continuity but is very time-consuming. The emergence of front-end analytical platforms allows for more timely abstraction and greater analytical flexibility; however, data quality may be compromised in favor of simplicity and efficiency.
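As a brief illustration of how these two measures are calculated, the sketch below compares an automated extraction against manual chart review, which is treated here as the reference standard. The labels are hypothetical and the choice of reference standard is an assumption.

```python
def sensitivity_specificity(extracted, reference):
    """Compare binary extraction results with a reference standard, record by record."""
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must have the same length")
    pairs = list(zip(extracted, reference))
    true_pos = sum(1 for e, r in pairs if e and r)
    false_neg = sum(1 for e, r in pairs if not e and r)
    true_neg = sum(1 for e, r in pairs if not e and not r)
    false_pos = sum(1 for e, r in pairs if e and not r)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("reference must contain both positive and negative cases")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical results for 8 clinical notes: was the condition documented?
REFERENCE = [True, True, True, False, False, False, False, True]   # manual chart review
EXTRACTED = [True, True, False, False, False, True, False, True]   # automated extraction

SENSITIVITY, SPECIFICITY = sensitivity_specificity(EXTRACTED, REFERENCE)
print(f"sensitivity={SENSITIVITY:.2f}, specificity={SPECIFICITY:.2f}")
```

In practice, such a validation would be run on a sample of notes large enough to give stable estimates for each extracted variable.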

 

Patient registries

Registries are organized systems that collect data on patients with a specific disease, condition, or treatment exposure over time. These may be industry-sponsored, academic, or provider-led. Registries often provide the richest clinical data for a specific population of interest. Data may include detailed clinical assessment, biomarkers, labs, patient-reported outcomes, and long-term follow-up.

These data are best used when evaluating a complex or rare disease that requires more detailed clinical information than is provided within an EHR, when the population in an EHR is too small to evaluate, or when the follow-up period in an EHR is too short to evaluate the outcomes of interest. Recruitment techniques, coverage, and minimal required datasets may determine the representativeness, generalizability, and completeness of a registry. Minimal required datasets, where applicable, usually mandate diagnosis and key demographic information and may be sufficient for incidence/prevalence analyses, but outcomes and follow-up data may suffer from large lags and/or missingness. In many cases, registries are managed by academia, and data access may require specific governance criteria or collaborative academic arrangements.

 

Other data sources

In addition to claims, EHRs, and registries, there is a growing list of data sources available for research that could be used independently or merged with other available data sources to provide a more holistic view of patients and/or the healthcare system. These include but are not limited to genomic/biomarker data, patient-generated data (including wearables, mobile apps, surveys, and patient-reported outcome measures), and social determinants of health.

Genomic/biomarker data are often obtained through genomic and genetic testing and allow researchers to evaluate genetic markers that may influence disease progression and/or treatment effectiveness. Wearables and mobile apps generate large-scale, continuous patient data across various measures. The measures vary significantly by app, and the frequency of measurement varies across users. Surveys and patient-reported outcomes are data collected directly from patients on their health status, preferences, and satisfaction. These measures may come from both validated and unvalidated instruments and are subject to a patient’s recall ability. Finally, social determinants of health data (eg, race/ethnicity, income, access to transport) are increasingly being used to understand disparities in access to care and downstream outcomes. Researchers should be aware of, and address, the limitations of using such data at the aggregate level (eg, county level) rather than the patient level.

 

Table. Data Characteristics and Use Cases

 

Determining if the RWD are fit-for-purpose

Not all RWD are created equal. Selecting the right data source is a critical component of integrated evidence planning. To generate credible, actionable, and regulatory-grade RWE, it is important to ensure that the RWD are fit-for-purpose. Understanding the differences between data types will help researchers determine whether the data are appropriate to address their research question (see Table). Further guidance is available from multiple regulatory and scientific organizations that have released frameworks and guidance documents to help stakeholders assess the fitness of RWD.3-6 In many cases, a more detailed and targeted fit-for-purpose assessment may have to be conducted by independent experts to match a research question with the best available data source.

 

References

  1. Lee WC, Blanchette C, Pokras S, et al. The evolution and future of integrated evidence planning. Expert Rev Pharmacoecon Outcomes Res. 2025;25(6):855-862. doi:10.1080/14737167.2025.2497876
  2. Vorisek C, Lehne M, Klopfenstein S, et al. Fast healthcare interoperability resources (FHIR) for interoperability in health research: systematic review. JMIR Med Inform. 2022;10(7). doi:10.2196/35724
  3. Berger ML, Sox H, Willke RJ, et al. Good practices for real-world data studies of treatment and/or comparative effectiveness: recommendations from the joint ISPOR-ISPE Special Task Force on Real-World Evidence in Health Care Decision Making. Value Health. 2017;20(8):1003-1008. doi:10.1016/j.jval.2017.08.3019
  4. Gatto NM, Reynolds RF, Campbell UB. A structured preapproval and postapproval comparative study design framework to generate valid and transparent real-world evidence for regulatory decisions. Clin Pharmacol Ther. 2019;106(1):103-115. doi:10.1002/cpt.1480
  5. Wang SV, Pinheiro S, Hua W, et al. STaRT-RWE: structured template for planning and reporting on the implementation of real world evidence studies. BMJ. 2021;372:m4856. doi:10.1136/bmj.m4856
  6. European Medicines Agency (EMA). EMA Data Quality Framework for Real-World Data. https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/data-quality-framework-eu-medicines-regulation_en.pdf. Published October 2023. Accessed May 29, 2025.

 

 
