HEOR Articles

Unlocking the Potential of Electronic Health Records in Neuroscience for Health Economics and Outcomes Research

Luke Bryden, DPhil; Ana Todorovic, PhD; Panagiota Kontari, PhD; David Newton, BSc; Benjamin Fell, DPhil; Akrivia Health, Oxford, England, United Kingdom

Psychiatric disorders are a leading cause of disability globally,1 representing a huge unmet need for people living with these conditions. Despite increased healthcare spending and the use of psychotropic medications, this burden has remained stable over the past 30 years,1 with suboptimal clinical practice and lack of primary prevention identified as likely explanations.2 This lack of progress in the treatment of psychiatric disorders also manifests in poor success rates for clinical trials in psychiatry; only 7.3% of drugs advance from phase I to approval.3,4 The complexity and heterogeneity of psychiatric disorders compared to other disease areas is one factor that has hindered progress so far, presenting a unique challenge for the research community. To bring about progress, there is a need for further understanding of the underlying neurobiology, environmental factors, psychological mechanisms, and social determinants involved in psychiatric disorders to inform the development of new treatments, which have the potential to improve the quality of life of those living with mental illness and to be of great benefit to wider society. This understanding will also be of great relevance to the health economics and outcomes research (HEOR) community, given the increasing demand from payers for evidence of the disease and economic burden associated with conditions during treatment appraisal processes.

Psychiatric disorders cross the boundaries of diagnoses, both in terms of clinical presentation and disease biology.5 To lay the groundwork for successful development of novel therapeutics, a deeper characterization of these disorders is necessary to ensure the right patient receives the right treatment at the right time. This characterization requires a longitudinal and biopsychosocial approach to understand the synergistic interplay between these factors over a patient’s lifetime. Bringing together data sources that cover the entire landscape of psychiatric disorders—from genes to socioeconomic factors—is necessary for an integrated approach, together with large sample sizes for meaningful analyses. Moreover, a data source that captures diverse, hard-to-reach patient populations is essential for the comprehensive characterization of psychiatric disorders. In this article, we outline how leveraging psychiatric electronic health record (EHR) data to deeply characterize the clinical trajectories of patients can provide just such a data source and explore how the potential of EHR data can be realized in the context of the United Kingdom’s National Health Service (NHS).

EHRs represent a rich source of data from real-world clinical practice. Because they capture patients’ experience over a lifetime, EHRs have the power to inform analyses of disease burden and resource utilization, in addition to capturing clinical outcomes following care and treatment interventions. In the NHS, psychiatric EHRs record not only diagnoses and treatment, but also rich, narrative detail on disease severity and progression, symptoms, treatment pathways, and potentially the social context of the patient in question. NHS psychiatric EHRs contain some key structured data, including demographics, some coded diagnostic data, and patient-reported outcomes. However, most clinically relevant information is recorded as unstructured data, including free-text clinical notes, referral letters, and discharge summaries. Such documents represent a large part of the clinically actionable information contained within EHR systems. There is immense research value in these rich, transdiagnostic descriptions of patient states, but utilizing them for research poses significant challenges. Unstructured data are difficult to analyze, particularly at scale, and the sensitivity of clinical text documents (even when personally identifiable information such as names and addresses has been masked) is too great to permit access outside of the controlling healthcare organization. Historically, therefore, despite their richness, EHR data have been challenging to analyze in their raw form and difficult or impossible to access outside the NHS.

To improve the accessibility of psychiatric EHRs, a data-structuring solution is required. Specifically, this solution needs to allow clinical free-text data to be translated into a structured format, preserving the richness and contextual nuance of the source, while rendering the data more tractable for quantitative analysis. Crucially, data structuring would also allow EHRs to be anonymized or aggregated, preserving patient privacy, and so enabling access from outside the NHS.

Natural language processing (NLP) methods can provide this kind of data-structuring solution. NLP describes a broad range of methods for automatically processing natural language data and includes a variety of techniques for structuring free text. When applied to clinical documents in EHRs, NLP can unlock this rich data source for research at scale, providing billions of data points related to patient care. These data points provide a source of information for retrospective analyses in psychiatric conditions, from assessing clinical outcomes to quantifying healthcare resource utilization.

When developing NLP data-structuring models for use on clinical text data, it is critical to retain the contextual nuance of the original document. This is achieved by taking a clinically oriented approach to NLP concept design (Figure 1). Extensive qualitative analysis of how concepts (eg, medications, diagnoses, and symptoms) are described in source EHR data and direct involvement of practicing clinicians in the concept design process are vital to the development of a thorough NLP approach. For example, developing an NLP concept for medication requires knowledge of how medication is usually described by clinicians: what medication information is relevant to clinical care, what gets recorded, and often more importantly, what does not. Medication descriptions in NHS psychiatry usually refer to current usage, past usage, or discussions of potential usage, but rarely include explicit “negation” (describing medications a patient is not taking). Hence, the medication concept includes categories of “is on,” “was on,” and “other,” but no “not on” category. Inclusion of clinically redundant fields (regardless of their potential research utility) can lead to poor model performance, so data exploration and direct clinical involvement are vital to match concept design to the reality of source data (Figure 2).
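To make the shape of such a medication concept concrete, the following is a minimal Python sketch of a structured record with the three status categories described above. It is an illustration only, not the models described in this article: the cue phrases, the `classify_status` helper, and the default to "is on" are all hypothetical assumptions, and a real system would rely on trained, clinically validated models rather than keyword matching.

```python
from dataclasses import dataclass
from enum import Enum


class MedicationStatus(Enum):
    """Status categories from the medication concept; note there is no 'not on'."""
    IS_ON = "is on"
    WAS_ON = "was on"
    OTHER = "other"  # eg, discussion of potential usage


@dataclass(frozen=True)
class MedicationMention:
    """One structured data point derived from a free-text sentence."""
    drug: str
    status: MedicationStatus
    source_text: str


# Hypothetical cue phrases, chosen purely for illustration.
OTHER_CUES = ("consider", "discussed", "may start", "could try")
PAST_CUES = ("previously", "stopped", "discontinued", "was on")


def classify_status(sentence: str) -> MedicationStatus:
    """Toy keyword classifier standing in for a trained NLP model."""
    text = sentence.lower()
    if any(cue in text for cue in OTHER_CUES):
        return MedicationStatus.OTHER
    if any(cue in text for cue in PAST_CUES):
        return MedicationStatus.WAS_ON
    # Assumption: with no other cue, treat the mention as current usage.
    return MedicationStatus.IS_ON


def structure_mention(drug: str, sentence: str) -> MedicationMention:
    """Turn a (drug, sentence) pair into a structured medication record."""
    return MedicationMention(drug, classify_status(sentence), sentence)
```

Restricting the output to the categories that clinicians actually record keeps the target label set aligned with the source data, which is the design point made above.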

Figure 1: Using NLP to capture a broad range of clinical concepts from unstructured clinical notes

In the context of health economics, structuring medication data in this way allows them to be factored into, for example, healthcare resource utilization analyses and estimations of medical costs for psychiatric disorders. Without NLP, these analyses would not be possible at scale, given that most, if not all, medication data are within unstructured free-text clinical notes in psychiatric EHRs. Additionally, these data allow for the clinical benefit of medications to be monitored over time. For novel therapeutics, this allows for further generation of evidence for clinical benefit beyond the short time frame of randomized controlled trials, which is particularly pertinent to medications assessed under value-based agreements where drug pricing is linked to clinical outcomes. Patient outcomes in terms of changes in rates of hospitalization, service use, and disease-relevant symptoms can be monitored following prescription of the medication in question.


Figure 2: Three NLP models to capture the depth of information within a clinical concept

A use case of these structured medication data is the identification of patient groups defined by treatment patterns. Treatment-resistant depression (TRD), also conceptualized as difficult-to-treat depression (DTD),6 is defined as depression in patients who fail to respond to 2 or more antidepressant drugs of adequate dosing and treatment duration.7 With medication data structured using NLP, the parameters within this definition can be operationalized to identify TRD. The contextual classification of medication mentions in the free text allows the sequence of medications taken to be mapped. A study using this approach in a secondary care dataset found that patients with TRD were more likely to be hospitalized and to have more comorbidities than patients with depression that is not resistant to treatment.6 In the future, further use of NLP to structure information related to socioeconomic factors (eg, employment and accommodation information) from the free text will help capture a more holistic set of resource utilization indicators for TRD and other psychiatric disorders.
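As a rough illustration of how the TRD definition might be operationalized on structured medication data, the Python sketch below counts distinct antidepressants given at an adequate dose for an adequate duration and then ended. This is not the method used in the cited study. The `Trial` record, the `MIN_TRIAL_DAYS` threshold of 28 days, and the treatment of an ended trial as a proxy for nonresponse are all assumptions made for the example; a real definition would need clinically agreed parameters.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set

# Assumption: minimum duration for an "adequate" trial, for illustration only.
MIN_TRIAL_DAYS = 28


@dataclass(frozen=True)
class Trial:
    """Hypothetical structured record of one antidepressant treatment episode."""
    patient_id: str
    drug: str
    start: date
    end: Optional[date]  # None means the patient is still on the drug
    adequate_dose: bool


def failed_adequate_trials(trials: Iterable[Trial]) -> Set[str]:
    """Return distinct drugs with an adequate trial that has since ended.

    Assumption: an ended adequate trial is used as a proxy for nonresponse.
    """
    failed = set()
    for trial in trials:
        if trial.end is None or not trial.adequate_dose:
            continue
        if (trial.end - trial.start).days >= MIN_TRIAL_DAYS:
            failed.add(trial.drug.lower())
    return failed


def meets_trd_proxy(trials: Iterable[Trial]) -> bool:
    """True when 2 or more distinct antidepressants count as failed trials."""
    return len(failed_adequate_trials(trials)) >= 2
```

Keeping the thresholds as named parameters makes it straightforward to rerun the cohort definition under alternative assumptions in a sensitivity analysis.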

While the analysis of a large secondary care dataset described the disease burden associated with TRD,6 many patients with TRD are known to be managed in primary care,8 so linking primary and secondary care datasets would provide a fuller picture of treatment pathways and health outcomes. Linking these datasets, along with other national datasets and registries, allows mental health outcomes to be mapped over a lifetime.9,10 As we emphasized at the beginning of this article, a biopsychosocial approach is required to understand the influence of multiple factors on mental health outcomes. Linkage of datasets from across healthcare services, social care, and biomedicine is crucial to this integrated approach and will provide exciting research opportunities.
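The linkage idea can be sketched in a few lines of Python. The example below assumes, hypothetically, that each dataset already carries a shared pseudonymized identifier (`pseudo_id`) and merges events from any number of sources into one chronological timeline per patient. The `CareEvent` record and the `link_timelines` function are illustrative names, and real linkage would also involve governance approvals, identifier matching, and data quality checks that are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CareEvent:
    """Hypothetical event record shared across linked datasets."""
    pseudo_id: str    # assumed pseudonymized identifier common to all sources
    when: date
    setting: str      # eg, "primary" or "secondary"
    description: str


def link_timelines(*datasets: Iterable[CareEvent]) -> Dict[str, List[CareEvent]]:
    """Merge events from several datasets into one dated timeline per patient."""
    timelines: Dict[str, List[CareEvent]] = defaultdict(list)
    for dataset in datasets:
        for event in dataset:
            timelines[event.pseudo_id].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e.when)  # chronological order within patient
    return dict(timelines)
```

Once events from both settings sit on a single timeline, treatment sequences that begin in primary care and continue under specialist services can be followed end to end.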

Real-world data within psychiatric EHRs provide a unique opportunity to gain insights into the complex and heterogeneous nature of psychiatric disorders. EHRs are a valuable source of information related to a patient’s experience of care, and there is a growing need to develop innovative approaches to effectively leverage these data to inform the development of future treatment strategies and to improve care practices. Well-designed methods that can effectively integrate and analyze data from multiple sources will be critical in fully realizing the potential of EHRs in mental health research. This will involve developing sophisticated algorithms and tools for NLP and data harmonization. The implementation of innovative approaches can enhance accessibility of EHR data for mental health research, improve our understanding of factors influencing patient outcomes, and inform evidence-based interventions in real-world settings.



1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1204-1222. doi:10.1016/S0140-6736(20)30925-9

2. Jorm AF, Patten SB, Brugha TS, Mojtabai R. Has increased provision of treatment reduced the prevalence of common mental disorders? review of the evidence from four countries. World Psychiatry. 2017;16(1):90-99. doi:10.1002/wps.20388

3. Thomas D, Chancellor D, Micklus A, LaFever S, Hay M. Clinical Development Success Rates and Contributing Factors 2011–2020. Biotechnology Innovation Organization. Published 2021. Accessed April 13, 2023. www.bio.org/clinical-development-success-rates-and-contributing-factors-2011-2020

4. Mullard A. 2022 FDA approvals. Nat Rev Drug Discov. 2023;22(2):83-88. doi:10.1038/d41573-023-00001-3

5. Pelin H, Ising M, Stein F, et al. Identification of transdiagnostic psychiatric disorder subtypes using unsupervised learning. Neuropsychopharmacology. 2021;46(11):1895-1905. doi:10.1038/s41386-021-01051-0

6. Costa T, Menzat B, Engelthaler T, et al. The burden associated with, and management of, difficult-to-treat depression in patients under specialist psychiatric care in the United Kingdom. J Psychopharmacol. 2022;36(5):545-556. doi:10.1177/02698811221090628

7. Sforzini L, Worrell C, Kose M, et al. A Delphi-method-based consensus guideline for definition of treatment-resistant depression for clinical trials. Mol Psychiatry. 2022;27(3):1286-1299. doi:10.1038/s41380-021-01381-x

8. Wiles N, Taylor A, Turner N, et al. Management of treatment-resistant depression in primary care: a mixed-methods study. Br J Gen Pract. 2018;68(675):e673-e681. doi:10.3399/bjgp18X699053

9. Sabia S, Fayosse A, Dumurgier J, et al. Association of ideal cardiovascular health at age 50 with incidence of dementia: 25 year follow-up of Whitehall II cohort study. BMJ. 2019;366:l4414. doi:10.1136/bmj.l4414

10. Rahman MA, Todd C, John A, et al. School achievement as a predictor of depression and self-harm in adolescence: linked education and health record study. Br J Psychiatry. 2018;212(4):215-221. doi:10.1192/bjp.2017.69
