ISPOR Conferences

Computable Phenotypes for Generating RWE: What Are They and Can They Really Be Standardized and Reused?

Chintal H. Shah, MS, BPharm, University of Maryland Baltimore, Baltimore, MD, USA

Computable phenotypes are definitions of clinical conditions, outcomes and exposures that can be implemented in real-world data sources, such as electronic health record data and/or medical claims data.

Adoption of computable phenotypes has a variety of goals, including increased transparency, more efficient research in “real-world” health system settings, and faster dissemination of results and resultant practice change. However, given the lack of homogeneity within and across various real-world datasets in terms of data quality and population representativeness, standardization via computable phenotypes remains a major challenge.

In this session, the 3 panelists described their experiences with the development of computable phenotypes for various applications.

Elise Berliner, PhD (Cerner Enviza, USA) moderated the session and introduced these 4 elements of computable phenotypes: (1) clinical encounters; (2) diagnosis codes; (3) medications; and (4) age criteria. Berliner also presented information from a systematic review in which 66 different algorithms were used to define asthma across 113 eligible studies, which highlighted the need for standardization. Moreover, an overwhelming majority of the studies in that review did not report on the validity of the algorithm used. This lack of standardization is a major roadblock to the transparency and credibility of results obtained using real-world data (RWD).
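To make the 4 elements concrete, here is a minimal Python sketch of how a rule-based phenotype might combine them. It is an illustration only, not an algorithm presented in the session: the code lists, the thresholds (`min_encounters`, `min_age`), and the `Patient` and `Encounter` structures are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical code lists for illustration only; not a validated definition.
ASTHMA_DX_PREFIXES = ("J45",)                 # ICD-10-CM asthma category
ASTHMA_DRUGS = {"albuterol", "fluticasone"}   # example asthma medications


@dataclass
class Encounter:
    day: date
    dx_codes: tuple


@dataclass
class Patient:
    birth_date: date
    encounters: list = field(default_factory=list)
    medications: set = field(default_factory=set)


def age_on(birth_date, day):
    """Age in completed years on the given day."""
    had_birthday = (day.month, day.day) >= (birth_date.month, birth_date.day)
    return day.year - birth_date.year - (0 if had_birthday else 1)


def meets_phenotype(patient, as_of, min_encounters=2, min_age=5):
    """Apply the 4 elements: encounters, diagnosis codes, medications, age."""
    # Elements 1 and 2: encounters on or before as_of with a qualifying code.
    qualifying = [
        e for e in patient.encounters
        if e.day <= as_of
        and any(code.startswith(ASTHMA_DX_PREFIXES) for code in e.dx_codes)
    ]
    # Element 3: at least one medication from the (assumed) drug list.
    has_drug = bool(patient.medications & ASTHMA_DRUGS)
    # Element 4: minimum age on the assessment date.
    old_enough = age_on(patient.birth_date, as_of) >= min_age
    return len(qualifying) >= min_encounters and has_drug and old_enough
```

Changing any one of these assumed parameters yields a different algorithm, which suggests how dozens of distinct definitions of a single condition can accumulate in the literature.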


Kevin Haynes, PharmD, MSCE (Janssen, USA) walked the audience through several challenges in the development of computable phenotypes, using a number of examples. Haynes explained that although requiring repeated diagnoses over a period of time may help confirm a condition, it runs the risk of introducing immortal time bias into the study. An underlying problem in most phenotype definitions is that, for a condition to be captured in most real-world data types, it must be severe enough to warrant a healthcare encounter. Another major challenge is the lack of longitudinal data over a long enough period to accurately capture patient history. Haynes concluded that while phenotypes have a number of advantages, a number of challenges must be overcome in order to identify and develop them.
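The immortal time concern can be illustrated with a short Python sketch. It assumes a hypothetical rule, "two diagnoses at least 30 days apart," which is not taken from the session. A patient who qualifies under such a rule must, by construction, have survived from the first diagnosis to the confirming one, so that interval is "immortal" if follow-up is counted from the first diagnosis.

```python
from datetime import date


def confirm_by_repeat_dx(dx_dates, min_gap_days=30):
    """Return (first_dx, confirming_dx, immortal_days), or None if the
    hypothetical repeated-diagnosis rule is never satisfied."""
    dates = sorted(dx_dates)
    if not dates:
        return None
    first = dates[0]
    for later in dates[1:]:
        gap = (later - first).days
        if gap >= min_gap_days:
            # The patient cannot have died or been censored during this gap
            # and still meet the definition.
            return first, later, gap
    return None
```

One common way to avoid the bias is to start follow-up at the confirming diagnosis rather than the first one, at the cost of discarding early person-time.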

David Carrell, PhD (Kaiser Permanente, USA) presented an alternative to algorithm reuse: scalable algorithm development. Algorithm reuse is appealing because it saves time and money. Algorithm scalability refers to the ability to easily implement an algorithm in another data set and obtain comparable performance. Because scalability in its current form is a major challenge, Carrell proposed an alternative path: an automated phenotype development process whose costs are reduced so much that it could be feasible to repeat the entire development process in each setting.


Rachel Richesson, PhD, MPH, MS (University of Michigan, USA) presented on the elements that make phenotypes findable for assessment or reuse. These elements are: find, assess, implement, validate, report, and publish. Standardizing computable phenotypes is a process that requires a lot of time and effort and consequently incurs considerable cost. In her discussion, Richesson raised the important question, “Who should pay for these efforts?” She also referred the audience to the following resources/communities for phenotypes:

  • PheKB (Phenotype Knowledge Base). Where available, this resource also provides information on metrics such as positive predictive value for its algorithms.
  • OHDSI (Observational Health Data Sciences and Informatics) phenotype library.

The use of natural language processing to help with phenotyping was discussed, but the practical challenge with this approach is that electronic medical record data are not easy to access, and even when they are available, linking the data is a major challenge. The panelists agreed that using machine learning to automate phenotyping does have potential, but we are many years away from universal automation. The “gold standard for validation” was also discussed, and the panelists were of the opinion that the medical chart is probably the closest to the gold standard or the “truth.” Lastly, the major challenges and added complexities of using these methods in rare disease spaces were discussed.
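As a worked illustration of validating against chart review, the following Python sketch computes positive predictive value and sensitivity for a phenotype algorithm, treating chart-review labels as the reference standard. The function name and input layout are assumptions made for this example and are not drawn from PheKB or OHDSI.

```python
def validation_metrics(algorithm_flags, chart_labels):
    """Compare phenotype flags with chart-review labels.

    Both inputs are sequences of booleans in the same patient order.
    """
    if len(algorithm_flags) != len(chart_labels):
        raise ValueError("inputs must cover the same patients")
    pairs = list(zip(algorithm_flags, chart_labels))
    tp = sum(1 for a, c in pairs if a and c)        # flagged, chart confirms
    fp = sum(1 for a, c in pairs if a and not c)    # flagged, chart refutes
    fn = sum(1 for a, c in pairs if not a and c)    # missed by the algorithm
    return {
        # PPV: share of flagged patients whom the chart confirms.
        "ppv": tp / (tp + fp) if tp + fp else None,
        # Sensitivity: share of chart-confirmed patients who were flagged.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
    }
```

Note that estimating sensitivity requires chart review of patients the algorithm did not flag, which is part of why validation is costly.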

VOL. 8, NO. 3 S1

May 2022 Supplement
