METHODS TO SUMMARIZE COMPLICATED DATASETS CONTAINING STRUCTURED, NOMINAL DATA USING SAS
Author(s)
Hamed Zahedi, Ph.D. Student, University of Louisville, Louisville, KY, USA
Abstract
Objective: This study demonstrates a methodology for preprocessing and analyzing large healthcare databases of clinical information, such as the National Inpatient Sample (NIS) and the Thomson MedStat MarketScan data, which contain all patient claims in 40 million observations.
Methods: A group of procedures can be defined and treated as a single episode in order to investigate its frequency of occurrence. Many studies consider only the primary procedure and primary diagnosis even when a record has several procedure and diagnosis columns, yet those other columns can hold important information. The database used in this study has fifteen procedure columns and fifteen diagnosis columns, all of which we use to identify episodes of patient care. We also combine information from multiple datasets: inpatient, outpatient, and pharmacy. A further approach is to follow the sequence of treatments each patient receives and to study treatment effectiveness through that sequence. Physicians' decisions and their outcomes are of interest to many health care organizations.
Results: Working with data files this large requires powerful statistical software. We used SAS Enterprise Guide and the RXMATCH function to summarize the codes that define a specific diagnosis, drawing on multiple information sources. An alternative approach uses SAS Text Miner: we first combine the columns into a single text string with the CATX function, then run SAS Text Miner on that string, and the Terms window in the output gives each term's frequency and number of documents. Text Miner features such as “Treating as equivalent terms”, “Sorting”, and “Filtering” then produce summaries of different diagnoses or procedures. With these steps we successfully defined episodes of care.
Conclusion: Preprocessing is an essential aspect of outcomes research, and so is the ability to handle multiple data sources.
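The column-combining and code-search steps described in the abstract can be sketched outside SAS. The following Python analogue is an illustration under stated assumptions, not the study's SAS code. `catx` mimics SAS CATX (strip blanks, skip missing values, join with a delimiter), `first_match` mimics the 1-based match position returned by RXMATCH/PRXMATCH, and `term_table` loosely imitates the Text Miner Terms window (term frequency and number of documents), with an `equivalents` dictionary standing in for “Treating as equivalent terms”. The helper names, the sample claims, and the heart-failure pattern (ICD-9-CM 428.x) are hypothetical.

```python
import re
from collections import Counter


def catx(sep, *values):
    """Rough analogue of SAS CATX: strip leading/trailing blanks, skip missing
    or blank values, and join the remaining values with the separator."""
    parts = [str(v).strip() for v in values if v is not None]
    return sep.join(p for p in parts if p)


def first_match(text, pattern):
    """Rough analogue of SAS RXMATCH/PRXMATCH: 1-based position of the first
    regular-expression match in text, or 0 if there is no match."""
    m = re.search(pattern, text)
    return m.start() + 1 if m else 0


def term_table(docs, equivalents=None, sep=" "):
    """Rough analogue of a Text Miner Terms window: for each term, the total
    frequency and the number of documents containing it. `equivalents` maps a
    term to a canonical term so that synonyms are counted together."""
    equivalents = equivalents or {}
    freq, ndocs = Counter(), Counter()
    for doc in docs:
        terms = [equivalents.get(t, t) for t in doc.split(sep) if t]
        freq.update(terms)          # every occurrence counts toward frequency
        ndocs.update(set(terms))    # each document counts at most once per term
    return freq, ndocs


# Hypothetical claims: each inner list stands in for the DX1-DX15 diagnosis
# columns of one record (truncated here); None or "" marks an empty column.
claims = [
    [" 428.0", "401.9", "", "250.00"],
    ["401.9", None, "272.4"],
    ["250.00", "428.0", "428.1"],
]

# Step 1: combine all diagnosis columns of a record into one text string.
docs = [catx(" ", *row) for row in claims]

# Step 2: flag records containing any code in an assumed diagnosis group.
HEART_FAILURE = r"\b428\.\d+"  # hypothetical pattern: any ICD-9-CM 428.x code
flags = [first_match(d, HEART_FAILURE) for d in docs]

# Step 3: summarize term frequency and document counts, treating 428.x as one term.
freq, ndocs = term_table(docs, equivalents={"428.0": "428", "428.1": "428"})
for term in sorted(freq):
    print(term, freq[term], ndocs[term])
```

Here a nonzero value in `flags` marks a record whose combined diagnosis string contains a 428.x code anywhere, not only in the primary column, which is the point the abstract makes about using all fifteen columns. The printed table lists each term with its frequency and number of documents; the collapsed term `428` appears three times across two records.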
Conference/Value in Health Info
2008-05, ISPOR 2008, Toronto, Ontario, Canada
Value in Health, Vol. 11, No. 3 (May/June 2008)
Code
PMC20
Topic
Real World Data & Information Systems
Topic Subcategory
Health & Insurance Records Systems
Disease
Multiple Diseases