Detecting Potentially Fraudulent Data in Online Discrete Choice Experiments (DCE): A New Method Using Behaviorally 'Irrelevant' Respondent Variables

Author(s)

Iskiwitch C¹, White B¹, von Butler L¹, Panattoni L², Coulter J³, Prood N², Gahlon G², Land N², Maravic M²
¹SurveyEngine, Berlin, BE, Germany, ²Precision AQ, New York, NY, USA, ³Pfizer Inc, Grand Rapids, MI, USA

OBJECTIVES: After internal data quality checks failed during patient recruitment for an online health-focused DCE, we developed an approach to identify potentially fraudulent responses in real time.

METHODS: We posited that the preferences of legitimate respondents should not vary by behaviorally ‘irrelevant’ respondent variables, including technical metadata (e.g., network, VPN use, or browser type). We fit multinomial logistic regression (MNL) models to the preference data, stratified by each ‘irrelevant’ variable, and conducted likelihood ratio (LR) tests comparing the stratified models with the aggregate model. We hypothesized that a significant LR test, combined with qualitative differences in preference weights, may identify potential fraud. The identification method was implemented during data collection in early 2024. The target sample size was 450 patients.
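The stratified-versus-aggregate comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`mnl_log_likelihood`, `lr_statistic`), the data layout, and all numbers are assumptions made for the example, and the model fitting itself (maximizing the log-likelihood) is left to whatever estimation software is in use. The LR statistic is twice the gain in log-likelihood from fitting each stratum separately, with degrees of freedom equal to the number of extra parameters.

```python
import math

def mnl_log_likelihood(beta, choice_sets):
    """Log-likelihood of an MNL model (hypothetical helper).

    beta: list of preference weights, one per attribute.
    choice_sets: list of (alternatives, chosen_index), where alternatives
    is a list of attribute vectors, one per alternative in the task.
    """
    ll = 0.0
    for alternatives, chosen in choice_sets:
        utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        u_max = max(utilities)
        log_denom = u_max + math.log(sum(math.exp(u - u_max) for u in utilities))
        ll += utilities[chosen] - log_denom
    return ll

def lr_statistic(ll_pooled, ll_by_stratum, n_params):
    """LR statistic and degrees of freedom for stratified vs. pooled models.

    ll_pooled: maximized log-likelihood of the aggregate model.
    ll_by_stratum: maximized log-likelihoods, one per stratum.
    n_params: number of parameters in each model (assumed equal across models).
    """
    lr = 2.0 * (sum(ll_by_stratum) - ll_pooled)
    df = (len(ll_by_stratum) - 1) * n_params
    return lr, df

# Illustrative values only (not from the study): two strata, four parameters.
lr, df = lr_statistic(ll_pooled=-520.0, ll_by_stratum=[-300.0, -200.0], n_params=4)
```

Under these assumptions, a binary ‘irrelevant’ variable (for example, anonymous browser mode vs. not) gives degrees of freedom equal to the number of model parameters; the statistic is then compared against a chi-square distribution with that many degrees of freedom.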

RESULTS: Of almost 7,000 potential participants who started the screener, 19% used an anonymous browser mode, which could represent an attempt to avoid identification by panels. An LR test segmenting on this ‘irrelevant’ variable indicated that respondents using an anonymous browser mode differed behaviorally from those who did not, LR(16) = 65.2, p < .001. Visual inspection suggested that the preference weights in the potentially fraudulent data were disordered. Follow-up ad hoc analyses supported our suspicion, including an increase over time in screener pass rates for the suspicious group, suggesting that they drew on prior knowledge of the screening criteria. We repeated this approach in a second DCE and identified a different suspicious variable.
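As a quick check on the reported test, the p-value for LR(16) = 65.2 can be computed from the chi-square distribution. The sketch below is an assumption-laden illustration rather than the authors' code: it uses the closed-form survival function that holds when the degrees of freedom are even, so it needs only the Python standard library, and the function name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m, the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

# Statistic and degrees of freedom as reported in the abstract.
p_value = chi2_sf_even_df(65.2, 16)
print(p_value < 0.001)
```

The resulting tail probability is far below .001, in line with the reported result. For odd degrees of freedom, a general-purpose routine such as a statistics library's chi-square survival function would be needed instead.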

CONCLUSIONS: LR tests on ‘irrelevant’ segmentation variables allowed us to identify where and how suspected fraudulent participation occurred. Preference studies should report the use and success of fraud detection methods in protocols and dissemination. Future research should validate this method and examine how it compares with latent class analysis and other proposed approaches for detecting fraudulent data.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR192

Topic

Methodological & Statistical Research

Topic Subcategory

Survey Methods

Disease

No Additional Disease & Conditions/Specialized Treatment Areas