Detecting Potentially Fraudulent Data in Online Discrete Choice Experiments (DCE): A New Method Using Behaviorally 'Irrelevant' Respondent Variables
Author(s)
Iskiwitch C¹, White B¹, von Butler L¹, Panattoni L², Coulter J³, Prood N², Gahlon G², Land N², Maravic M²
¹SurveyEngine, Berlin, BE, Germany; ²Precision AQ, New York, NY, USA; ³Pfizer Inc, Grand Rapids, MI, USA
OBJECTIVES: In response to the failure of internal data quality checks during patient recruitment for an online health-focused DCE, we developed an approach to identify potentially fraudulent responses in real time.
METHODS: We posited that the preferences of legitimate respondents should not vary by behaviorally ‘irrelevant’ respondent variables, such as technical metadata (e.g., network, VPN use, or browser type). We estimated multinomial logit (MNL) models on the preference data separately for each level of an ‘irrelevant’ variable and used likelihood ratio (LR) tests to compare these stratified models with a single pooled (aggregate) model. We hypothesized that a significant LR test, combined with qualitative differences in preference weights, could flag potential fraud. The method was implemented during data collection in early 2024. The target sample size was 450 patients.
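The stratified-vs-pooled comparison can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a conditional (multinomial) logit with linear-in-parameters utility, simulated choice data, a hypothetical 4-parameter design, and a hypothetical binary `segment` flag standing in for an ‘irrelevant’ variable such as anonymous browser use. The LR statistic is 2 × (sum of segment log-likelihoods − pooled log-likelihood), with degrees of freedom equal to the number of parameters times (number of segments − 1).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def neg_log_likelihood(beta, X, y):
    """Negative log-likelihood of a conditional (multinomial) logit.

    X: (n_tasks, n_alts, n_params) attribute codings; y: chosen alternative per task.
    """
    utility = X @ beta                                  # (n_tasks, n_alts)
    utility -= utility.max(axis=1, keepdims=True)       # numerical stability
    log_prob = utility - np.log(np.exp(utility).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].sum()


def fit_mnl(X, y):
    """Fit the MNL by maximum likelihood; return (log-likelihood, estimates)."""
    res = minimize(neg_log_likelihood, np.zeros(X.shape[2]), args=(X, y), method="BFGS")
    return -res.fun, res.x


def segment_lr_test(X, y, segment):
    """LR test of one pooled MNL vs. separate MNLs for each segment level."""
    ll_pooled, _ = fit_mnl(X, y)
    levels = np.unique(segment)
    ll_split = sum(fit_mnl(X[segment == s], y[segment == s])[0] for s in levels)
    lr = 2.0 * (ll_split - ll_pooled)
    df = X.shape[2] * (len(levels) - 1)
    return lr, df, chi2.sf(lr, df)


# Simulated (hypothetical) data: segment 0 has real preferences,
# segment 1 answers at random (all preference weights equal to zero).
rng = np.random.default_rng(0)
n_tasks, n_alts, k = 2000, 3, 4
X = rng.normal(size=(n_tasks, n_alts, k))
segment = rng.integers(0, 2, size=n_tasks)
beta_legit = np.array([1.0, -0.5, 0.8, 0.3])
beta = np.where(segment[:, None] == 1, 0.0, beta_legit)        # (n_tasks, k)
util = np.einsum("tak,tk->ta", X, beta) + rng.gumbel(size=(n_tasks, n_alts))
y = util.argmax(axis=1)                                          # logit choices

lr, df, p = segment_lr_test(X, y, segment)
print(f"LR({df}) = {lr:.1f}, p = {p:.3g}")
```

A significant LR alone can also arise from differences in error scale rather than in preferences, which is one reason the method pairs the test with inspection of the estimated preference weights.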
RESULTS: Of almost 7,000 potential participants who started the screener, 19% used an anonymous browser mode, which could indicate an attempt to avoid identification by panels. An LR test segmenting on this ‘irrelevant’ variable indicated that respondents using an anonymous browser mode differed behaviorally from those not using it, LR(16) = 65.2, p < .001. Visual inspection suggested that the preference weights in the potentially fraudulent data were disordered. Follow-up ad hoc analyses supported this suspicion, including an increase over time in the screener pass rate for the suspicious group, suggesting that these respondents used prior knowledge of the screening criteria. We repeated this approach in a second DCE and identified a different suspicious variable.
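The reported statistic can be checked against its reference distribution. This sketch assumes, as is standard for LR tests, that LR(16) = 65.2 is referred to a chi-square distribution with 16 degrees of freedom. If the split is binary (anonymous vs. non-anonymous browsing), 16 degrees of freedom would correspond to 16 estimated preference parameters, but the abstract does not state the model specification.

```python
from scipy.stats import chi2

# Values reported in the abstract for the anonymous-browser split.
lr_stat, df = 65.2, 16
p_value = chi2.sf(lr_stat, df)          # upper-tail chi-square probability
critical_001 = chi2.ppf(0.999, df)      # chi-square critical value at alpha = .001
print(f"p = {p_value:.2e}, critical value at .001 = {critical_001:.2f}")
```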
CONCLUSIONS: Using LR tests on ‘irrelevant’ segmentation variables allowed us to identify the location and method of suspected fraudulent participation. Preference studies should report the use and success of fraud detection methods in their study protocols and when disseminating results. Future research should validate this method and examine how it compares with latent class analysis and other proposed approaches for detecting fraudulent data.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR192
Topic
Methodological & Statistical Research
Topic Subcategory
Survey Methods
Disease
No Additional Disease & Conditions/Specialized Treatment Areas