Computer Science Colloquia
Friday, May 11, 2012
Advisor: abhi shelat
Attending Faculty: Worthy Martin, chair; Gabe Robins, and Jason Lawrence
10:00 AM, Rice Hall, Rm. 504
PhD Quals Presentation
KD-Tree Algorithm for Propensity Score Matching
Objective. Propensity scores have been widely used in epidemiology to
control for confounding bias in post-marketing studies, but the standard
technique of matching patients by score becomes computationally
impractical with studies of three or more treatment groups. We present a
multi-category propensity score matching algorithm that is expected-case
quadratic on the number of participants per group.
Materials and Methods. Utilizing kd-tree data structures to provide
efficient queries for nearby points and a search radius related to a
best-guess match between participants in each treatment group, we reduce
the number of participants that must be considered for each matching. We
then match patients by propensity score, balancing the distribution of
confounders in each treatment group and thereby removing bias.
Discussion. We prove the correctness of this approach, showing that each
search radius considered contains the same match brute force choses for
Results. Our algorithm outperforms the brute force approach in the expected case, requiring only O(n) space and O(kdn2) time compared with brute forces O(nk+1) time, for k treatment groups. This difference is clearly seen in our empirical study of 1000 participants in 3 groups: our algorithm matches in 3.5 seconds compared to brute forces 19.53 hours. Conclusion. In the vast majority of cases, we can accomplish matching on propensity score with three or more treatment groups without the constraint of exponential growth of the search space. Considering four groups of 5,000 patients, that is a reduction from 625 trillion matches to 100 million and orders of magnitude shorter computation time.