Quantitative analysis of large-scale data is often complicated by the presence of diverse subgroups, which reduces the accuracy of inferences made on held-out data.
For example, in the online network of a social media platform, the number of people who mention a topic in their posts (i.e., its global popularity) can differ dramatically from how often people see it in their social feeds (i.e., its perceived popularity), where each feed aggregates the posts of a user's friends.
Social and Information Networks, Physics and Society
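The gap between global and perceived popularity can be illustrated with a toy network. The sketch below is not taken from the paper: it assumes that a user's perceived popularity is the fraction of their friends who mention the topic, and the function names and the star-shaped example graph are hypothetical.

```python
def global_popularity(mentions):
    """Fraction of all users who mention the topic."""
    return sum(mentions.values()) / len(mentions)


def perceived_popularity(friends, mentions):
    """For each user, the fraction of their friends who mention the topic.

    Assumption: a feed simply aggregates friends' posts with equal weight.
    Users with no friends are skipped.
    """
    return {
        user: sum(mentions[f] for f in friend_list) / len(friend_list)
        for user, friend_list in friends.items()
        if friend_list
    }


# Hypothetical star network: user 0 is a hub connected to users 1..4.
friends = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
# Only the hub mentions the topic (1 = mentions, 0 = does not).
mentions = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}

overall = global_popularity(mentions)
perceived = perceived_popularity(friends, mentions)
mean_perceived = sum(perceived.values()) / len(perceived)

print(f"global popularity: {overall:.2f}")
print(f"mean perceived popularity: {mean_perceived:.2f}")
```

In this example only one user in five mentions the topic, yet every leaf user sees it from all of their friends, because the single well-connected user appears in every other feed.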
Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships.
Ranked #27 on Node Classification on Pubmed
We describe a data-driven discovery method that leverages Simpson's paradox to uncover interesting patterns in behavioral data.
We present a statistical method to automatically identify Simpson's paradox in data by comparing statistical trends in the aggregate data to those in the disaggregated subgroups.
Computers and Society
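The idea of comparing a trend in the aggregate data with the trends in disaggregated subgroups can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the paper: it measures each trend by the sign of an ordinary least-squares slope and flags a reversal only when every subgroup trend opposes the aggregate trend. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (0.0 if xs has no variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def find_trend_reversal(rows):
    """Compare the aggregate trend with per-group trends.

    rows: iterable of (group, x, y) tuples.
    Flags a reversal when the aggregate slope is nonzero and every
    group's slope has the opposite sign.
    """
    rows = list(rows)
    aggregate = slope([x for _, x, _ in rows], [y for _, _, y in rows])

    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    group_slopes = {
        group: slope([x for x, _ in pts], [y for _, y in pts])
        for group, pts in by_group.items()
    }
    group_signs = {sign(s) for s in group_slopes.values()}
    reversal = sign(aggregate) != 0 and group_signs == {-sign(aggregate)}
    return {"aggregate": aggregate, "groups": group_slopes, "reversal": reversal}


# Hypothetical data: y falls with x inside each group, but group B sits
# higher on both axes, so the pooled trend points upward.
data = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3),
    ("B", 6, 10), ("B", 7, 9), ("B", 8, 8),
]
result = find_trend_reversal(data)
print(result["reversal"])
```

A practical version would also need a significance test for each slope, since a sign comparison alone treats noisy, near-zero trends as meaningful.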