Paper tables with annotated results for Mining Hidden Populations through Attributed Search


Mining Hidden Populations through Attributed Search

Researchers often query online social platforms through their application programming interfaces (APIs) to find target populations such as people with mental illness~\cite{De-Choudhury2017} and jazz musicians~\cite{heckathorn2001finding}. Entities in such a target population satisfy a property that is typically identified using an oracle (a human or a pre-trained classifier). When the property of the target entities is not directly queryable via the API, we refer to the property as `hidden' and to the population as a hidden population. Finding individuals who belong to these populations on social networks is hard because the hidden property cannot be queried directly, and the sampler has to explore a combinatorial query space within a finite budget. By exploiting the correlation between queryable attributes and the population of interest, and by hierarchically ordering the query space, we propose a decision tree-based Thompson sampler (\texttt{DT-TMP}) that efficiently discovers the right combination of attributes to query. Our proposed sampler outperforms state-of-the-art samplers in online experiments, for example by 54\% on Twitter. In offline experiments, where the number of entities matching a query is known, \texttt{DT-TMP} outperforms the baseline samplers by a factor of 0.9-1.5$\times$. In the future, we wish to explore finding hidden populations by formulating more complex queries.
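
To make the sampling idea concrete, the following is a minimal sketch of plain Beta-Bernoulli Thompson sampling over a flat set of candidate queries. It is not the \texttt{DT-TMP} algorithm: the hierarchical (decision tree) ordering of the query space is omitted, and the function name `thompson_sample_queries`, the query labels, and the simulated hit rates are all hypothetical, introduced only for illustration. The oracle is simulated by a coin flip with a fixed per-query probability.

```python
import random


def thompson_sample_queries(hit_rates, budget, seed=0):
    """Flat Beta-Bernoulli Thompson sampling over candidate queries.

    hit_rates: dict mapping a (hypothetical) query label to the simulated
               probability that an entity returned by that query belongs
               to the hidden population.
    budget:    total number of entities that may be sent to the oracle.
    Returns (hits, pulls): the number of target entities found and the
    number of times each query was issued.
    """
    rng = random.Random(seed)
    # One Beta(1, 1) prior per query, stored as [alpha, beta].
    posterior = {q: [1, 1] for q in hit_rates}
    pulls = {q: 0 for q in hit_rates}
    hits = 0
    for _ in range(budget):
        # Draw a plausible hit rate for every query from its posterior
        # and issue the query with the largest draw.
        draws = {q: rng.betavariate(a, b) for q, (a, b) in posterior.items()}
        query = max(draws, key=draws.get)
        # Simulated oracle: is the returned entity in the hidden population?
        is_target = rng.random() < hit_rates[query]
        # Update the chosen query's posterior with the oracle's answer.
        posterior[query][0 if is_target else 1] += 1
        pulls[query] += 1
        hits += is_target
    return hits, pulls


if __name__ == "__main__":
    # Hypothetical attribute queries with made-up hit rates.
    rates = {"lang=en": 0.05, "bio~'sax'": 0.6, "location=NYC": 0.1}
    found, counts = thompson_sample_queries(rates, budget=500)
    print(found, counts)
```

In this sketch the budget is spent one oracle call at a time, and queries whose results keep turning out to be in the target population are issued more often as their posteriors sharpen. A hierarchical sampler would additionally decide which attribute combinations to expand, which this flat version does not attempt.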
