Selecting Sentences versus Selecting Tree Constituents for Automatic Question Ranking

Community question answering (cQA) websites are built around users who post questions on an online forum, expecting other users to provide answers or suggestions. Unlike in other social media, posted questions have no length limit and tend to be multi-sentence elaborations that mix context, the actual questions, and irrelevant information. We address the problem of question ranking: given a user's new question, retrieve previously posted questions that could be equivalent or highly relevant. This can prevent the posting of near-duplicate questions and provide the user with instantaneous answers. For the first time in cQA, we address the selection of relevant text, both at sentence and at constituent level, for parse tree-based representations. Our supervised models for text selection boost the performance of a tree kernel-based machine learning model, allowing it to overtake the current state of the art on a recently released cQA evaluation framework.
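To make the two-stage idea concrete, below is a minimal sketch in Python: first select the sentences of a previously posted question that are most relevant to the new one, then compare the retained parse trees. The lexical-overlap selector and the shared-production similarity are simplified stand-ins for the paper's supervised selection models and tree kernel; the nltk trees, toy parses, and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a lexical-overlap heuristic replaces the paper's supervised
# sentence/constituent selectors, and a shared-production count replaces a
# real tree kernel. Toy parses and names are illustrative assumptions.
from collections import Counter

from nltk.tree import Tree  # constituency trees from bracketed strings


def select_sentences(query_tree, candidate_trees, top_k=1):
    """Keep the top_k candidate sentences whose words overlap most with the
    new question (a stand-in for a trained text-selection model)."""
    query_words = {w.lower() for w in query_tree.leaves()}

    def overlap(tree):
        words = {w.lower() for w in tree.leaves()}
        return len(query_words & words) / (len(query_words | words) or 1)

    return sorted(candidate_trees, key=overlap, reverse=True)[:top_k]


def tree_similarity(tree_a, tree_b):
    """Count grammar productions shared by two parse trees (a crude proxy
    for the subset-tree kernels used by tree kernel-based rankers)."""
    prods_a = Counter(str(p) for p in tree_a.productions())
    prods_b = Counter(str(p) for p in tree_b.productions())
    return sum((prods_a & prods_b).values())


def rank_score(query_tree, candidate_trees, top_k=1):
    """Score a previously posted (multi-sentence) question against the new
    one, using only its selected sentences."""
    selected = select_sentences(query_tree, candidate_trees, top_k)
    return sum(tree_similarity(query_tree, t) for t in selected)


if __name__ == "__main__":
    new_question = Tree.fromstring(
        "(SBARQ (WHADVP (WRB How)) (SQ (VBP do) (NP (PRP I)) "
        "(VP (VB reset) (NP (PRP$ my) (NN password)))) (. ?))"
    )
    old_question = [  # one irrelevant sentence, one relevant one
        Tree.fromstring("(S (NP (PRP I)) (VP (VBP love) (NP (DT this) (NN forum))) (. .))"),
        Tree.fromstring(
            "(SBARQ (WHADVP (WRB How)) (SQ (MD can) (NP (PRP I)) "
            "(VP (VB change) (NP (PRP$ my) (NN password)))) (. ?))"
        ),
    ]
    print(rank_score(new_question, old_question))
```

Ranking a set of previously posted questions would then amount to sorting them by this score against the new question; in the paper this role is played by a tree kernel-based learner operating on the selected text rather than by a hand-written heuristic.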

COLING 2016
