A Strong and Robust Baseline for Text-Image Matching

ACL 2019 · Fangyu Liu, Rongtian Ye

We review current schemes for training and inference in text-image matching models and propose improvements to both. First, we empirically show the limitations of two popular losses (sum- and max-margin) widely used to train text-image embeddings and propose a trade-off between them: a kNN-margin loss which 1) utilizes information from hard negatives and 2) is robust to noise, since all $K$ hardest samples are taken into account, tolerating pseudo negatives and outliers. Second, we advocate using Inverted Softmax (IS) and Cross-modal Local Scaling (CSLS) during inference to mitigate the so-called hubness problem in the high-dimensional embedding space, improving scores on all metrics by a large margin.
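The abstract names the kNN-margin loss but gives no formula here, so the following is a minimal sketch of one plausible reading: apply the usual hinge margin, but only to the $K$ hardest in-batch negatives in each retrieval direction (rather than the single hardest or all of them). The function name, the in-batch negative sampling, and the `k`/`margin` values are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def knn_margin_loss(im, txt, k=3, margin=0.2):
    """Sketch of a kNN-margin ranking loss for text-image matching.

    im, txt: L2-normalized embeddings of shape (batch, dim); row i of `im`
    and row i of `txt` are a matching (positive) pair. The hinge is applied
    to the k hardest in-batch negatives per anchor (assumes k < batch size).
    """
    scores = im @ txt.t()                 # cosine similarities, (batch, batch)
    pos = scores.diag().view(-1, 1)       # similarity of each positive pair

    # Hinge cost against every in-batch negative, in both directions.
    cost_im = (margin + scores - pos).clamp(min=0)        # image -> text negatives
    cost_txt = (margin + scores - pos.t()).clamp(min=0)   # text -> image negatives

    # Zero out the positive pairs on the diagonal so they are never "negatives".
    eye = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_im = cost_im.masked_fill(eye, 0)
    cost_txt = cost_txt.masked_fill(eye, 0)

    # Keep only the k largest (hardest) costs per anchor and sum them.
    loss_im = cost_im.topk(k, dim=1).values.sum()
    loss_txt = cost_txt.topk(k, dim=0).values.sum()
    return loss_im + loss_txt
```

IS and CSLS are likewise only named in the abstract; the sketch below follows their standard definitions from the cross-lingual embedding literature, applied to a cross-modal similarity matrix. The function names and the `k`/`beta` hyperparameters are assumptions for illustration.

```python
import numpy as np

def csls_scores(sim, k=10):
    """Cross-modal Local Scaling (CSLS) rescoring (sketch).

    sim: (n_queries, n_candidates) cosine-similarity matrix across modalities.
    Each score is penalized by the mean similarity of the query and of the
    candidate to their k nearest cross-modal neighbours, which down-weights
    'hub' candidates that are close to everything.
    """
    r_query = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_cand = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_query - r_cand

def inverted_softmax_scores(sim, beta=30.0):
    """Inverted Softmax (IS) rescoring (sketch).

    Normalizes each candidate's scores over all queries, deflating candidates
    that score highly against many queries (hubs).
    """
    e = np.exp(beta * (sim - sim.max()))   # subtract max for numerical stability
    return e / e.sum(axis=0, keepdims=True)

# Usage: rank candidates for each query by the rescored similarities, e.g.
# ranks = np.argsort(-csls_scores(sim), axis=1)
```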
