Text Matching
133 papers with code • 0 benchmarks • 7 datasets
Matching a target text to a source text based on their meaning.
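As a minimal illustration of the task, a target text can be scored against a source text by comparing vector representations of the two. The sketch below uses a simple bag-of-words cosine similarity as a toy lexical baseline; the papers listed on this page instead learn semantic embeddings, and the function names here (`cosine_similarity`, `match_score`) are illustrative, not from any of the listed works.

```python
from collections import Counter
import math

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    terms = set(a) | set(b)
    dot = sum(a[t] * b[t] for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_score(source: str, target: str) -> float:
    """Score how well a target text matches a source text, in [0, 1].

    Toy lexical baseline: real text-matching systems replace these
    count vectors with learned semantic embeddings.
    """
    return cosine_similarity(
        Counter(source.lower().split()),
        Counter(target.lower().split()),
    )

print(match_score("a dog runs in the park", "the dog runs in a park"))
print(match_score("a dog runs in the park", "quarterly revenue fell"))
```

Texts with the same word distribution score 1.0, while texts sharing no words score 0.0; a learned embedding model would additionally assign high scores to paraphrases with no lexical overlap.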
Benchmarks
These leaderboards are used to track progress in Text Matching.
Most implemented papers
Visual Semantic Reasoning for Image-Text Matching
It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using 1K test set).
Extractive Summarization as Text Matching
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News
The search can directly warn fake news posters and online users (e.g., the posters' followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media.
Identifying Machine-Paraphrased Plagiarism
Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity.
ActionCLIP: A New Paradigm for Video Action Recognition
Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
Image-Text matching (ITM) is a common task for evaluating the quality of Vision and Language (VL) models.
Dissecting Deep Metric Learning Losses for Image-Text Retrieval
In the event that the gradients are not integrable to a valid loss function, we implement our proposed objectives such that they would directly operate in the gradient space instead of on the losses in the embedding space.
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.
Self-supervised vision-language pretraining for Medical visual question answering
Medical visual question answering (VQA) is the task of answering clinical questions about a given radiographic image, a challenging problem that requires a model to integrate both vision and language information.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
In this paper, we present an end-to-end framework Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations.