Cross-modal retrieval with noisy correspondence
14 papers with code • 3 benchmarks • 5 datasets
Noisy correspondence learning aims to mitigate the negative impact of mismatched pairs (e.g., false positives/negatives), as distinct from annotation errors, across several tasks.
Most implemented papers
Negative Pre-aware for Noisy Cross-modal Matching
Since clean samples are more easily distinguished by the GMM as noise increases, the memory bank can still maintain high quality even at a high noise ratio.
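Splitting clean from noisy pairs by fitting a two-component Gaussian mixture to per-sample losses is a common noisy-label heuristic that this line alludes to. Below is a minimal pure-Python sketch, assuming the low-loss component corresponds to clean pairs; the function name, EM details, and toy data are illustrative, not taken from the paper.

```python
import math
import random

def fit_gmm_1d(losses, iters=50):
    """Fit a 2-component 1-D Gaussian mixture to per-sample losses via EM.
    Returns P(clean | loss) per sample, where 'clean' is the low-mean component.
    Illustrative sketch only, not the paper's implementation."""
    lo, hi = min(losses), max(losses)
    mu = [lo, hi]            # initialize means at the extremes of the losses
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    resp = [0.5] * len(losses)
    for _ in range(iters):
        # E-step: responsibility of the clean (low-loss) component
        resp = []
        for x in losses:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            resp.append(p[0] / (p[0] + p[1] + 1e-12))
        # M-step: update means, variances, and mixing weights
        for k, r in ((0, resp), (1, [1 - g for g in resp])):
            n = sum(r) + 1e-12
            mu[k] = sum(g * x for g, x in zip(r, losses)) / n
            var[k] = sum(g * (x - mu[k]) ** 2
                         for g, x in zip(r, losses)) / n + 1e-6
            pi[k] = n / len(losses)
    return resp

# Toy data: 50 low-loss (clean) samples followed by 50 high-loss (noisy) ones.
random.seed(0)
losses = ([random.gauss(0.2, 0.05) for _ in range(50)]
          + [random.gauss(1.0, 0.1) for _ in range(50)])
p_clean = fit_gmm_1d(losses)
clean_idx = [i for i, p in enumerate(p_clean) if p > 0.5]
```

Samples with `p_clean` above a threshold (here 0.5) would be kept as the trusted set; the rest can be down-weighted or relabeled.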
Learning with Noisy Correspondence for Cross-modal Matching
Based on this observation, we reveal and study a latent and challenging direction in cross-modal matching, named noisy correspondence, which could be regarded as a new paradigm of noisy labels.
Deep Evidential Learning with Noisy Correspondence for Cross-Modal Retrieval
However, it unavoidably introduces noise (i.e., mismatched pairs) into the training data, dubbed noisy correspondence.
Cross-Modal Retrieval with Partially Mismatched Pairs
On the one hand, our method utilizes only the negative information, which is far less likely to be false than the positive information, thus avoiding overfitting to PMPs.
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
As one of the most fundamental techniques in multimodal learning, cross-modal matching aims to project various sensory modalities into a shared feature space.
Integrating Language Guidance Into Image-Text Matching for Correcting False Negatives
Extensive experiments on two ITM benchmarks show that our method can improve the performance of existing ITM models.
Noisy Correspondence Learning with Meta Similarity Correction
Despite the success of multimodal learning in the cross-modal retrieval task, this remarkable progress relies on correct correspondence among multimedia data.
Cross-modal Active Complementary Learning with Self-refining Correspondence
Recently, image-text matching has attracted increasing attention from academia and industry, as it is fundamental to understanding the latent correspondence between the visual and textual modalities.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs.
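The core idea of OT-based rematching can be sketched with the classic Sinkhorn algorithm: given a cost matrix between images and texts, compute a soft transport plan whose high-mass entries suggest corrected pairings. This is a minimal illustrative example with uniform marginals and a toy cost matrix; L2RM's actual framework (including how costs are learned) is more involved.

```python
import math

def sinkhorn(cost, reg=0.1, iters=200):
    """Entropy-regularized optimal transport between uniform marginals.
    Returns a soft transport plan; high entries indicate likely matches.
    Illustrative sketch, not the L2RM implementation."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / reg) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    a, b = 1.0 / n, 1.0 / m  # uniform row/column marginals
    for _ in range(iters):
        # Alternate scaling so the plan's marginals approach a and b
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy cost: image i truly matches text i (cost 0), all other pairs cost 1.
cost = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
plan = sinkhorn(cost)
rematch = [max(range(3), key=lambda j: plan[i][j]) for i in range(3)]
# rematch recovers the diagonal assignment [0, 1, 2]
```

Taking the argmax of each row of the plan yields a hard reassignment; in practice the soft plan itself can supervise training with rematched soft targets.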
Cross-modal Retrieval with Noisy Correspondence via Consistency Refining and Mining
Thanks to the consistency refining and mining strategy of CREAM, overfitting on false positives can be prevented and the consistency rooted in false negatives can be exploited, leading to a robust CMR method.