Dual Softmax Loss (DSL) is a loss function based on symmetric cross-entropy loss, used in the CAMoE video-text retrieval model. The similarity of each text is computed against every video, and of each video against every text, and it should be highest for the ground-truth pair. DSL introduces a prior to revise these similarity scores. Multiplying the original similarity matrix by the prior imposes an effective constraint and helps filter out pairs that match in only one direction. As a result, DSL highlights the pairs with both high Text-to-Video and high Video-to-Text probability, which yields more reliable retrieval results.
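
The description above can be illustrated with a minimal pure-Python sketch. It is not the CAMoE authors' implementation. It assumes that the prior for each retrieval direction is the softmax probability taken in the opposite direction, and that pair (i, i) in the batch is the ground truth. The function names and the `prior_scale` and `logit_scale` parameters are hypothetical, chosen for illustration only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def row_softmax(matrix, scale=1.0):
    """Apply softmax to each row after multiplying its entries by `scale`."""
    return [softmax([scale * v for v in row]) for row in matrix]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    probs = row_softmax(logits)
    n = len(probs)
    return -sum(math.log(probs[i][i]) for i in range(n)) / n


def dual_softmax_loss(sim, prior_scale=1.0, logit_scale=1.0):
    """Sketch of DSL for a square matrix where sim[i][j] is the similarity
    of text i and video j, and pair (i, i) is the ground truth.

    Assumption: the prior used in one retrieval direction is the softmax
    probability computed in the other direction.
    """
    n = len(sim)
    # Video-to-Text probabilities: softmax over texts, i.e. down each column.
    v2t_prior = transpose(row_softmax(transpose(sim), prior_scale))
    # Text-to-Video probabilities: softmax over videos, i.e. along each row.
    t2v_prior = row_softmax(sim, prior_scale)

    # Revise the similarity scores by multiplying them with the prior.
    t2v_logits = [[logit_scale * sim[i][j] * v2t_prior[i][j] for j in range(n)]
                  for i in range(n)]
    v2t_logits = [[logit_scale * sim[i][j] * t2v_prior[i][j] for j in range(n)]
                  for i in range(n)]

    # Symmetric cross-entropy: rows for Text-to-Video, columns for Video-to-Text.
    t2v_loss = diagonal_cross_entropy(t2v_logits)
    v2t_loss = diagonal_cross_entropy(transpose(v2t_logits))
    return t2v_loss + v2t_loss


if __name__ == "__main__":
    # Toy batch: text 0 scores high against both videos (a one-sided match
    # with video 1), while video 1 clearly prefers text 1.
    toy_sim = [[0.9, 0.8],
               [0.1, 0.7]]
    print(dual_softmax_loss(toy_sim))
```

In this sketch, a pair keeps a high revised score only if it is likely in both directions, because a low probability in the opposite direction shrinks the original similarity before the cross-entropy is computed.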

Source: Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss


