Zero-shot Text Retrieval
6 papers with code • 1 benchmarks • 2 datasets
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M with Video, Infrared, Depth, Audio and their corresponding Language, naming as VIDAL-10M.
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks.
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives.
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.
In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.
Keras GPT Copilot: Integrating the Power of Large Language Models in Deep Learning Model Development
Keras GPT Copilot is the first Python package designed to integrate an LLM copilot within the model development workflow, offering iterative feedback options for enhancing the performance of your Keras deep learning models.