LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

pku-yuangroup/languagebind 3 Oct 2023

We thus propose VIDAL-10M, a dataset of Video, Infrared, Depth, and Audio paired with their corresponding Language.

FLAVA: A Foundational Language And Vision Alignment Model

facebookresearch/multimodal CVPR 2022

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks.

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

jetrunner/laprador Findings (ACL) 2022

Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives.

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

ofa-sys/chinese-clip 2 Nov 2022

The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

flagai-open/flagai 12 Nov 2022

In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.

Keras GPT Copilot: Integrating the Power of Large Language Models in Deep Learning Model Development

fabprezja/keras-gpt-copilot Zenodo GitHub 2023

Keras GPT Copilot is the first Python package designed to integrate an LLM copilot within the model development workflow, offering iterative feedback options for enhancing the performance of your Keras deep learning models.