Search Results

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation · 6 Jul 2023

We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks.

Action Recognition, Temporal Localization, +1

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

3 code implementations · 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only language but also images, and 2) providing complex visual questions or visual editing instructions that require multiple AI models to collaborate over multiple steps.

Llama 2: Open Foundation and Fine-Tuned Chat Models

19 code implementations · 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Arithmetic Reasoning +5

CoCa: Contrastive Captioners are Image-Text Foundation Models

5 code implementations · 4 May 2022

We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the multimodal decoder outputs, which predict text tokens autoregressively.

Ranked #1 on Image Classification on ImageNet (Top 1 Accuracy metric)

Action Classification, Decoder, +10
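The two-part objective in the CoCa summary above (a contrastive loss on image and text embeddings plus a captioning loss on decoder outputs) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the temperature of 0.07, and the equal loss weights `w_con` and `w_cap` are assumptions made for the example.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over a list of floats.
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def normalize(vec):
    # Scale a vector to unit L2 norm.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric image-to-text and text-to-image cross-entropy over a batch,
    # where the i-th image and i-th text form the matching pair.
    # The temperature value is an assumption for this sketch.
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return i2t + t2i

def captioning_loss(token_logits, target_ids):
    # Mean cross-entropy of the target tokens; token_logits[t] stands for the
    # decoder's vocabulary logits at step t, given the earlier tokens.
    per_token = [-log_softmax(l)[y] for l, y in zip(token_logits, target_ids)]
    return sum(per_token) / len(per_token)

def coca_style_loss(image_emb, text_emb, token_logits, target_ids,
                    w_con=1.0, w_cap=1.0):
    # Weighted sum of the two terms; the weights here are hypothetical.
    return (w_con * contrastive_loss(image_emb, text_emb)
            + w_cap * captioning_loss(token_logits, target_ids))
```

In this sketch, a batch whose image and text embeddings line up pair by pair yields a lower contrastive term than the same batch with the texts shuffled, while the captioning term depends only on the decoder logits.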

Can Foundation Models Wrangle Your Data?

2 code implementations · 20 May 2022

Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning.

Entity Resolution, Imputation, +1

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

1 code implementation · 24 Sep 2024

The proposed method is evaluated on multiple speech restoration tasks, including speech denoising, bandwidth extension, codec artifact removal, and target speaker extraction.

Bandwidth Extension, Denoising, +3

Making Large Language Models A Better Foundation For Dense Retrieval

1 code implementation · 24 Dec 2023

LLaRA consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from the LLM are used to reconstruct the tokens of the input sentence and to predict the tokens of the next sentence, respectively.

Retrieval, Sentence, +1

Cosmos World Foundation Model Platform for Physical AI

5 code implementations · 7 Jan 2025

In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups.

model, Position

Code Llama: Open Foundation Models for Code

2 code implementations · 24 Aug 2023

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.

Ranked #37 on Code Generation on MBPP (using extra training data)

16k, Code Generation, +3