Search Results for author: Chunyuan Li

Found 112 papers, 70 papers with code

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations • 9 Mar 2023 • Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Referring Expression Referring Expression Comprehension +2

124,593

Improved Baselines with Visual Instruction Tuning

5 code implementations • 5 Oct 2023 • Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning.

Ranked #3 on visual instruction following on LLaVA-Bench

Factual Inconsistency Detection in Chart Captioning visual instruction following +1

124,593

Visual Instruction Tuning

9 code implementations • NeurIPS 2023 • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.

Ranked #4 on Visual Question Answering on BenchLMM

Video Question Answering visual instruction following +2

124,593

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

1 code implementation • 18 Sep 2023 • Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen

We find that scaling LMM consistently enhances model performance and improves language capabilities, and that the performance of LoRA/QLoRA tuning of LMM is comparable to that of full-model fine-tuning.

Ranked #45 on Visual Question Answering on MM-Vet

Visual Question Answering

15,941

Focal Modulation Networks

6 code implementations • 22 Mar 2022 • Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao

For semantic segmentation with UPerNet, FocalNet base at single-scale outperforms Swin by 2.4, and beats Swin at multi-scale (50.5 vs.

Ranked #8 on Object Detection on COCO minival (using extra training data)

Image Classification Object Detection +2

12,034

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

3 code implementations • 17 Oct 2023 • Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.

Interactive Segmentation Referring Expression +4

4,011

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

2 code implementations • 1 Nov 2023 • Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li

LLaVA-Interactive is a research prototype for multimodal human-AI interaction.

Image Generation Image Segmentation +1

4,011

Instruction Tuning with GPT-4

2 code implementations • 6 Apr 2023 • Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao

Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed.

Instruction Following

3,957

MIMIC-IT: Multi-Modal In-Context Instruction Tuning

2 code implementations • 8 Jun 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu

We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.

Ranked #81 on Visual Question Answering on MM-Vet

In-Context Learning Visual Question Answering

3,436

Grounded Language-Image Pre-training

2 code implementations • CVPR 2022 • Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich.

Ranked #1 on 2D Object Detection on RF100

Described Object Detection Few-Shot Object Detection +1

1,947

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Ranked #1 on Object Detection on ELEVATER

Fairness Few-Shot Image Classification +4

1,947

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

1,904

Visual In-Context Prompting

3 code implementations • 22 Nov 2023 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

1,904

GLIGEN: Open-Set Grounded Text-to-Image Generation

1 code implementation • CVPR 2023 • Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee

Large-scale text-to-image diffusion models have made amazing advances.

Ranked #4 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Image Inpainting

1,779

Generalized Decoding for Pixel, Image, and Language

1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.

Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)

Image Segmentation Panoptic Segmentation +3

1,243

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

1,243

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

Ranked #1 on Image Retrieval on MS COCO (Recall@10 metric)

Image Captioning Image Retrieval +3

1,197

Focal Self-attention for Local-Global Interactions in Vision Transformers

3 code implementations • 1 Jul 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.

Ranked #17 on Instance Segmentation on COCO test-dev

Image Classification Instance Segmentation +3

1,183

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

1 code implementation • 17 Oct 2022 • Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao

This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years.

Few-Shot Learning Image Captioning +11

993

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

1 code implementation • 18 Sep 2023 • Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao

This paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.

Text-to-Image Generation

993

RegionCLIP: Region-based Language-Image Pretraining

1 code implementation • CVPR 2022 • Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Ranked #11 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Image Classification Object +3

643

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following LLM real-life tasks +3

617

Focal Attention for Long-Range Interactions in Vision Transformers

1 code implementation • NeurIPS 2021 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification object-detection +2

542

Measuring the Intrinsic Dimension of Objective Landscapes

4 code implementations • ICLR 2018 • Chunyuan Li, Heerad Farkhoor, Rosanne Liu, Jason Yosinski

A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

468

Efficient Self-supervised Vision Transformers for Representation Learning

1 code implementation • ICLR 2022 • Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, Jianfeng Gao

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning.

Ranked #16 on Self-Supervised Image Classification on ImageNet

Representation Learning Self-Supervised Image Classification

405

Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

369

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation • CVPR 2022 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Particularly, it attains gains up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

369

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

369

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

1 code implementation • EMNLP 2020 • Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, Jianfeng Gao

We hope that our first pre-trained big VAE language model itself and results can help the NLP community renew the interests of deep generative models in the era of large-scale pre-training, and make these principled methods more practical.

Language Modelling Representation Learning +1

353

Joint Embedding of Words and Labels for Text Classification

2 code implementations • ACL 2018 • Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences.

Ranked #11 on Text Classification on DBpedia

General Classification Sentiment Analysis +2

323

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

2 code implementations • ACL 2018 • Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, Lawrence Carin

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations.

Ranked #1 on Named Entity Recognition (NER) on CoNLL 2000

Document Classification General Classification +4

284

On the Hidden Mystery of OCR in Large Multimodal Models

1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai

In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).

Key Information Extraction Nutrition +4

275

TrustLLM: Trustworthiness in Large Language Models

1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao

This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.

Ethics Fairness

265

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

4 code implementations • ICLR 2020 • Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson

The posteriors over neural network weights are high dimensional and multimodal.

Bayesian Inference Stochastic Optimization

233

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

230

Few-shot Natural Language Generation for Task-Oriented Dialog

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, Jianfeng Gao

It is pre-trained on a large set of annotated NLG corpus to acquire the controllable generation ability, and fine-tuned with only a few domain-specific labels to adapt to new domains.

Ranked #4 on Data-to-Text Generation on MULTIWOZ 2.1

Data-to-Text Generation Few-Shot Learning

188

Towards Building the Federated GPT: Federated Instruction Tuning

1 code implementation • 9 May 2023 • Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Yufan Zhou, Guoyin Wang, Yiran Chen

This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.

Federated Learning

185

LAFITE: Towards Language-Free Training for Text-to-Image Generation

2 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need of a large number of high-quality image-text pairs.

Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ

Zero-Shot Text-to-Image Generation

176

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

1 code implementation • 3 Oct 2023 • Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao

To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks.

Chatbot Image Captioning +5

175

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

2 code implementations • NAACL 2019 • Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks.

Language Modelling Response Generation +1

171

Feature Quantization Improves GAN Training

2 code implementations • ICML 2020 • Yang Zhao, Chunyuan Li, Ping Yu, Jianfeng Gao, Changyou Chen

The instability in GAN training has been a long-standing problem despite remarkable research efforts.

Ranked #1 on Conditional Image Generation on CIFAR-100

Conditional Image Generation Face Generation +3

169

Learning Customized Visual Models with Retrieval-Augmented Knowledge

1 code implementation • CVPR 2023 • Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li

Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability.

Ranked #1 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (using extra training data)

Contrastive Learning Retrieval +3

117

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training

1 code implementation • EMNLP 2020 • Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation.

Language Modelling Representation Learning +1

112

Parameter-efficient Model Adaptation for Vision Transformers

2 code implementations • 29 Mar 2022 • Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang

In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.

Benchmarking Classification +2

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

5 code implementations • NeurIPS 2017 • Chunyuan Li, Hao Liu, Changyou Chen, Yunchen Pu, Liqun Chen, Ricardo Henao, Lawrence Carin

We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching.

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

1 code implementation • CVPR 2020 • Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao

By training on a large amount of image-text-action triplets in a self-supervised learning manner, the pre-trained model provides generic representations of visual environments and language instructions.

Ranked #1 on Visual Navigation on Help, Anna! (HANNA)

Navigate Self-Supervised Learning +2

SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching

1 code implementation • 11 May 2020 • Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao

We present a new method SOLOIST that uses transfer learning and machine teaching to build task bots at scale.

Ranked #4 on End-To-End Dialogue Modelling on MULTIWOZ 2.0

End-To-End Dialogue Modelling Few-Shot Learning +4

Triangle Generative Adversarial Networks

1 code implementation • NeurIPS 2017 • Zhe Gan, Liqun Chen, Wei-Yao Wang, Yunchen Pu, Yizhe Zhang, Hao Liu, Chunyuan Li, Lawrence Carin

The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs.

Attribute Generative Adversarial Network +3

Implicit Deep Latent Variable Models for Text Generation

1 code implementation • IJCNLP 2019 • Le Fang, Chunyuan Li, Jianfeng Gao, Wen Dong, Changyou Chen

Deep latent variable models (LVM) such as variational auto-encoder (VAE) have recently played an important role in text generation.

Language Modelling Response Generation +2

Few-Shot Named Entity Recognition: A Comprehensive Study

2 code implementations • 29 Dec 2020 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when a small number of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Seeds Cleansing CNMF for Spatiotemporal Neural Signals Extraction of Miniscope Imaging Data

1 code implementation • 3 Apr 2017 • Jinghao Lu, Chunyuan Li, Fan Wang

Miniscope calcium imaging is increasingly being used to monitor large populations of neuronal activities in freely behaving animals.

Neurons and Cognition Quantitative Methods

Twin Auxiliary Classifiers GAN

4 code implementations • 5 Jul 2019 • Mingming Gong, Yanwu Xu, Chunyuan Li, Kun Zhang, Kayhan Batmanghelich

One of the popular conditional models is Auxiliary Classifier GAN (AC-GAN), which generates highly discriminative images by extending the loss function of GAN with an auxiliary classifier.

Ranked #2 on Conditional Image Generation on CIFAR-100

Conditional Image Generation

Twin Auxiliary Classifiers GAN

1 code implementation • NeurIPS 2019 • Mingming Gong, Yanwu Xu, Chunyuan Li, Kun Zhang, Kayhan Batmanghelich

One of the popular conditional models is Auxiliary Classifier GAN (AC-GAN) that generates highly discriminative images by extending the loss function of GAN with an auxiliary classifier.

Ranked #2 on Image Generation on CIFAR-100

Conditional Image Generation

Deep Temporal Sigmoid Belief Networks for Sequence Modeling

1 code implementation • NeurIPS 2015 • Zhe Gan, Chunyuan Li, Ricardo Henao, David Carlson, Lawrence Carin

Deep dynamic generative models are developed to learn sequential dependencies in time-series data.

Time Series Time Series Analysis

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

1 code implementation • 1 Apr 2024 • Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs).

Instruction Following Language Modelling +3

Structure-Aware Human-Action Generation

1 code implementation • ECCV 2020 • Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, Changyou Chen

Generating long-range skeleton-based human actions has been a challenging problem since small deviations of one frame can cause a malformed action sequence.

Ranked #2 on Human action generation on NTU RGB+D 2D

Action Generation graph construction +1

Adversarial Time-to-Event Modeling

4 code implementations • ICML 2018 • Paidamoyo Chapfuwa, Chenyang Tao, Chunyuan Li, Courtney Page, Benjamin Goldstein, Lawrence Carin, Ricardo Henao

Modern health data science applications leverage abundant molecular and electronic health data, providing opportunities for machine learning to build statistical models to support clinical practice.

Survival Analysis

Towards Amortized Ranking-Critical Training for Collaborative Filtering

1 code implementation • 10 Jun 2019 • Sam Lobel, Chunyuan Li, Jianfeng Gao, Lawrence Carin

In this paper we investigate new methods for training collaborative filtering models based on actor-critic reinforcement learning, to directly optimize the non-differentiable quality metrics of interest.

Ranked #4 on Recommendation Systems on Million Song Dataset

Collaborative Filtering Learning-To-Rank +1

Symmetric Variational Autoencoder and Connections to Adversarial Learning

2 code implementations • 6 Sep 2017 • Liqun Chen, Shuyang Dai, Yunchen Pu, Chunyuan Li, Qinliang Su, Lawrence Carin

A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence.

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

1 code implementation • 25 Dec 2015 • Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied.

Stochastic Optimization

Adversarial Learning of a Sampler Based on an Unnormalized Distribution

3 code implementations • 3 Jan 2019 • Chunyuan Li, Ke Bai, Jianqiao Li, Guoyin Wang, Changyou Chen, Lawrence Carin

We investigate adversarial learning in the case when only an unnormalized form of the density can be accessed, rather than samples.

Q-Learning

Survival Cluster Analysis

1 code implementation • 29 Feb 2020 • Paidamoyo Chapfuwa, Chunyuan Li, Nikhil Mehta, Lawrence Carin, Ricardo Henao

As a result, there is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles, while jointly accounting for accurate individualized time-to-event predictions.

Survival Analysis

Partition-Guided GANs

1 code implementation • CVPR 2021 • Mohammadreza Armandpour, Ali Sadeghian, Chunyuan Li, Mingyuan Zhou

We formulate two desired criteria for the space partitioner that aid the training of our mixture of generators: 1) to produce connected partitions and 2) provide a proxy of distance between partitions and data samples, along with a direction for reducing that distance.

Ranked #9 on Image Generation on STL-10

Image Generation

Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation

1 code implementation • ICCV 2021 • Jinyu Yang, Chunyuan Li, Weizhi An, Hehuan Ma, Yuzhi Guo, Yu Rong, Peilin Zhao, Junzhou Huang

Recent studies imply that deep neural networks are vulnerable to adversarial examples -- inputs with a slight but intentional perturbation are incorrectly classified by the network.

Segmentation Semantic Segmentation +1

Hierarchical Graph Capsule Network

1 code implementation • 16 Dec 2020 • Jinyu Yang, Peilin Zhao, Yu Rong, Chaochao Yan, Chunyuan Li, Hehuan Ma, Junzhou Huang

Graph Neural Networks (GNNs) draw their strength from explicitly modeling the topological information of structured data.

Graph Classification

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

2 code implementations • 3 Oct 2023 • Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

Current Large Multimodal Models (LMMs) achieve remarkable progress, yet there remains significant uncertainty regarding their ability to accurately apprehend visual details, that is, in performing detailed captioning.

Attribute Hallucination +2

Learning Structural Weight Uncertainty for Sequential Decision-Making

1 code implementation • 30 Dec 2017 • Ruiyi Zhang, Chunyuan Li, Changyou Chen, Lawrence Carin

Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications.

Decision Making Multi-Armed Bandits +1

Robust Navigation with Language Pretraining and Stochastic Sampling

1 code implementation • IJCNLP 2019 • Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments.

Vision and Language Navigation

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics

1 code implementation • 29 Nov 2022 • Chunyuan Li, Xinliang Zhu, Jiawen Yao, Junzhou Huang

Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.

Multiple Instance Learning Representation Learning +2

Paper
Code

Contrastive Attraction and Contrastive Repulsion for Representation Learning

1 code implementation • 8 May 2021 • Huangjie Zheng, Xu Chen, Jiangchao Yao, Hongxia Yang, Chunyuan Li, Ya Zhang, Hao Zhang, Ivor Tsang, Jingren Zhou, Mingyuan Zhou

We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples.

Contrastive Learning Representation Learning

Paper
Code

Continuous-Time Flows for Efficient Inference and Density Estimation

no code implementations • ICML 2018 • Changyou Chen, Chunyuan Li, Liqun Chen, Wenlin Wang, Yunchen Pu, Lawrence Carin

Distinct from normalizing flows and GANs, CTFs can be adopted to achieve the above two goals in one framework, with theoretical guarantees.

Density Estimation

Paper
Add Code

Adversarial Symmetric Variational Autoencoder

no code implementations • NeurIPS 2017 • Yunchen Pu, Wei-Yao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin

A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from a simple prior and propagated through the decoder to manifest data.

Paper
Add Code

VAE Learning via Stein Variational Gradient Descent

no code implementations • NeurIPS 2017 • Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent.

Paper
Add Code

Learning Generic Sentence Representations Using Convolutional Neural Networks

no code implementations • EMNLP 2017 • Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes.

Sentence

Paper
Add Code

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

no code implementations • ACL 2017 • Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence Carin

Recurrent neural networks (RNNs) have shown promising performance for language modeling.

Language Modelling Stochastic Optimization

Paper
Add Code

Unsupervised Learning with Truncated Gaussian Graphical Models

no code implementations • 15 Nov 2016 • Qinliang Su, Xuejun Liao, Chunyuan Li, Zhe Gan, Lawrence Carin

Gaussian graphical models (GGMs) are widely used for statistical modeling, because of ease of inference and the ubiquitous use of the normal distribution in practical approximations.

Unsupervised Pre-training

Paper
Add Code

Stochastic Gradient MCMC with Stale Gradients

no code implementations • NeurIPS 2016 • Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin

In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it.

Paper
Add Code

Variational Autoencoder for Deep Learning of Images, Labels and Captions

no code implementations • NeurIPS 2016 • Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin

A novel variational autoencoder is developed to model images, as well as associated labels or captions.

Paper
Add Code

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

no code implementations • 23 Dec 2015 • Chunyuan Li, Changyou Chen, David Carlson, Lawrence Carin

Pytorch implementations of Bayes By Backprop, MC Dropout, SGLD, the Local Reparametrization Trick, KF-Laplace and more

Paper
Add Code

High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

no code implementations • 23 Dec 2015 • Chunyuan Li, Changyou Chen, Kai Fan, Lawrence Carin

Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning.

Vocal Bursts Intensity Prediction

Paper
Add Code

A Deep Generative Deconvolutional Image Model

no code implementations • 23 Dec 2015 • Yunchen Pu, Xin Yuan, Andrew Stevens, Chunyuan Li, Lawrence Carin

A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework.

Dictionary Learning Image Generation

Paper
Add Code

Policy Optimization as Wasserstein Gradient Flows

no code implementations • ICML 2018 • Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate.

Reinforcement Learning (RL)

Paper
Add Code

Generative Adversarial Network Training is a Continual Learning Problem

no code implementations • ICLR 2019 • Kevin J Liang, Chunyuan Li, Guoyin Wang, Lawrence Carin

We hypothesize that this is at least in part due to the evolution of the generator distribution and the catastrophic forgetting tendency of neural networks, which leads to the discriminator losing the ability to remember synthesized samples from previous instantiations of the generator.

Continual Learning Generative Adversarial Network +1

Paper
Add Code

Persistence-based Structural Recognition

no code implementations • CVPR 2014 • Chunyuan Li, Maks Ovsjanikov, Frederic Chazal

This paper presents a framework for object recognition using topological persistence.

3D Shape Classification 3D Shape Retrieval +5

Paper
Add Code

Learning Weight Uncertainty With Stochastic Gradient MCMC for Shape Classification

no code implementations • CVPR 2016 • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin

Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision.

General Classification Stochastic Optimization

Paper
Add Code

DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain

no code implementations • WS 2019 • Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, Jianfeng Gao

We use a multi-source transfer learning approach to transfer the knowledge from MT-DNN and SciBERT to natural language understanding tasks in the medical domain.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Straight-Through Estimator as Projected Wasserstein Gradient Flow

no code implementations • 5 Oct 2019 • Pengyu Cheng, Chang Liu, Chunyuan Li, Dinghan Shen, Ricardo Henao, Lawrence Carin

The Straight-Through (ST) estimator is a widely used technique for back-propagating gradients through discrete random variables.

Paper
Add Code

Multi-View Learning for Vision-and-Language Navigation

no code implementations • 2 Mar 2020 • Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith

Learning to navigate in a visual environment following natural language instructions is a challenging task because natural language instructions are highly variable, ambiguous, and under-specified.

MULTI-VIEW LEARNING Navigate +1

Paper
Add Code

Weakly supervised cross-domain alignment with optimal transport

no code implementations • 14 Aug 2020 • Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin

Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing.

Paper
Add Code

Robust Conversational AI with Grounded Text Generation

no code implementations • 7 Sep 2020 • Jianfeng Gao, Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Heung-Yeung Shum

This article provides an overview of this progress and discusses related methods and technologies that can be incorporated for building robust conversational AI systems.

Text Generation World Knowledge

Paper
Add Code

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference

no code implementations • EMNLP 2020 • Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Changwei Hu, Fei Tan, Ruiyi Zhang, Yifan Hu, Changyou Chen

The neural attention mechanism plays an important role in many natural language processing applications.

Bayesian Inference

Paper
Add Code

Improving Text Generation with Student-Forcing Optimal Transport

no code implementations • EMNLP 2020 • Guoyin Wang, Chunyuan Li, Jianqiao Li, Hao Fu, Yuh-Chen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences.

Machine Translation Text Generation +2

Paper
Add Code

ReMP: Rectified Metric Propagation for Few-Shot Learning

no code implementations • 2 Dec 2020 • Yang Zhao, Chunyuan Li, Ping Yu, Changyou Chen

Few-shot learning features the capability of generalizing from a few examples.

Few-Shot Learning

Paper
Add Code

Self-supervised Pre-training with Hard Examples Improves Visual Representations

no code implementations • 25 Dec 2020 • Chunyuan Li, Xiujun Li, Lei Zhang, Baolin Peng, Mingyuan Zhou, Jianfeng Gao

Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.

Ranked #69 on Self-Supervised Image Classification on ImageNet

Data Augmentation Representation Learning +1

Paper
Add Code

RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems

no code implementations • ACL 2021 • Baolin Peng, Chunyuan Li, Zhu Zhang, Chenguang Zhu, Jinchao Li, Jianfeng Gao

For task-oriented dialog systems to be maximally useful, they must be able to process conversations in a way that is (1) generalizable with a small number of training examples for new task domains, and (2) robust to user input in various styles, modalities or domains.

Paper
Add Code

SDA: Improving Text Generation with Self Data Augmentation

no code implementations • 2 Jan 2021 • Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision.

Data Augmentation Imitation Learning +2

Paper
Add Code

Leveraging User Behavior History for Personalized Email Search

no code implementations • 15 Feb 2021 • Keping Bi, Pavel Metrikov, Chunyuan Li, Byungki Byun

Given these observations, we propose to leverage user search history as query context to characterize users and build a context-aware ranking model for email search.

Learning-To-Rank

Paper
Add Code

SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and Machine Teaching

no code implementations • 21 Oct 2021 • Baolin Peng, Chunyuan Li, Zhu Zhang, Jinchao Li, Chenguang Zhu, Jianfeng Gao

We propose SYNERGY, a hybrid learning framework where a task bot is developed in two steps: (i) Symbolic knowledge to neural networks: Large amounts of simulated dialog sessions are generated based on task-specific symbolic knowledge which is represented as a task schema consisting of dialog flows and task-oriented databases.

Paper
Add Code

Rethinking Sentiment Style Transfer

no code implementations • Findings (EMNLP) 2021 • Ping Yu, Yang Zhao, Chunyuan Li, Changyou Chen

To overcome this issue, we propose a graph-based method to extract attribute content and attribute-independent content from input sentences in the YELP dataset and IMDB dataset.

Attribute Style Transfer +1

Paper
Add Code

Few-Shot Named Entity Recognition: An Empirical Baseline Study

no code implementations • EMNLP 2021 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents an empirical study to efficiently build named entity recognition (NER) systems when a small amount of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Paper
Add Code

A Generic Approach for Enhancing GANs by Regularized Latent Optimization

no code implementations • 7 Dec 2021 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jinhui Xu

With rapidly growing model complexity and data volume, training deep generative models (DGMs) for better performance has become an increasingly important challenge.

Image Inpainting text-guided-image-editing +1

Paper
Add Code

Towards Language-Free Training for Text-to-Image Generation

no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.

Zero-Shot Text-to-Image Generation

Paper
Add Code

STT: Soft Template Tuning for Few-Shot Adaptation

no code implementations • 18 Jul 2022 • Ping Yu, Wei Wang, Chunyuan Li, Ruiyi Zhang, Zhanpeng Jin, Changyou Chen

Significantly, it can even outperform the time- and resource-consuming fine-tuning method on sentiment classification tasks.

Few-Shot Learning Language Modelling +3

Paper
Add Code

Lafite2: Few-shot Text-to-Image Generation

no code implementations • 25 Oct 2022 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jianfeng Gao, Jinhui Xu

The low requirement of the proposed method yields high flexibility and usability: it can be beneficial to a wide range of settings, including few-shot, semi-supervised and fully-supervised learning; it can be applied to different models, including generative adversarial networks (GANs) and diffusion models.

Retrieval Text-to-Image Generation

Paper
Add Code

Scaling Vision-Language Models with Sparse Mixture of Experts

no code implementations • 13 Mar 2023 • Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs).

Paper
Add Code

Large Multimodal Models: Notes on CVPR 2023 Tutorial

no code implementations • 26 Jun 2023 • Chunyuan Li

This tutorial note summarizes the presentation on "Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4", part of the CVPR 2023 tutorial on "Recent Advances in Vision Foundation Models".

Language Modelling

Paper
Add Code

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking Retrieval

Paper
Add Code

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

Paper
Add Code

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

no code implementations • 16 Oct 2023 • Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew P. Lungren, Jianfeng Gao, Hoifung Poon

In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction image editing and medical image generation such as InstructPix2Pix and RoentGen.

counterfactual Denoising +2

Paper
Add Code

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

no code implementations • 2 Nov 2023 • Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres

In particular, we instruction tune vision-language models to generate detailed visual descriptions of camera trap images using similar terminology to experts.

Paper
Add Code

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

no code implementations • NeurIPS 2023 • Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images.

Instruction Following Language Modelling +2

Paper
Add Code

Training Small Multimodal Models to Bridge Biomedical Competency Gap: A Case Study in Radiology Imaging

no code implementations • 12 Mar 2024 • Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, ZiYi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang, Hoifung Poon

Frontier models such as GPT-4V still have major competency gaps in multimodal capabilities for biomedical applications.

Cross-Modal Retrieval

Paper
Add Code
