Search Results for author: Junnan Li

Found 40 papers, 28 papers with code

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

1 code implementation • 3 Apr 2024 • Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong

Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate.

Transfer Learning

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

1 code implementation • 31 May 2023 • Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

1,423

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

1 code implementation • NeurIPS 2023 • Dongxu Li, Junnan Li, Steven C. H. Hoi

Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generate new subject renditions.

Representation Learning Text-to-Image Generation

8,724

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation • 14 May 2023 • Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #6 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1

354

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

1 code implementation • 13 May 2023 • Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

Ranked #1 on Code Search on CodeXGLUE - AdvTest

Arithmetic Reasoning Code Completion +4

2,593

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

2 code implementations • NeurIPS 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi

Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.

Ranked #5 on visual instruction following on LLaVA-Bench

Video Question Answering visual instruction following +1

8,724

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

12 code implementations • 30 Jan 2023 • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

Ranked #1 on Image Retrieval on Flickr30k

Generative Visual Question Answering Image Captioning +10

125,059

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

8,724

Tackling Data Heterogeneity in Federated Learning with Class Prototypes

1 code implementation • 6 Dec 2022 • Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, Ran Xu

We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes.

Personalized Federated Learning

BotSIM: An End-to-End Bot Simulation Toolkit for Commercial Task-Oriented Dialog Systems

1 code implementation • 29 Nov 2022 • Guangsen Wang, Shafiq Joty, Junnan Li, Steven Hoi

BotSIM adopts a layered design comprising the infrastructure layer, the adaptor layer and the application layer.

User Simulation

112

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

2 code implementations • 17 Oct 2022 • Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, Steven C. H. Hoi

Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.

Ranked #2 on Visual Question Answering (VQA) on VQA v2 val

Image Captioning Network Interpretation +2

8,724

LAVIS: A Library for Language-Vision Intelligence

1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Benchmarking Image Captioning +8

8,724

Masked Unsupervised Self-training for Label-free Image Classification

1 code implementation • 7 Jun 2022 • Junnan Li, Silvio Savarese, Steven C. H. Hoi

We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin.

Image Classification Representation Learning +1

103

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

6 code implementations • 28 Jan 2022 • Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi

Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.

Ranked #3 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)

Image Captioning Image-text matching +5

125,059

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

1 code implementation • CVPR 2022 • Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.

Ranked #19 on Zero-Shot Video Retrieval on DiDeMo

Entity Alignment Retrieval +3

183

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

1 code implementation • 18 Nov 2021 • Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

Object object-detection +1

Improving Tail-Class Representation with Centroid Contrastive Learning

no code implementations • 19 Oct 2021 • Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi

ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes.

Ranked #22 on Long-tail Learning on CIFAR-10-LT (ρ=10)

Contrastive Learning Image Classification +2

Cascaded Fast and Slow Models for Efficient Semantic Code Search

no code implementations • 15 Oct 2021 • Akhilesh Deepak Gotmare, Junnan Li, Shafiq Joty, Steven C. H. Hoi

The goal of natural language semantic code search is to retrieve a semantically relevant code snippet from a fixed set of candidates using a natural language query.

Code Search Re-Ranking +1

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

5 code implementations • NeurIPS 2021 • Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi

Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.

Ranked #5 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)

Grounded language learning Image-text matching +8

8,724

Learning From Noisy Data With Robust Representation Learning

1 code implementation • ICCV 2021 • Junnan Li, Caiming Xiong, Steven C.H. Hoi

In contrast to most existing methods, we combat noise by learning robust representation.

Contrastive Learning Representation Learning

Noise-Robust Contrastive Learning

no code implementations • 1 Jan 2021 • Junnan Li, Caiming Xiong, Steven Hoi

In contrast to most existing methods, we combat noise by learning robust representation.

Contrastive Learning

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

3 code implementations • ICCV 2021 • Junnan Li, Caiming Xiong, Steven Hoi

CoMatch jointly learns two representations of the training data, their class probabilities and low-dimensional embeddings.

Ranked #2 on Semi-Supervised Image Classification on CIFAR-10, 20 Labels

Contrastive Learning Representation Learning +2

121

MoPro: Webly Supervised Learning with Momentum Prototypes

2 code implementations • ICLR 2021 • Junnan Li, Caiming Xiong, Steven C. H. Hoi

We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning.

Ranked #12 on Image Classification on OmniBenchmark (using extra training data)

Contrastive Learning Image Classification +2

The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation

1 code implementation • ECCV 2020 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

Specifically, we systematically investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals.

General Classification Instance Segmentation +4

100

Prototypical Contrastive Learning of Unsupervised Representations

2 code implementations • ICLR 2021 • Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi

This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.

Ranked #5 on Contrastive Learning on imagenet-1k

Clustering Contrastive Learning +4

538

Improving out-of-distribution generalization via multi-task self-supervised pretraining

no code implementations • 30 Mar 2020 • Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher

Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness.

Ranked #116 on Domain Generalization on PACS

Adversarial Robustness Domain Generalization +4

Towards Noise-resistant Object Detection with Noisy Annotations

no code implementations • 3 Mar 2020 • Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi

We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.

Object object-detection +1

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

2 code implementations • ICLR 2020 • Junnan Li, Richard Socher, Steven C. H. Hoi

Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data.

Ranked #3 on Learning with noisy labels on CIFAR-100N

Learning with noisy labels

509

Weakly-Supervised Multi-Person Action Recognition in 360° Videos

no code implementations • 9 Feb 2020 • Junnan Li, Jianquan Liu, Yongkang Wong, Shoji Nishimura, Mohan Kankanhalli

To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.

Action Localization Action Recognition +1

GradMix: Multi-source Transfer across Domains and Tasks

no code implementations • 9 Feb 2020 • Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled dataset to learn useful knowledge for the target task.

Action Recognition Meta-Learning +1

Classification Calibration for Long-tail Instance Segmentation

1 code implementation • 29 Oct 2019 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find a major cause is the inaccurate classification of object proposals.

Classification General Classification +3

100

Visual Social Relationship Recognition

no code implementations • 13 Dec 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Social relationships form the basis of social structure of humans.

Visual Social Relationship Recognition

Learning to Learn from Noisy Labeled Data

1 code implementation • CVPR 2019 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect.

Ranked #26 on Image Classification on Clothing1M (using extra training data)

Learning with noisy labels Meta-Learning

120

Unsupervised Learning of View-invariant Action Representations

1 code implementation • NeurIPS 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Different from previous works in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using video representation from a source view.

Action Recognition Representation Learning +1

Interact as You Intend: Intention-Driven Human-Object Interaction Detection

no code implementations • 29 Aug 2018 • Bingjie Xu, Junnan Li, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao

The recent advances in instance-level detection tasks lay strong foundation for genuine comprehension of the visual scenes.

Human-Object Interaction Detection

Video Storytelling: Textual Summaries for Events

no code implementations • 25 Jul 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video.

Sentence

Attention Transfer from Web Images for Video Recognition

no code implementations • 3 Aug 2017 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed to videos.

Action Recognition Temporal Action Localization +1

Dual-Glance Model for Deciphering Social Relationships

1 code implementation • ICCV 2017 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Since the beginning of early civilizations, social relationships derived from each individual fundamentally form the basis of social structure in our daily life.

Ranked #3 on Visual Social Relationship Recognition on PIPA

object-detection Object Detection +2
