Search Results for author: Junnan Li

Found 40 papers, 28 papers with code

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

1 code implementation 3 Apr 2024 Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong

Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate.

Transfer Learning

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation 30 Nov 2023 Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

1 code implementation 31 May 2023 Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

1 code implementation NeurIPS 2023 Dongxu Li, Junnan Li, Steven C. H. Hoi

Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generate new subject renditions.

Representation Learning Text-to-Image Generation

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation 14 May 2023 Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #6 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

1 code implementation 13 May 2023 Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

Arithmetic Reasoning Code Completion +4

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

no code implementations CVPR 2023 Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

3 code implementations 21 Dec 2022 Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

Tackling Data Heterogeneity in Federated Learning with Class Prototypes

1 code implementation 6 Dec 2022 Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, Ran Xu

We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes.

Personalized Federated Learning

BotSIM: An End-to-End Bot Simulation Toolkit for Commercial Task-Oriented Dialog Systems

1 code implementation 29 Nov 2022 Guangsen Wang, Shafiq Joty, Junnan Li, Steven Hoi

BotSIM adopts a layered design comprising the infrastructure layer, the adaptor layer and the application layer.

User Simulation

LAVIS: A Library for Language-Vision Intelligence

1 code implementation 15 Sep 2022 Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Benchmarking Image Captioning +8

Masked Unsupervised Self-training for Label-free Image Classification

1 code implementation 7 Jun 2022 Junnan Li, Silvio Savarese, Steven C. H. Hoi

We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin.

Image Classification Representation Learning +1

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

6 code implementations 28 Jan 2022 Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi

Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.

Ranked #3 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)

Image Captioning Image-text matching +5

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

1 code implementation CVPR 2022 Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.

Entity Alignment Retrieval +3

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

1 code implementation 18 Nov 2021 Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

Object object-detection +1

Improving Tail-Class Representation with Centroid Contrastive Learning

no code implementations 19 Oct 2021 Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi

ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes.

Contrastive Learning Image Classification +2

Cascaded Fast and Slow Models for Efficient Semantic Code Search

no code implementations 15 Oct 2021 Akhilesh Deepak Gotmare, Junnan Li, Shafiq Joty, Steven C. H. Hoi

The goal of natural language semantic code search is to retrieve a semantically relevant code snippet from a fixed set of candidates using a natural language query.

Code Search Re-Ranking +1

Noise-Robust Contrastive Learning

no code implementations 1 Jan 2021 Junnan Li, Caiming Xiong, Steven Hoi

In contrast to most existing methods, we combat noise by learning robust representation.

Contrastive Learning

MoPro: Webly Supervised Learning with Momentum Prototypes

2 code implementations ICLR 2021 Junnan Li, Caiming Xiong, Steven C. H. Hoi

We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning.

Ranked #12 on Image Classification on OmniBenchmark (using extra training data)

Contrastive Learning Image Classification +2

The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation

1 code implementation ECCV 2020 Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

Specifically, we systematically investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals.

General Classification Instance Segmentation +4

Prototypical Contrastive Learning of Unsupervised Representations

2 code implementations ICLR 2021 Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi

This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.

Clustering Contrastive Learning +4

Improving out-of-distribution generalization via multi-task self-supervised pretraining

no code implementations 30 Mar 2020 Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher

Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness.

Adversarial Robustness Domain Generalization +4

Towards Noise-resistant Object Detection with Noisy Annotations

no code implementations 3 Mar 2020 Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi

We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.

Object object-detection +1

Weakly-Supervised Multi-Person Action Recognition in 360° Videos

no code implementations 9 Feb 2020 Junnan Li, Jianquan Liu, Yongkang Wong, Shoji Nishimura, Mohan Kankanhalli

To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.

Action Localization Action Recognition +1

GradMix: Multi-source Transfer across Domains and Tasks

no code implementations 9 Feb 2020 Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled datasets to learn useful knowledge for the target task.

Action Recognition Meta-Learning +1

Classification Calibration for Long-tail Instance Segmentation

1 code implementation 29 Oct 2019 Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find a major cause is the inaccurate classification of object proposals.

Classification General Classification +3

Learning to Learn from Noisy Labeled Data

1 code implementation CVPR 2019 Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect.

Ranked #26 on Image Classification on Clothing1M (using extra training data)

Learning with noisy labels Meta-Learning

Unsupervised Learning of View-invariant Action Representations

1 code implementation NeurIPS 2018 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Different from previous works in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using video representation from a source view.

Action Recognition Representation Learning +1

Interact as You Intend: Intention-Driven Human-Object Interaction Detection

no code implementations 29 Aug 2018 Bingjie Xu, Junnan Li, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao

The recent advances in instance-level detection tasks lay a strong foundation for genuine comprehension of visual scenes.

Human-Object Interaction Detection

Video Storytelling: Textual Summaries for Events

no code implementations 25 Jul 2018 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video.

Sentence

Attention Transfer from Web Images for Video Recognition

no code implementations 3 Aug 2017 Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli

However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed to videos.

Action Recognition Temporal Action Localization +1

Dual-Glance Model for Deciphering Social Relationships

1 code implementation ICCV 2017 Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Since the beginning of early civilizations, social relationships derived from each individual fundamentally form the basis of social structure in our daily life.

object-detection Object Detection +2
