1 code implementation • 3 Apr 2024 • Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong
Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate.
1 code implementation • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).
1 code implementation • 31 May 2023 • Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi
In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
1 code implementation • NeurIPS 2023 • Dongxu Li, Junnan Li, Steven C. H. Hoi
Then we design a subject representation learning task that enables a diffusion model to leverage such visual representation and generate new subject renditions.
1 code implementation • 14 May 2023 • Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese
It achieves a new SOTA of 50.6% (top-1) on Objaverse-LVIS and 84.7% (top-1) on ModelNet40 in zero-shot classification.
Ranked #6 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)
2 code implementations • 13 May 2023 • Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi
To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
Ranked #1 on Code Search on CodeXGLUE - AdvTest
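A minimal usage sketch for CodeT5+, assuming the 220M checkpoint published on Hugging Face ("Salesforce/codet5p-220m") and the `transformers` library; the larger checkpoints follow the same loading pattern.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

ckpt = "Salesforce/codet5p-220m"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = T5ForConditionalGeneration.from_pretrained(ckpt)

# span-infilling example: the model fills in the masked <extra_id_0> span
code = "def greet(user): print(f'hello <extra_id_0>!')"
inputs = tokenizer(code, return_tensors="pt")
out = model.generate(**inputs, max_length=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```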
2 code implementations • NeurIPS 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.
Ranked #5 on Visual Question Answering on BenchLMM
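A minimal inference sketch for InstructBLIP, assuming the Vicuna-7B checkpoint released on Hugging Face ("Salesforce/instructblip-vicuna-7b"), a recent `transformers` version, and a local image file `example.jpg` (a hypothetical path).

```python
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

name = "Salesforce/instructblip-vicuna-7b"
processor = InstructBlipProcessor.from_pretrained(name)
model = InstructBlipForConditionalGeneration.from_pretrained(name)

# ask a free-form instruction about the image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```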
12 code implementations • 30 Jan 2023 • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Ranked #1 on Image Retrieval on Flickr30k
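A minimal visual question answering sketch for BLIP-2, assuming the "Salesforce/blip2-opt-2.7b" checkpoint on Hugging Face, the `transformers` library, and a local image `example.jpg` (a hypothetical path).

```python
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

name = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(name)
model = Blip2ForConditionalGeneration.from_pretrained(name)

# prompt the frozen language model with a question about the image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, text="Question: what is shown in the photo? Answer:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```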
no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven Hoi
To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi
To address this issue, we propose \emph{Img2Prompt}, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
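A conceptual sketch of the prompt-construction idea behind Img2Prompt (model calls omitted): captions and synthetic question-answer exemplars derived from the image are concatenated into a text prompt for a frozen LLM. The captions and QA pairs below are illustrative placeholders, not outputs of the actual pipeline.

```python
# captions from an off-the-shelf captioner and QA pairs from a question-generation
# step; both are illustrative placeholders here
captions = ["a man riding a surfboard on a large wave", "the ocean on a sunny day"]
exemplar_qa = [("What is the man riding?", "a surfboard"),
               ("What is the weather like?", "sunny")]

prompt = "Contexts: " + " ".join(captions) + "\n"
for q, a in exemplar_qa:
    prompt += f"Question: {q} Answer: {a}\n"
prompt += "Question: What sport is shown in the picture? Answer:"
print(prompt)  # this prompt would be fed to a frozen LLM (e.g. OPT) to obtain the answer
```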
1 code implementation • 6 Dec 2022 • Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, Ran Xu
We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes.
1 code implementation • 29 Nov 2022 • Guangsen Wang, Shafiq Joty, Junnan Li, Steven Hoi
BotSIM adopts a layered design comprising the infrastructure layer, the adaptor layer and the application layer.
2 code implementations • 17 Oct 2022 • Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, Steven C. H. Hoi
Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.
Ranked #2 on Visual Question Answering (VQA) on VQA v2 val
1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.
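A minimal usage sketch following the pattern in the LAVIS README, assuming the package is installed (e.g. `pip install salesforce-lavis`) and a local image `example.jpg` (a hypothetical path).

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# load a captioning model together with its matching image preprocessor
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)
raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))  # list with the generated caption
```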
1 code implementation • 7 Jun 2022 • Junnan Li, Silvio Savarese, Steven C. H. Hoi
We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin.
6 code implementations • 28 Jan 2022 • Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #3 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)
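A minimal captioning sketch for BLIP, assuming the "Salesforce/blip-image-captioning-base" checkpoint on Hugging Face, the `transformers` library, and a local image `example.jpg` (a hypothetical path).

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name)

# unconditional caption generation for a single image
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```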
1 code implementation • CVPR 2022 • Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi
To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.
Ranked #19 on Zero-Shot Video Retrieval on DiDeMo
1 code implementation • 18 Nov 2021 • Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong
To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.
no code implementations • 19 Oct 2021 • Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi
ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes.
Ranked #22 on Long-tail Learning on CIFAR-10-LT (ρ=10)
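A simplified sketch of the interpolative centroid contrastive idea described above; the encoder, class centroids, and mixing coefficient are dummy placeholders, not the paper's training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# hypothetical encoder and class centroids, only to make the sketch runnable
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
centroids = F.normalize(torch.randn(10, 128), dim=1)  # one centroid per class

def iccl_loss(x_agnostic, y_agnostic, x_aware, y_aware, lam=0.6, temp=0.1):
    """Mix an image drawn by a class-agnostic sampler with one drawn by a
    class-aware sampler, and require the mixed embedding to retrieve the
    centroids of both source classes (weighted by the mixing coefficient)."""
    x_mix = lam * x_agnostic + (1 - lam) * x_aware
    z = F.normalize(encoder(x_mix), dim=1)
    logits = z @ centroids.t() / temp
    return lam * F.cross_entropy(logits, y_agnostic) + (1 - lam) * F.cross_entropy(logits, y_aware)

loss = iccl_loss(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)),
                 torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```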
no code implementations • 15 Oct 2021 • Akhilesh Deepak Gotmare, Junnan Li, Shafiq Joty, Steven C. H. Hoi
The goal of natural language semantic code search is to retrieve a semantically relevant code snippet from a fixed set of candidates using a natural language query.
5 code implementations • NeurIPS 2021 • Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi
Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.
Ranked #5 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)
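A simplified sketch of the "align before fuse" idea: a symmetric image-text contrastive (InfoNCE) loss applied to unimodal embeddings before they are fused by a multimodal encoder. Momentum distillation and the fusion losses from the paper are omitted, and the feature tensors below are random placeholders.

```python
import torch
import torch.nn.functional as F

def itc_loss(img_feat, txt_feat, temp=0.07):
    """Symmetric image-text contrastive loss on unimodal embeddings:
    matching image/text pairs lie on the diagonal of the similarity matrix."""
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    logits = img @ txt.t() / temp
    targets = torch.arange(img.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = itc_loss(torch.randn(32, 256), torch.randn(32, 256))
```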
1 code implementation • ICCV 2021 • Junnan Li, Caiming Xiong, Steven C. H. Hoi
In contrast to most existing methods, we combat noise by learning robust representation.
no code implementations • 1 Jan 2021 • Junnan Li, Caiming Xiong, Steven Hoi
In contrast to most existing methods, we combat noise by learning robust representation.
3 code implementations • ICCV 2021 • Junnan Li, Caiming Xiong, Steven Hoi
CoMatch jointly learns two representations of the training data, their class probabilities and low-dimensional embeddings.
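A simplified, self-contained sketch of how the two representations interact in CoMatch: pseudo-labels are smoothed with a memory bank of embeddings and class probabilities, and the resulting pseudo-label graph supervises a graph-based contrastive loss on embeddings. All tensors below are random placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, K, C, t, alpha, thr = 16, 64, 10, 0.2, 0.9, 0.8
mem_z = F.normalize(torch.randn(K, 32), dim=1)    # memory-bank embeddings
mem_p = F.softmax(torch.randn(K, C), dim=1)       # memory-bank class probabilities
p_weak = F.softmax(torch.randn(B, C), dim=1)      # classifier output, weak augmentation
z_weak = F.normalize(torch.randn(B, 32), dim=1)   # projection output, weak augmentation
z_s1 = F.normalize(torch.randn(B, 32), dim=1)     # strong augmentation 1
z_s2 = F.normalize(torch.randn(B, 32), dim=1)     # strong augmentation 2

# memory-smoothed pseudo-labels: blend the classifier prediction with the
# predictions of nearby samples in the embedding space
sim = F.softmax(z_weak @ mem_z.t() / t, dim=1)
q = alpha * p_weak + (1 - alpha) * sim @ mem_p

# pseudo-label graph: samples with similar pseudo-labels should have similar embeddings
W = q @ q.t()
W = torch.where(W >= thr, W, torch.zeros_like(W))
W.fill_diagonal_(1.0)
W = W / W.sum(dim=1, keepdim=True)

# graph-based contrastive loss between the two strongly augmented embeddings
S = F.softmax(z_s1 @ z_s2.t() / t, dim=1)
loss_contrast = -(W * torch.log(S + 1e-8)).sum(dim=1).mean()
```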
2 code implementations • ICLR 2021 • Junnan Li, Caiming Xiong, Steven C. H. Hoi
We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning.
Ranked #12 on Image Classification on OmniBenchmark (using extra training data)
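A simplified sketch of a momentum-prototype update with online label correction in the spirit of MoPro; out-of-distribution sample removal and the paper's exact thresholds are omitted, and all inputs are placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, dim, m, temp = 10, 128, 0.999, 0.1
prototypes = F.normalize(torch.randn(num_classes, dim), dim=1)  # one momentum prototype per class

def mopro_step(z, logits, y_noisy):
    """One simplified step: z = L2-normalized embeddings, logits = classifier outputs,
    y_noisy = (possibly noisy) labels. Corrects labels and updates the prototypes."""
    proto_logits = z @ prototypes.t() / temp
    # combine classifier and prototype predictions to decide whether to trust the given label
    soft = (F.softmax(logits, dim=1) + F.softmax(proto_logits, dim=1)) / 2
    pseudo = soft.argmax(dim=1)
    keep = soft[torch.arange(len(z)), y_noisy] > 0.5
    y_corrected = torch.where(keep, y_noisy, pseudo)
    # momentum (EMA) update of the class prototypes using the corrected labels
    for c in y_corrected.unique():
        mask = y_corrected == c
        prototypes[c] = F.normalize(m * prototypes[c] + (1 - m) * z[mask].mean(0), dim=0)
    return y_corrected

z = F.normalize(torch.randn(32, dim), dim=1)
labels = mopro_step(z, torch.randn(32, num_classes), torch.randint(0, num_classes, (32,)))
```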
1 code implementation • ECCV 2020 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
Specifically, we systematically investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals.
2 code implementations • ICLR 2021 • Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.
Ranked #5 on Contrastive Learning on imagenet-1k
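A simplified sketch of the ProtoNCE idea from PCL: cluster normalized embeddings into prototypes with k-means and contrast each embedding against its assigned prototype. The cluster count and temperature are illustrative values.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def protonce_loss(z, k=50, temp=0.3):
    """Cluster L2-normalized embeddings into k prototypes and treat the cluster
    assignment as the positive class in a softmax over prototype similarities."""
    z = F.normalize(z, dim=1)
    km = KMeans(n_clusters=k, n_init=10).fit(z.detach().cpu().numpy())
    protos = F.normalize(torch.tensor(km.cluster_centers_, dtype=z.dtype), dim=1)
    assign = torch.tensor(km.labels_, dtype=torch.long)
    logits = z @ protos.t() / temp
    return F.cross_entropy(logits, assign)

z = torch.randn(256, 128, requires_grad=True)
loss = protonce_loss(z)
loss.backward()
```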
no code implementations • 30 Mar 2020 • Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher
Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness.
Ranked #116 on Domain Generalization on PACS
no code implementations • 3 Mar 2020 • Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi
We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.
2 code implementations • ICLR 2020 • Junnan Li, Richard Socher, Steven C. H. Hoi
Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data.
Ranked #3 on Learning with noisy labels on CIFAR-100N
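A simplified sketch of the DivideMix-style data partition: fit a two-component Gaussian mixture on per-sample training losses and treat the low-loss component as (probably) clean labeled data, discarding the labels of the rest for semi-supervised training. The synthetic losses below are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss, threshold=0.5):
    """Model per-sample losses with a 2-component GMM; the component with the
    smaller mean is assumed to contain the correctly labeled ('clean') samples."""
    losses = np.asarray(per_sample_loss, dtype=np.float64).reshape(-1, 1)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4).fit(losses)
    clean_comp = gmm.means_.argmin()
    p_clean = gmm.predict_proba(losses)[:, clean_comp]
    labeled_idx = np.where(p_clean > threshold)[0]      # kept as labeled data
    unlabeled_idx = np.where(p_clean <= threshold)[0]   # labels discarded -> semi-supervised
    return labeled_idx, unlabeled_idx, p_clean

# example with synthetic losses: a clean majority and a high-loss (noisy) minority
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(0.8, 0.1, 100)])
labeled, unlabeled, w = split_clean_noisy(losses)
```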
no code implementations • 9 Feb 2020 • Junnan Li, Jianquan Liu, Yongkang Wong, Shoji Nishimura, Mohan Kankanhalli
To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.
no code implementations • 9 Feb 2020 • Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled dataset to learn useful knowledge for the target task.
1 code implementation • 29 Oct 2019 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find that a major cause is the inaccurate classification of object proposals.
1 code implementation • CVPR 2019 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
Despite the success of deep neural networks (DNNs) in image classification tasks, the human-level performance relies on massive training data with high-quality manual annotations, which are expensive and time-consuming to collect.
Ranked #26 on Image Classification on Clothing1M (using extra training data)
no code implementations • 13 Dec 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Social relationships form the basis of human social structure.
1 code implementation • NeurIPS 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Different from previous works in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using video representation from a source view.
no code implementations • 29 Aug 2018 • Bingjie Xu, Junnan Li, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao
The recent advances in instance-level detection tasks lay a strong foundation for genuine comprehension of visual scenes.
no code implementations • 25 Jul 2018 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video.
no code implementations • 3 Aug 2017 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
However, due to the domain shift problem, the performance of deep classifiers trained on Web images tends to degrade when they are directly deployed to videos.
1 code implementation • ICCV 2017 • Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Since the beginning of early civilizations, social relationships between individuals have fundamentally formed the basis of social structure in our daily life.
Ranked #3 on Visual Social Relationship Recognition on PIPA