1 code implementation • 17 Jul 2024 • Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu
To approach this evaluation trilemma, we further introduce LMMS-EVAL LITE, a pruned evaluation toolkit that emphasizes both coverage and efficiency.
2 code implementations • 24 Jun 2024 • Peiyuan Zhang, Kaichen Zhang, Bo Li, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu
By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training.
3 code implementations • NeurIPS 2023 • Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu
To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.
no code implementations • 6 May 2024 • Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu
Multimodal information, together with our knowledge, help us to understand the complex and dynamic world.
1 code implementation • 29 Mar 2024 • Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD).
no code implementations • CVPR 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy
We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.
3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
1 code implementation • 7 Nov 2023 • Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu
In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.
Ranked #124 on Visual Question Answering on MM-Vet
1 code implementation • 12 Oct 2023 • Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning.
1 code implementation • 17 Jul 2023 • Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu
Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
1 code implementation • 26 Jun 2023 • Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu
Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.
2 code implementations • 15 Jun 2023 • Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li
Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
2 code implementations • 8 Jun 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu
We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.
Ranked #19 on Visual Question Answering on MM-Vet v2
1 code implementation • 23 May 2023 • Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen
The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.
3D Panoptic Segmentation Open Vocabulary Semantic Segmentation +2
1 code implementation • 5 May 2023 • Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Jingkang Yang, Ziwei Liu
Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which boosted to InstrctGPT and ChatGPT, effectively following natural language instructions to accomplish real-world tasks.
Ranked #8 on Visual Question Answering on BenchLMM
4 code implementations • 13 Oct 2022 • Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.
2 code implementations • 15 Sep 2022 • Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu
Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.
1 code implementation • 22 Jul 2022 • Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu
Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i. e., objects are detected using bounding boxes followed by prediction of their pairwise relationships.
Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset
2 code implementations • 8 Jun 2022 • Bo Li, Yifei Shen, Jingkang Yang, Yezhen Wang, Jiawei Ren, Tong Che, Jun Zhang, Ziwei Liu
It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets.
Ranked #17 on Domain Generalization on DomainNet (using extra training data)
2 code implementations • 11 Apr 2022 • Jingkang Yang, Kaiyang Zhou, Ziwei Liu
In this paper, we take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift; and designs three benchmarks.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
9 code implementations • CVPR 2022 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets.
Ranked #3 on Prompt Engineering on ImageNet V2
4 code implementations • 21 Oct 2021 • Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu
In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i. e., AD, ND, OSR, OOD detection, and OD.
15 code implementations • 2 Sep 2021 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks.
Ranked #2 on Few-shot Age Estimation on MORPH Album2
2 code implementations • ICCV 2021 • Jingkang Yang, Haoqi Wang, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang, Ziwei Liu
The proposed UDG can not only enrich the semantic knowledge of the model by exploiting unlabeled data in an unsupervised manner, but also distinguish ID/OOD samples to enhance ID classification and OOD detection tasks simultaneously.
Out-of-Distribution Detection Out of Distribution (OOD) Detection
no code implementations • 13 Aug 2021 • Xiaopeng Yan, Riquan Chen, Litong Feng, Jingkang Yang, Huabin Zheng, Wayne Zhang
In this paper, we propose to label only the most representative samples to expand the labeled set.
1 code implementation • 20 Feb 2021 • Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.
1 code implementation • 12 Oct 2020 • Jingkang Yang, Weirong Chen, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang
VSGraph-LC starts from anchor selection referring to the semantic similarity between metadata and correct label concepts, and then propagates correct labels from anchors on a visual graph using graph neural network (GNN).
Ranked #9 on Image Classification on WebVision-1000
4 code implementations • ECCV 2020 • Jingkang Yang, Litong Feng, Weirong Chen, Xiaopeng Yan, Huabin Zheng, Ping Luo, Wayne Zhang
Therefore, a simple yet effective WSL framework is proposed.
Ranked #7 on Image Classification on WebVision-1000
no code implementations • 29 Nov 2016 • Jingkang Yang, Haohan Wang, Jun Zhu, Eric P. Xing
In this paper, we propose an extension of State Space Model to work with different sources of information together with its learning and inference algorithms.