no code implementations • 11 Apr 2024 • Guangzhi Wang, Tianyi Chen, Kamran Ghasedi, HsiangTao Wu, Tianyu Ding, Chris Nuesmeyer, Ilya Zharkov, Mohan Kankanhalli, Luming Liang
S3Editor is model-agnostic and compatible with various editing approaches.
1 code implementation • CVPR 2024 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
no code implementations • CVPR 2024 • Guangzhi Wang, Yangyang Guo, Ziwei Xu, Mohan Kankanhalli
Human-Object Interaction (HOI) Detection, an important aspect of human-centric scene understanding, requires precise object detection and interaction recognition.
no code implementations • 1 Dec 2023 • Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape.
2 code implementations • 28 Nov 2023 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
1 code implementation • CVPR 2024 • Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
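As a rough sketch of the general idea (not this paper's specific method), a dense weight matrix can be replaced by a truncated-SVD factorization whose two small factors are then used in place of the original weights downstream; the function name and shapes below are illustrative only.

```python
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate weight (out x in) as A @ B via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B

W = torch.randn(768, 3072)
A, B = low_rank_factorize(W, rank=64)
# Relative error of the rank-64 surrogate that stands in for W downstream.
print((torch.linalg.matrix_norm(W - A @ B) / torch.linalg.matrix_norm(W)).item())
```

Fine-tuning only the two factors then touches rank × (out + in) parameters rather than out × in, which is where the efficiency gain comes from.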
3 code implementations • 30 Jul 2023 • Bohao Li, Rui Wang, Guangzhi Wang, Yuying Ge, Yixiao Ge, Ying Shan
Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting remarkable capability for both comprehension and generation.
no code implementations • 19 Jul 2023 • Guangzhi Wang, Yangyang Guo, Mohan Kankanhalli
Human-Object Interaction Detection is a crucial aspect of human-centric scene understanding, with important applications in various domains.
no code implementations • 27 Jun 2023 • Haowei Li, Wenqing Yan, Du Liu, Long Qian, Yuxing Yang, Yihao Liu, Zhe Zhao, Hui Ding, Guangzhi Wang
The head surface is reconstructed from depth data for spatial registration, which avoids rigidly fixing tracking targets to the patient's skull.
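For context, rigid alignment of two point sets with known correspondences has a classical closed-form solution (the Kabsch algorithm); the sketch below shows that building block only and is not the authors' pipeline, which would additionally iterate correspondence estimation (e.g., ICP) on the reconstructed surface.

```python
import numpy as np

def rigid_register(src: np.ndarray, dst: np.ndarray):
    """Closed-form rotation R and translation t mapping src -> dst
    (N x 3 arrays with known point correspondences; Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a pure translation of a hypothetical surface patch.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 3))
R, t = rigid_register(src, src + np.array([0.0, 0.0, 5.0]))
```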
1 code implementation • 20 May 2023 • Guangzhi Wang, Yixiao Ge, Xiaohan Ding, Mohan Kankanhalli, Ying Shan
In our benchmark, which is curated to evaluate MLLMs' visual semantic understanding and fine-grained perception capabilities, we examine visual tokenizers pre-trained with dominant methods (i.e., DeiT, CLIP, MAE, DINO) and observe that: i) fully/weakly supervised models capture more semantics than self-supervised models, but the gap narrows as the pre-training dataset is scaled up.
no code implementations • 13 Jan 2023 • Guangzhi Wang, Hehe Fan, Mohan Kankanhalli
To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability for both point cloud and natural language queries.
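A minimal sketch of the generic cross-modal attention idea underlying such designs (a hypothetical module, not the actual RET architecture): point-cloud features attend to language tokens so that each 3D region is re-encoded in the context of the query.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch: point features attend to language tokens."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, point_feats, lang_feats):
        # point_feats: (B, Np, dim); lang_feats: (B, Nt, dim)
        fused, _ = self.attn(query=point_feats, key=lang_feats, value=lang_feats)
        return self.norm(point_feats + fused)   # residual connection

fusion = CrossModalFusion()
out = fusion(torch.randn(2, 128, 256), torch.randn(2, 20, 256))
```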
1 code implementation • 6 Jul 2022 • Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan Kankanhalli
To quantitatively study the object bias problem, we advocate a new protocol for evaluating model performance.
1 code implementation • 5 Jul 2022 • Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan Kankanhalli
2) An insufficient number of distant interactions in benchmark datasets results in under-fitting on these instances.
1 code implementation • 10 Aug 2021 • Ziwei Xu, Guangzhi Wang, Yongkang Wong, Mohan Kankanhalli
The concept module generates semantically meaningful features for primitive concepts, whereas the visual module extracts visual features for attributes and objects from input images.
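A hedged illustration of such a two-branch layout (all names and dimensions are hypothetical, not the paper's code): a concept branch embeds (attribute, object) pairs, a visual branch projects image features, and a dot product scores their compatibility.

```python
import torch
import torch.nn as nn

class TwoBranchScorer(nn.Module):
    """Illustrative two-branch design: concept branch + visual branch."""
    def __init__(self, n_attrs: int, n_objs: int, dim: int = 512):
        super().__init__()
        self.attr_emb = nn.Embedding(n_attrs, dim)
        self.obj_emb = nn.Embedding(n_objs, dim)
        self.concept_proj = nn.Linear(2 * dim, dim)
        self.visual_proj = nn.Linear(2048, dim)   # e.g., pooled CNN features

    def forward(self, img_feat, attr_ids, obj_ids):
        concept = torch.cat([self.attr_emb(attr_ids), self.obj_emb(obj_ids)], dim=-1)
        concept = self.concept_proj(concept)      # (P, dim) candidate compositions
        visual = self.visual_proj(img_feat)       # (B, dim) image embeddings
        return visual @ concept.T                 # (B, P) compatibility scores

model = TwoBranchScorer(n_attrs=115, n_objs=245)
scores = model(torch.randn(4, 2048), torch.arange(10), torch.arange(10))
```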
1 code implementation • 22 Nov 2019 • Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer
Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA).
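One standard way to penalize such domain shift, shown here only as a generic example (not necessarily this paper's objective), is a Maximum Mean Discrepancy (MMD) loss that pulls source and target feature distributions together.

```python
import torch

def mmd_loss(src_feats: torch.Tensor, tgt_feats: torch.Tensor, sigma: float = 1.0):
    """RBF-kernel MMD between two feature batches.
    Biased estimator (diagonal terms included); fine for illustration."""
    def rbf(x, y):
        dists = torch.cdist(x, y) ** 2
        return torch.exp(-dists / (2 * sigma ** 2))
    return (rbf(src_feats, src_feats).mean()
            + rbf(tgt_feats, tgt_feats).mean()
            - 2 * rbf(src_feats, tgt_feats).mean())

# Typically added to the task loss on labeled source data during training.
loss = mmd_loss(torch.randn(32, 256), torch.randn(32, 256))
```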