no code implementations • 18 Dec 2024 • Haowei Liu, Xuyang Wu, Guohao Sun, Zhiqiang Tao, Yi Fang
Large language models (LLMs) have demonstrated remarkable effectiveness in text reranking through works like RankGPT, leveraging their human-like reasoning about relevance.
no code implementations • 3 Dec 2024 • Ting-Ruen Wei, Haowei Liu, Huei-Chung Hu, Xuyang Wu, Yi Fang, Hsin-Tai Wu
Experiments show that our methodology performs on par with state-of-the-art models on standard test datasets and outperforms them when images are slightly rotated/ flipped or full range head pose.
1 code implementation • 9 Aug 2024 • Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks.
Ranked #6 on
Video Question Answering
on NExT-QA
no code implementations • 2 Aug 2024 • Huei-Chung Hu, Xuyang Wu, Haowei Liu, Ting-Ruen Wei, Hsin-Tai Wu
Many head pose estimation (HPE) methods promise the ability to create full-range datasets, theoretically allowing the estimation of the rotation and positioning of the head from various angles.
no code implementations • 21 Jul 2024 • Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu
However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios, leaving the performance of MLLMs when handling realistic multiple images underexplored.
no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.
no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.
1 code implementation • 22 Feb 2024 • Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang
Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i. e., GQC reasoning.
1 code implementation • 29 Jan 2024 • Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li, Weiming Hu
In this paper, we will introduce a benchmark dataset named "NFT Top1000 Visual-Text Dataset" (NFT1000), containing 7. 56 million image-text pairs, and being collected from 1000 most famous PFP1 NFT collections2 by sales volume on the Ethereum blockchain.
2 code implementations • CVPR 2024 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.
Ranked #10 on
Long-Context Understanding
on MMNeedle