1 code implementation • 29 May 2024 • Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen
With recent advances in large language models (LLMs), there is growing interest in combining LLMs with multimodal learning.
1 code implementation • 9 Nov 2023 • Zhen Zhou, Yunkai Ma, Junfeng Fan, Zhaoyang Liu, Fengshui Jing, Min Tan
We propose linear GBB (LGBB), a novel OBB representation.
1 code implementation • 26 Oct 2023 • Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks.
1 code implementation • 5 Sep 2023 • Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou
A data recipe is a mixture of data from different sources for training LLMs, which plays a vital role in LLMs' performance.
2 code implementations • 9 May 2023 • Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao
Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.
1 code implementation • 3 Dec 2022 • Jintao Lin, Zhaoyang Liu, Wenhai Wang, Wayne Wu, LiMin Wang
Our VLG is first pre-trained on video and language datasets to learn a shared feature space; a flexible bi-modal attention head is then devised to combine high-level semantic concepts under different settings.
1 code implementation • ICCV 2023 • Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (using extra training data)
no code implementations • 30 Jun 2022 • Jiaqi Tang, Zhaoyang Liu, Jing Tan, Chen Qian, Wayne Wu, LiMin Wang
A local context modeling sub-network is proposed to perceive diverse patterns of generic event boundaries; it generates powerful video representations and reliable boundary confidence.
2 code implementations • 25 Apr 2022 • Haoyue Cheng, Zhaoyang Liu, Hang Zhou, Chen Qian, Wayne Wu, LiMin Wang
This paper focuses on the weakly-supervised audio-visual video parsing task, which aims to recognize all events belonging to each modality and localize their temporal boundaries.
3 code implementations • CVPR 2022 • Jiaqi Tang, Zhaoyang Liu, Chen Qian, Wayne Wu, LiMin Wang
Generic event boundary detection is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries.
no code implementations • 28 May 2021 • Xu Xie, Zhaoyang Liu, Shiwen Wu, Fei Sun, Cihang Liu, Jiawei Chen, Jinyang Gao, Bin Cui, Bolin Ding
It is based on the idea that similar users not only have similar tastes in items, but also exhibit similar treatment effects under recommendations.
2 code implementations • 18 Feb 2021 • Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan
This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection.
no code implementations • 1 Jan 2021 • Haokun Chen, Zhaoyang Liu, Chen Xu, Ziqian Chen, Jinyang Gao, Bolin Ding
In this paper, we propose a novel recommendation framework that effectively utilizes information about user uncertainty over different item dimensions and explicitly accounts for the impact of the display policy on the user, in order to achieve maximal expected posterior utility for the platform.
1 code implementation • 27 Oct 2020 • Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Bolin Ding, Bin Cui
Sequential recommendation methods play a crucial role in modern recommender systems because of their ability to capture a user's dynamic interest from her/his historical interactions.