no code implementations • 11 May 2025 • Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, PengFei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei LI, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, ZiHao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song
We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.
1 code implementation • 16 Dec 2024 • Chaorui Deng, Deyao Zhu, Kunchang Li, Shi Guang, Haoqi Fan
We introduce Causal Diffusion as the autoregressive (AR) counterpart of Diffusion models.
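A minimal toy sketch of the general idea, under heavy assumptions: each position in a sequence is produced by a short diffusion-style denoising loop conditioned on everything generated so far, giving diffusion an autoregressive factorization. The `denoiser` callable and the simplistic update rule are hypothetical stand-ins, not the paper's actual architecture.

```python
import torch

# Toy sketch: autoregressive generation where each position (here a
# continuous latent vector) is produced by a short denoising loop
# conditioned on previously generated positions. `denoiser` is a
# hypothetical stand-in for a learned noise-prediction network.
def generate_causal_diffusion(denoiser, seq_len=8, dim=16, steps=20):
    tokens = []
    for pos in range(seq_len):
        context = torch.stack(tokens) if tokens else torch.zeros(0, dim)
        x = torch.randn(dim)  # start each position from pure noise
        for t in reversed(range(1, steps + 1)):
            # Predict and remove a fraction of the noise, conditioned
            # on the causal context (previously generated positions).
            eps = denoiser(x, t, context)
            x = x - eps / t  # toy update, not a real diffusion schedule
        tokens.append(x)
    return torch.stack(tokens)  # (seq_len, dim)
```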
1 code implementation • 16 Aug 2023 • Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu
In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.
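A hedged sketch of the evaluation recipe as stated: score each generated image by the log-likelihood a pretrained likelihood-based model assigns to it given the prompt, and rank candidates by that score. The `model.log_prob` interface is a hypothetical assumption here, standing in for whatever per-token log-probability sum the chosen model exposes.

```python
import torch

# Score a generated image by the log-likelihood a pretrained
# likelihood-based text-to-image model assigns to it given the prompt.
# `model.log_prob` is a hypothetical interface, an assumption.
@torch.no_grad()
def likelihood_score(model, image, prompt):
    return model.log_prob(image=image, text=prompt).item()

def rank_generations(model, images, prompt):
    # Higher likelihood is taken as better perceptual quality and
    # better text-image alignment, per the paper's premise.
    scores = [likelihood_score(model, img, prompt) for img in images]
    order = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return [(images[i], scores[i]) for i in order]
```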
1 code implementation • ICCV 2023 • Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu
In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain.
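A minimal sketch of the common CLIP-to-video baseline this line of work builds on (not necessarily this paper's specific method): encode sampled frames with CLIP's image encoder, mean-pool the frame embeddings into one video embedding, and retrieve by cosine similarity against CLIP text embeddings. The `clip_model` interface assumed here follows the open-source CLIP package.

```python
import torch
import torch.nn.functional as F

# Mean-pooled CLIP baseline for text-video retrieval. Assumes a model
# exposing encode_image / encode_text, as in the open-source CLIP repo.
@torch.no_grad()
def text_video_similarity(clip_model, frames, text_tokens):
    # frames: (num_frames, 3, H, W) preprocessed frame batch
    frame_emb = clip_model.encode_image(frames)               # (T, D)
    video_emb = F.normalize(frame_emb.mean(dim=0), dim=-1)    # (D,)
    text_emb = F.normalize(clip_model.encode_text(text_tokens), dim=-1)  # (N, D)
    return text_emb @ video_emb                               # (N,) similarities
```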
1 code implementation • ICCV 2023 • Chaorui Deng, Da Chen, Qi Wu
In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.
Ranked #13 on Video Object Detection on ImageNet VID
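To make the "rich temporal contexts" idea above concrete, here is a generic illustration (not this paper's specific method): each object feature in the current frame is enhanced with a similarity-weighted average of object features gathered from nearby frames. Names and the fixed temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Generic temporal-context enhancement for video object detection:
# augment current-frame object features with a similarity-weighted
# average of features from neighboring frames.
def aggregate_temporal_features(curr_feats, context_feats):
    # curr_feats:    (N, D) object features from the current frame
    # context_feats: (M, D) object features pooled from nearby frames
    q = F.normalize(curr_feats, dim=-1)
    k = F.normalize(context_feats, dim=-1)
    attn = F.softmax(q @ k.t() / 0.1, dim=-1)   # (N, M) similarity weights
    enhanced = attn @ context_feats              # (N, D) aggregated context
    return curr_feats + enhanced                 # residual enhancement
```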
1 code implementation • 17 Sep 2022 • Qi Chen, Chaorui Deng, Qi Wu
Our key idea is to explore the rich modes in the training caption corpus to learn a set of "mode embeddings", and further use them to control the mode of the captions generated by existing image captioning models.
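A hedged sketch of how such mode embeddings could be wired in: keep a small table of learned mode vectors and add the selected one to the decoder's token embeddings, so one captioning model can be steered toward different caption modes. The class names and wiring are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

# Illustrative mode-conditioned captioning decoder: a learned table of
# mode vectors, one of which is added to the decoder inputs to control
# the style/mode of the generated caption.
class ModeConditionedDecoder(nn.Module):
    def __init__(self, decoder, num_modes=64, dim=512):
        super().__init__()
        self.decoder = decoder                        # any captioning decoder
        self.mode_embed = nn.Embedding(num_modes, dim)

    def forward(self, token_embeddings, image_features, mode_id):
        # Broadcast the selected mode vector over the token sequence.
        mode = self.mode_embed(mode_id).unsqueeze(1)  # (B, 1, D)
        return self.decoder(token_embeddings + mode, image_features)
```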
no code implementations • CVPR 2021 • Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu
The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.
no code implementations • 10 Oct 2020 • Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan
Although standard BN can significantly accelerate the training of DNNs and improve generalization performance, it has several underlying limitations that may hamper performance in both training and inference.
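For context, a minimal sketch of standard batch normalization, showing the train/inference discrepancy such work targets: training normalizes with per-batch statistics while inference falls back to running averages, which can diverge when batches are small or non-i.i.d.

```python
import torch

# Standard batch norm over a (batch, features) input, illustrating the
# two statistic regimes: batch statistics at training time, running
# averages at inference time.
def batch_norm(x, running_mean, running_var, gamma, beta,
               training, momentum=0.1, eps=1e-5):
    if training:
        mean, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
        # Update running statistics in place for later inference use.
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        mean, var = running_mean, running_var
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta
```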
1 code implementation • ECCV 2020 • Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu
We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoders, as well as our proposed non-autoregressive model, to demonstrate its generalization ability.
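A hedged sketch of what a length level embedding could look like: bucket the desired caption length into coarse levels, embed the level, and add it to the decoder's token embeddings so generation length becomes controllable. The bucket boundaries and wiring below are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Illustrative length buckets: <=9, 10-14, 15-19, 20-25, >25 tokens.
LEVEL_BOUNDS = [9, 14, 19, 25]

def length_to_level(length):
    for level, bound in enumerate(LEVEL_BOUNDS):
        if length <= bound:
            return level
    return len(LEVEL_BOUNDS)

class LengthLevelEmbedding(nn.Module):
    def __init__(self, dim=512, num_levels=len(LEVEL_BOUNDS) + 1):
        super().__init__()
        self.embed = nn.Embedding(num_levels, dim)

    def forward(self, token_embeddings, target_length):
        # Add the level embedding to every token position: (B, T, D).
        level = torch.tensor([length_to_level(target_length)])
        return token_embeddings + self.embed(level).unsqueeze(1)
```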
no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu
In this survey, we first examine the state of the art by comparing modern approaches to the problem.
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
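A minimal sketch of the core HRNet idea described above: keep a high-resolution branch alive alongside a low-resolution branch and repeatedly exchange information between them, rather than recovering resolution only at the end. Channel sizes and this two-branch reduction are illustrative; the full network uses more branches and stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One fusion step between a high-resolution and a low-resolution
# branch: each branch receives the other, resampled to its own
# resolution, so fine detail and coarse semantics mix throughout.
class TwoBranchFusion(nn.Module):
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        self.high_to_low = nn.Conv2d(c_high, c_low, 1)
        self.low_to_high = nn.Conv2d(c_low, c_high, 1)

    def forward(self, x_high, x_low):
        up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[-2:],
                           mode="bilinear", align_corners=False)
        down = self.high_to_low(F.adaptive_avg_pool2d(x_high, x_low.shape[-2:]))
        return x_high + up, x_low + down
```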
no code implementations • 12 Feb 2019 • Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan
Most state-of-the-art methods in VG operate in a two-stage manner: in the first stage, an object detector is adopted to generate a set of object proposals from the input image; the second stage is then formulated as a cross-modal matching problem that finds the best match between the language query and the region proposals.
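A sketch of that second-stage matching step as described: given region features from a detector and an encoded language query, pick the proposal whose visual feature best matches the query embedding. The encoders and the cosine-similarity choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Cross-modal matching for two-stage visual grounding: score each
# region proposal against the query embedding and return the best box.
@torch.no_grad()
def ground_query(proposal_feats, proposal_boxes, query_emb):
    # proposal_feats: (P, D) region features; query_emb: (D,)
    sims = F.normalize(proposal_feats, dim=-1) @ F.normalize(query_emb, dim=-1)
    best = sims.argmax().item()
    return proposal_boxes[best], sims[best].item()  # grounded box + score
```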
no code implementations • CVPR 2018 • Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan
There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.