1 code implementation • 4 Dec 2024 • Qinwei Lin, Xiaopeng Sun, Yu Gao, Yujie Zhong, Dengjie Li, Zheng Zhao, Haoqian Wang
Our method strengthens the transmission of LR information in the early stages of diffusion to preserve image fidelity, and relies more on the generative ability of the SD model itself in the later stages to enhance the detail of generated images.
1 code implementation • 4 Dec 2024 • Xiaopeng Sun, Qinwei Lin, Yu Gao, Yujie Zhong, Chengjian Feng, Dengjie Li, Zheng Zhao, Jie Hu, Lin Ma
We posit that introducing reward feedback learning to finetune the existing models can further improve the quality of the generated images.
1 code implementation • 26 Nov 2024 • Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Zheng Zhao, Jie Hu, Yujiu Yang
This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs).
Ranked #1 on Referring Expression Segmentation on RefCOCO+ val (using extra training data)
Large Language Model • Open Vocabulary Semantic Segmentation • +8 more
no code implementations • 11 Nov 2024 • Ahmed Telili, Wassim Hamidouche, Ibrahim Farhat, Hadi Amirpour, Christian Timmerer, Ibrahim Khadraoui, Jiajie Lu, The Van Le, Jeonneung Baek, Jin Young Lee, Yiying Wei, Xiaopeng Sun, Yu Gao, Jiancheng Huang, Yujie Zhong
In this paper, we outline the challenge framework, detailing the two competition tracks and highlighting the SR solutions proposed by the top-performing models.
no code implementations • 17 Oct 2024 • Changcheng Xiao, Qiong Cao, Yujie Zhong, Xiang Zhang, Tao Wang, Canqun Yang, Long Lan
In addition, we introduce a novel task called Referring Multi-Object Tracking and Segmentation (RMOTS) and construct a new dataset named Ref-KITTI Segmentation.
Multi-Object Tracking and Segmentation • Referring Multi-Object Tracking • +3 more
no code implementations • 9 Sep 2024 • Jiancheng Huang, Yu Gao, Zequn Jie, Yujie Zhong, Xintong Han, Lin Ma
For text reference, we align the text feature of stable diffusion priors with the style feature of our IRStyle to perform text-guided color style transfer (TRStyle).
no code implementations • 3 Jul 2024 • Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma
Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis.
no code implementations • 5 May 2024 • Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma
In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.
1 code implementation • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma
Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.
1 code implementation • 7 Apr 2024 • Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
Ranked #2 on Natural Language Moment Retrieval on ActivityNet Captions (R@5,IoU=0.5 metric)
no code implementations • CVPR 2024 • Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma
The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.
1 code implementation • CVPR 2024 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.
2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.
3 code implementations • 11 Sep 2023 • Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, DaCheng Tao
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
Ranked #1 on Temporal Action Localization on MultiTHUMOS
no code implementations • 5 Jun 2023 • Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, DaCheng Tao
This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT.
Ranked #9 on Multi-Object Tracking on SportsMOT
1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
2 code implementations • 22 May 2023 • Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma
Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.
Ranked #1 on Video Object Tracking on SoccerNet-v2
1 code implementation • ICCV 2023 • Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma
Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.
Ranked #12 on Zero-Shot Semantic Segmentation on PASCAL VOC
1 code implementation • CVPR 2023 • Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma
To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that leverages only a few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for ReID tasks.
1 code implementation • CVPR 2023 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, DaCheng Tao
In this paper, we present a one-stage framework TriDet for temporal action detection.
Ranked #2 on Temporal Action Localization on EPIC-KITCHENS-100
1 code implementation • 24 Dec 2022 • Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma
In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.
Ranked #2 on Person Re-Identification on CUHK03 detected
1 code implementation • CVPR 2023 • Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma
However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.
no code implementations • 10 Oct 2022 • Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia
However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos.
7 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
1 code implementation • 29 Aug 2022 • Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie
In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.
Ranked #3 on Exemplar-Free Counting on FSC147
1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao
Moreover, we propose two losses to facilitate and stabilize the training of action classification.
Ranked #17 on Temporal Action Localization on THUMOS’14
1 code implementation • CVPR 2022 • Sheng Guo, Zihua Xiong, Yujie Zhong, LiMin Wang, Xiaobo Guo, Bing Han, Weilin Huang
In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning.
no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao
Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
2 code implementations • 30 Mar 2022 • Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
1 code implementation • 2 Dec 2021 • Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang
This work aims at improving instance retrieval with self-supervision.
no code implementations • 23 Sep 2021 • Xianing Chen, Chunlin Xu, Qiong Cao, Jialang Xu, Yujie Zhong, Jiale Xu, Zhengxin Li, Jingya Wang, Shenghua Gao
Transformers have shown preferable performance on many vision tasks.
1 code implementation • ICCV 2021 • Chengjian Feng, Yujie Zhong, Weilin Huang
Specifically, EBL increases the intensity of the adjustment of the decision boundary for the weak classes by a designed score-guided loss margin between any two classes.
Ranked #12 on Object Detection on LVIS v1.0 val
6 code implementations • ICCV 2021 • Chengjian Feng, Yujie Zhong, Yu Gao, Matthew R. Scott, Weilin Huang
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks.
Ranked #3 on 2D Object Detection on CeyMo
no code implementations • 9 Jul 2021 • Haoxian Tan, Sheng Guo, Yujie Zhong, Matthew R. Scott, Weilin Huang
In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS).
1 code implementation • 11 Jan 2021 • Guanting Liu, Yujie Zhong, Sheng Guo, Matthew R. Scott, Weilin Huang
To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell level and at the stage level.
1 code implementation • 19 Nov 2020 • Yujie Zhong, Linhai Xie, Sen Wang, Lucia Specia, Yishu Miao
In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
1 code implementation • ECCV 2020 • Yujie Zhong, Zelu Deng, Sheng Guo, Matthew R. Scott, Weilin Huang
FAD consists of a designed search space and an efficient architecture search algorithm.
no code implementations • 26 Mar 2020 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman
The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors.
2 code implementations • 23 Oct 2018 • Yujie Zhong, Relja Arandjelović, Andrew Zisserman
The objective of this paper is to learn a compact representation of image sets for template-based face recognition.
Ranked #3 on Face Verification on IJB-A