Search Results for author: Yujie Zhong

Found 39 papers, 27 papers with code

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution

1 code implementation · 4 Dec 2024 · Qinwei Lin, Xiaopeng Sun, Yu Gao, Yujie Zhong, Dengjie Li, Zheng Zhao, Haoqian Wang

Our method strengthens the transmission of LR information in the early stages of diffusion to ensure image fidelity, and relies more on the SD model's own generative ability in the later stages to enhance the detail of the generated images.

RFSR: Improving ISR Diffusion Models via Reward Feedback Learning

1 code implementation · 4 Dec 2024 · Xiaopeng Sun, Qinwei Lin, Yu Gao, Yujie Zhong, Chengjian Feng, Dengjie Li, Zheng Zhao, Jie Hu, Lin Ma

We posit that introducing reward feedback learning to fine-tune existing models can further improve the quality of the generated images.

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

1 code implementation · 26 Nov 2024 · Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Zheng Zhao, Jie Hu, Yujiu Yang

This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs).

Ranked #1 on Referring Expression Segmentation on RefCOCO+ val (using extra training data)

Large Language Model Open Vocabulary Semantic Segmentation +8

Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation

no code implementations · 17 Oct 2024 · Changcheng Xiao, Qiong Cao, Yujie Zhong, Xiang Zhang, Tao Wang, Canqun Yang, Long Lan

In addition, we introduce a novel task called Referring Multi-Object Tracking and Segmentation (RMOTS) and construct a new dataset named Ref-KITTI Segmentation.

Multi-Object Tracking and Segmentation Referring Multi-Object Tracking +3

MRStyle: A Unified Framework for Color Style Transfer with Multi-Modality Reference

no code implementations · 9 Sep 2024 · Jiancheng Huang, Yu Gao, Zequn Jie, Yujie Zhong, Xintong Han, Lin Ma

For text reference, we align the text features of Stable Diffusion priors with the style features of our IRStyle to perform text-guided color style transfer (TRStyle).

Style Transfer

Matten: Video Generation with Mamba-Attention

no code implementations · 5 May 2024 · Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.

Mamba Video Generation

LaSagnA: Language-based Segmentation Assistant for Complex Queries

1 code implementation · 12 Apr 2024 · Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma

Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.

Segmentation Semantic Segmentation

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

1 code implementation · 7 Apr 2024 · Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.

Action Detection Moment Queries +4

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

no code implementations · CVPR 2024 · Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.

Object object-detection +1

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation · CVPR 2024 · Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.

Text-to-Image Generation Visual Storytelling

SoccerNet 2023 Challenges Results

2 code implementations · 12 Sep 2023 · Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei Zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng

More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.

Action Spotting Camera Calibration +4

MotionTrack: Learning Motion Predictor for Multiple Object Tracking

no code implementations · 5 Jun 2023 · Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, DaCheng Tao

This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant use of linear motion models in MOT.

motion prediction Multi-Object Tracking +2

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation · 1 Jun 2023 · Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking

2 code implementations · 22 May 2023 · Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma

Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.

Multi-Object Tracking Video Object Tracking

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

1 code implementation · ICCV 2023 · Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.

Classification Language Modelling +4

Adaptive Sparse Pairwise Loss for Object Re-Identification

1 code implementation · CVPR 2023 · Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma

To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss, which leverages only a few appropriate pairs for each class in a mini-batch, and we empirically demonstrate that this is sufficient for ReID tasks.

Object
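
The snippet says SP loss uses only a few pairs per class rather than every pair in the batch. The sketch below is a hypothetical, simplified illustration of that sparsity idea, not the paper's exact formulation: the rule for choosing the one positive pair (closest same-class pair), the one negative pair (closest cross-class pair), the hinge form, and the margin of 0.3 are all assumptions.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sparse_pairwise_loss(embeddings, labels, margin=0.3):
    """Hypothetical sparse pairwise loss (illustrative only).

    For each class we keep ONE positive pair and ONE negative pair instead of all pairs:
      * positive: the closest same-class pair (an assumed "least hard" choice),
      * negative: the closest cross-class pair (the hardest negative),
    then apply a hinge with the given margin (also an assumption).
    """
    losses = []
    for c in sorted(set(labels)):
        pos = [i for i, y in enumerate(labels) if y == c]
        neg = [i for i, y in enumerate(labels) if y != c]
        if len(pos) < 2 or not neg:
            continue  # a class needs one positive pair and at least one negative sample
        d_pos = min(euclidean(embeddings[i], embeddings[j]) for i, j in combinations(pos, 2))
        d_neg = min(euclidean(embeddings[i], embeddings[j]) for i in pos for j in neg)
        losses.append(max(0.0, d_pos - d_neg + margin))
    # Average over the classes that contributed a term.
    return sum(losses) / len(losses) if losses else 0.0
```

Compared with a dense pairwise loss, the number of terms grows with the number of classes in the batch rather than quadratically with the batch size.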

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

1 code implementation · 24 Dec 2022 · Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma

In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.

Person Re-Identification Position

AeDet: Azimuth-invariant Multi-view 3D Object Detection

1 code implementation · CVPR 2023 · Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma

However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.

3D Object Detection Depth Estimation +3

Contrastive Video-Language Learning with Fine-grained Frame Sampling

no code implementations · 10 Oct 2022 · Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos.

Question Answering Representation Learning +3
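
The snippet motivates sampling only the frames that are relevant to the paired text. As a hypothetical baseline (not the paper's sampling method), one can score each precomputed frame embedding against the text embedding with cosine similarity and keep the top-k frames; the embeddings and the value of k are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_relevant_frames(frame_embs, text_emb, k):
    """Hypothetical selector: keep the k frames most similar to the text.

    Indices are returned in temporal order so the sampled clip stays sequential.
    """
    sims = [cosine(f, text_emb) for f in frame_embs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return sorted(top)
```

A fixed k ignores the observation that longer videos contain proportionally more noisy frames; a threshold on similarity would be one way to adapt to that.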

SoccerNet 2022 Challenges Results

7 code implementations · 5 Oct 2022 · Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

CounTR: Transformer-based Generalised Visual Counting

1 code implementation · 29 Aug 2022 · Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.

Exemplar-Free Counting Self-Supervised Learning
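
Counting models in this line of work commonly regress a density map whose sum equals the object count; we assume that formulation here. The sketch shows only how a ground-truth density map can be built from point annotations and how a count is read out of it. It does not model CounTR's transformer or exemplar matching, and the Gaussian width `sigma=1.5` is an arbitrary assumption.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.5):
    """Place one normalised Gaussian per annotated object centre (y, x).

    Each kernel is normalised over the image grid, so every object
    contributes a total mass of exactly 1, even near the borders.
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2)) for x in range(width)]
            for y in range(height)
        ]
        total = sum(sum(row) for row in kernel)
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """The predicted count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)
```

Because the count is a sum rather than a class label, the same readout works for any object category, which is what makes density regression a natural fit for class-agnostic counting.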

Cross-Architecture Self-supervised Video Representation Learning

1 code implementation · CVPR 2022 · Sheng Guo, Zihua Xiong, Yujie Zhong, LiMin Wang, Xiaobo Guo, Bing Han, Weilin Huang

In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning.

Action Recognition Contrastive Learning +4

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

no code implementations · CVPR 2022 · Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao

Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer free rein by training it without distillation.

Knowledge Distillation
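
The two-stage schedule described in the snippet can be sketched as a loss function that switches off distillation after a chosen epoch. This is a hypothetical illustration: the MSE feature-matching term, the weight `alpha`, and the parameter `switch_epoch` are assumptions, not DearKD's actual distillation losses.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def two_stage_loss(epoch, task_loss, vit_feats, cnn_feats, switch_epoch, alpha=1.0):
    """Hypothetical two-stage objective in the spirit of the DearKD description.

    Stage 1 (epoch < switch_epoch): task loss plus a distillation term pulling
    early transformer features towards early CNN features (assumed MSE).
    Stage 2: task loss only, i.e. the transformer trains without distillation.
    """
    if epoch < switch_epoch:
        distill = sum(mse(v, c) for v, c in zip(vit_feats, cnn_feats)) / len(vit_feats)
        return task_loss + alpha * distill
    return task_loss
```

A hard switch is the simplest reading of "then training without distillation"; a gradual decay of `alpha` would be an alternative design.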

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations · 30 Mar 2022 · Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling Object

InsCLR: Improving Instance Retrieval with Self-Supervision

1 code implementation · 2 Dec 2021 · Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang

This work aims at improving instance retrieval with self-supervision.

Retrieval

Exploring Classification Equilibrium in Long-Tailed Object Detection

1 code implementation · ICCV 2021 · Chengjian Feng, Yujie Zhong, Weilin Huang

Specifically, EBL strengthens the adjustment of the decision boundary for weak classes by introducing a score-guided loss margin between each pair of classes.

Classification imbalanced classification +5

TOOD: Task-aligned One-stage Object Detection

6 code implementations · ICCV 2021 · Chengjian Feng, Yujie Zhong, Yu Gao, Matthew R. Scott, Weilin Huang

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks.

Object object-detection +1
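
To address the misalignment between the classification and localization branches, TOOD scores candidate anchors with a task-alignment metric t = s^α · u^β, where s is the classification score and u is the IoU with the ground-truth box, so that positives must be good at both tasks. The sketch below ranks anchors for a single box. The defaults α = 1 and β = 6 are, to our understanding, the values used in public implementations; treat them as assumptions. We also assume that the assigner's candidate filtering (for example, by anchor location) has already been applied.

```python
def task_alignment(score, iou, alpha=1.0, beta=6.0):
    """Alignment metric t = s**alpha * u**beta (s: class score, u: IoU with the GT box)."""
    return (score ** alpha) * (iou ** beta)


def select_positive_anchors(scores, ious, topk=2, alpha=1.0, beta=6.0):
    """Rank candidate anchors of one ground-truth box by alignment and keep the top-k.

    Simplified: any candidate pre-filtering done by a real assigner is
    assumed to have happened before this call.
    """
    t = [task_alignment(s, u, alpha, beta) for s, u in zip(scores, ious)]
    order = sorted(range(len(t)), key=lambda i: t[i], reverse=True)
    return order[:topk]
```

With scores `[0.9, 0.5, 0.8]` and IoUs `[0.5, 0.9, 0.8]`, anchor 0 has the highest classification score, but its poor localization drives its alignment down, so anchors 1 and 2 are chosen instead.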

Mutually-aware Sub-Graphs Differentiable Architecture Search

no code implementations · 9 Jul 2021 · Haoxian Tan, Sheng Guo, Yujie Zhong, Matthew R. Scott, Weilin Huang

In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS).

Unchain the Search Space with Hierarchical Differentiable Architecture Search

1 code implementation · 11 Jan 2021 · Guanting Liu, Yujie Zhong, Sheng Guo, Matthew R. Scott, Weilin Huang

To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell level and at the stage level.
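
Differentiable architecture search methods in the DARTS family relax a discrete choice among candidate operations into a softmax-weighted mixture whose architecture weights are learned by gradient descent. After search, the strongest candidate is kept. The sketch shows only this generic primitive, with scalar functions standing in for real layers. It does not model H-DAS's cell-level and stage-level hierarchy, and the candidate names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def mixed_op(x, ops, alphas):
    """Continuous relaxation: output = sum over o of softmax(alphas)[o] * o(x)."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), ops))


def derive(op_names, alphas):
    """After search, keep the candidate with the largest architecture weight."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return op_names[best]


# Hypothetical scalar stand-ins for candidate operations (real searches use conv/pool layers).
CANDIDATES = {"identity": lambda x: x, "zero": lambda x: 0.0, "double": lambda x: 2.0 * x}
```

In a hierarchical search, the same relaxation can be applied at more than one granularity. The snippet states that H-DAS searches at both the cell and the stage level.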

Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision

1 code implementation · 19 Nov 2020 · Yujie Zhong, Linhai Xie, Sen Wang, Lucia Specia, Yishu Miao

In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.

Retrieval Self-Supervised Learning

Compact Deep Aggregation for Set Retrieval

no code implementations · 26 Mar 2020 · Yujie Zhong, Relja Arandjelović, Andrew Zisserman

The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors.

Retrieval

GhostVLAD for set-based face recognition

2 code implementations · 23 Oct 2018 · Yujie Zhong, Relja Arandjelović, Andrew Zisserman

The objective of this paper is to learn a compact representation of image sets for template-based face recognition.

Face Recognition Face Verification
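
As we understand the GhostVLAD design, it extends NetVLAD aggregation with extra "ghost" clusters. These clusters take part in the soft assignment of each descriptor but are dropped from the output, so uninformative descriptors can be absorbed without polluting the set representation. The sketch below aggregates toy descriptors into a compact K·D vector. The parameter names are hypothetical, and the intra-normalization followed by L2 normalization is assumed to follow the NetVLAD convention.

```python
import math


def _l2_normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def ghostvlad(descriptors, centers, assign_w, assign_b, num_ghost):
    """Aggregate N descriptors (each of length D) into one K*D vector.

    centers:  K real cluster centres (K x D).
    assign_w, assign_b: soft-assignment parameters for K + num_ghost clusters.
    Ghost clusters compete in the softmax but contribute no residuals.
    """
    k, dim = len(centers), len(centers[0])
    assert len(assign_w) == len(assign_b) == k + num_ghost
    vlad = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        logits = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(assign_w, assign_b)]
        m = max(logits)
        exps = [math.exp(s - m) for s in logits]
        z = sum(exps)
        for c in range(k):  # only real clusters accumulate weighted residuals
            a = exps[c] / z
            for d in range(dim):
                vlad[c][d] += a * (x[d] - centers[c][d])
    # Intra-normalise each cluster's residual sum, then L2-normalise the flattened vector.
    flat = [v for row in vlad for v in _l2_normalize(row)]
    return _l2_normalize(flat)
```

The output length depends only on K and D, not on the number of images in the set, which is what makes the representation compact for template-based matching.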
