Search Results for author: Zehuan Yuan

Found 25 papers, 10 papers with code

Language as Queries for Referring Video Object Segmentation

2 code implementations · 3 Jan 2022 · Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred to by a language expression in all video frames.

Object Tracking, Referring Expression Segmentation (+4 more tasks)

Trimap-guided Feature Mining and Fusion Network for Natural Image Matting

no code implementations · 1 Dec 2021 · Weihao Jiang, Dongdong Yu, Zhaozhi Xie, Yaoyi Li, Zehuan Yuan, Hongtao Lu

For the emerging content-based feature fusion, most existing matting methods focus only on local features, which lack the guidance of a global feature carrying strong semantic information about the object of interest.

Image Matting, Matting

Disentangled Contrastive Learning on Graphs

no code implementations · NeurIPS 2021 · Haoyang Li, Xin Wang, Ziwei Zhang, Zehuan Yuan, Hang Li, Wenwu Zhu

Then we propose a novel factor-wise discrimination objective in a contrastive learning manner, which can force the factorized representations to independently reflect the expressive information from different latent factors.

Contrastive Learning, Self-Supervised Learning

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

1 code implementation · 29 Nov 2021 · Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

A typical pipeline for multi-object tracking (MOT) uses a detector for object localization, followed by re-identification (re-ID) for object association.

Multi-Object Tracking, Object Detection (+1 more task)
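The detect-then-associate pipeline summarized in the DanceTrack excerpt can be illustrated with a small association step. This is a minimal sketch, not DanceTrack's code: the function names `cosine` and `associate`, the cosine-similarity metric, the greedy matching (practical trackers often use the Hungarian algorithm), and the 0.5 similarity threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(track_embs, det_embs, min_sim=0.5):
    """Greedily assign detections to existing tracks by re-ID similarity.

    Hypothetical helper for illustration.
    track_embs: dict mapping a track id to its stored re-ID embedding
    det_embs:   list of re-ID embeddings, one per detected box in the new frame
    Returns (matches, unmatched): matches maps detection index -> track id;
    unmatched lists detection indices that would start new tracks.
    """
    # Score every (track, detection) pair, most similar first.
    pairs = sorted(
        ((cosine(t, d), tid, di)
         for tid, t in track_embs.items()
         for di, d in enumerate(det_embs)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for sim, tid, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if di in matches or tid in used_tracks:
            continue  # enforce one-to-one association
        matches[di] = tid
        used_tracks.add(tid)
    unmatched = [i for i in range(len(det_embs)) if i not in matches]
    return matches, unmatched


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
dets = [[0.1, 0.9], [0.95, 0.05], [-1.0, 0.0]]
matches, unmatched = associate(tracks, dets)
print(matches, unmatched)  # {1: 1, 0: 2} [2]
```

Here the third detection resembles neither stored track, so it is left unmatched and would seed a new track. DanceTrack's point is that this appearance-based step becomes unreliable when objects look alike, which is why the benchmark stresses motion cues.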

Focal and Global Knowledge Distillation for Detectors

1 code implementation · 23 Nov 2021 · Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan

Global distillation rebuilds the relation between different pixels and transfers it from teachers to students, compensating for missing global information in focal distillation.

Knowledge Distillation, Object Detection

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

no code implementations · 10 Nov 2021 · Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan

Vision-and-Language Navigation (VLN) is a task in which an agent must follow a language instruction to navigate to a goal position, relying on ongoing interactions with the environment while moving.

Vision and Language Navigation

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

7 code implementations · arXiv 2021 · Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos.

 Ranked #1 on Multi-Object Tracking on MOT17 (using extra training data)

Multi-Object Tracking

Objects in Semantic Topology

no code implementations · 6 Oct 2021 · Shuo Yang, Peize Sun, Yi Jiang, Xiaobo Xia, Ruiheng Zhang, Zehuan Yuan, Changhu Wang, Ping Luo, Min Xu

A more realistic object detection paradigm, Open-World Object Detection, has recently attracted increasing research interest in the community.

Incremental Learning, Language Modelling (+1 more task)

Memory Based Video Scene Parsing

no code implementations · 1 Sep 2021 · Zhenchao Jin, Dongdong Yu, Kai Su, Zehuan Yuan, Changhu Wang

Video scene parsing is a long-standing, challenging task in computer vision that aims to assign predefined semantic labels to the pixels of all frames in a given video.

Scene Parsing, Semantic Segmentation

Center Prediction Loss for Re-identification

no code implementations · 30 Apr 2021 · Lu Yang, Yunlong Wang, Lingqiao Liu, Peng Wang, Lu Chi, Zehuan Yuan, Changhu Wang, Yanning Zhang

In this paper, we propose a new loss based on center predictivity, that is, a sample must be positioned in a location of the feature space such that from it we can roughly predict the location of the center of same-class samples.

Conditional Meta-Network for Blind Super-Resolution with Multiple Degradations

1 code implementation · 8 Apr 2021 · Guanghao Yin, Wei Wang, Zehuan Yuan, Wei Ji, Dongdong Yu, Shouqian Sun, Tat-Seng Chua, Changhu Wang

We extract a task-level degradation prior with the proposed ConditionNet, which is then used to adapt the parameters of the basic SR network (BaseNet).

Image Super-Resolution

Exploring Balanced Feature Spaces for Representation Learning

no code implementations · ICLR 2021 · Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, Jiashi Feng

Motivated by this question, we conduct a series of studies on the performance of self-supervised contrastive learning and supervised learning methods over multiple datasets where training instance distributions vary from a balanced one to a long-tailed one.

Contrastive Learning, Long-tail Learning (+2 more tasks)

Domain-Invariant Disentangled Network for Generalizable Object Detection

no code implementations · ICCV 2021 · Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, Jianfei Cai

By disentangling representations on both image and instance levels, DIDN is able to learn domain-invariant representations that are suitable for generalized object detection.

Domain Generalization, Image Classification (+1 more task)

Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective

no code implementations · ICCV 2021 · Wei Wang, Haochen Zhang, Zehuan Yuan, Changhu Wang

A popular approach to this challenge is unpaired generative adversarial networks, which generate "real" LR counterparts from real HR images via image-to-image translation and then perform super-resolution from "real" LR to SR.

Domain Adaptation, Image-to-Image Translation (+1 more task)

TransTrack: Multiple Object Tracking with Transformer

3 code implementations · 31 Dec 2020 · Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo

In this work, we propose TransTrack, a simple but efficient scheme for solving the multiple object tracking problem.

Multiple Object Tracking, Object Detection

What Makes for End-to-End Object Detection?

1 code implementation · 10 Dec 2020 · Peize Sun, Yi Jiang, Enze Xie, Wenqi Shao, Zehuan Yuan, Changhu Wang, Ping Luo

We identify the classification cost in the matching cost as the main ingredient: (1) previous detectors consider only the location cost; (2) by additionally introducing the classification cost, previous detectors immediately produce one-to-one predictions during inference.

General Classification, Object Detection
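A toy example helps show why adding a classification cost to the matching cost changes which prediction becomes the single positive for an object. The sketch below assumes a DETR-style cost (negative predicted probability of the ground-truth class plus a weighted L1 box distance); that cost form, the helper names, and the brute-force assignment are illustrative assumptions, not the paper's implementation.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two boxes given as coordinate lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, gt, use_cls=True, w_loc=1.0):
    """Cost of assigning one prediction to one ground-truth object.

    pred: (class_probs, box); gt: (class_id, box). Assumed cost form.
    """
    probs, pbox = pred
    cls_id, gbox = gt
    cost = w_loc * l1(pbox, gbox)  # location cost
    if use_cls:
        cost -= probs[cls_id]      # classification cost: confident -> cheaper
    return cost


def one_to_one_match(preds, gts, use_cls=True):
    """Minimum-cost one-to-one assignment, by brute force over permutations.

    Requires len(preds) >= len(gts). Exponential time: fine for a toy example,
    whereas practical detectors use the Hungarian algorithm.
    Returns a list whose g-th entry is the prediction matched to ground truth g.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g], use_cls) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best


# Two predictions on the same object: A sits exactly on the box but is
# unconfident; B is slightly off but confident.
gts = [(0, [0.0, 0.0, 1.0, 1.0])]
preds = [
    ([0.1, 0.9], [0.0, 0.0, 1.0, 1.0]),  # A
    ([0.9, 0.1], [0.0, 0.0, 1.0, 1.1]),  # B
]
print(one_to_one_match(preds, gts, use_cls=False))  # [0]: location-only picks A
print(one_to_one_match(preds, gts, use_cls=True))   # [1]: adding class cost picks B
```

With location cost alone, training would push the unconfident prediction A to be the positive, so a confident duplicate like B could survive at inference and need NMS; adding the classification cost aligns the chosen positive with the prediction that scores highest.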

Slimmable Generative Adversarial Networks

1 code implementation · 10 Dec 2020 · Liang Hou, Zehuan Yuan, Lei Huang, HuaWei Shen, Xueqi Cheng, Changhu Wang

In particular, for real-time generation tasks, different devices require generators of different sizes due to varying computing power.

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

5 code implementations · CVPR 2021 · Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei LI, Zehuan Yuan, Changhu Wang, Ping Luo

In our method, however, a fixed sparse set of $N$ learned object proposals is provided to the object recognition head to perform classification and localization.

Object Detection, Object Recognition

Controllable Orthogonalization in Training DNNs

1 code implementation · CVPR 2020 · Lei Huang, Li Liu, Fan Zhu, Diwen Wan, Zehuan Yuan, Bo Li, Ling Shao

Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1 and reduce redundancy in representation.

Image Classification
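To make the singular-value claim concrete: for a linear layer $y = Wx$ the Jacobian is $W$ itself, and if $W$ is orthogonal all of its singular values equal 1, so the layer preserves vector norms. The sketch below orthogonalizes a small weight matrix with modified Gram-Schmidt, a simple stand-in chosen only for illustration; it is not the orthogonalization method proposed in the paper, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(rows):
    """Orthonormalize the rows of a full-rank square matrix (modified Gram-Schmidt)."""
    basis = []
    for v in rows:
        w = list(v)
        for b in basis:
            proj = dot(w, b)  # component of w along b
            w = [x - proj * y for x, y in zip(w, b)]
        n = math.sqrt(dot(w, w))
        basis.append([x / n for x in w])
    return basis


def matvec(m, x):
    """Compute y = m @ x for a list-of-rows matrix."""
    return [dot(row, x) for row in m]


W = gram_schmidt([[2.0, 1.0], [1.0, 3.0]])
x = [3.0, -4.0]
y = matvec(W, x)
print(math.sqrt(dot(x, x)), math.sqrt(dot(y, y)))  # both approximately 5.0
print(dot(W[0], W[1]))                             # approximately 0.0: rows are orthogonal
```

Because the rows of `W` are orthonormal and `W` is square, `W` is orthogonal, so the input norm of 5 is preserved up to floating-point error; a non-orthogonal matrix would stretch some directions and shrink others.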

Deformable Tube Network for Action Detection in Videos

no code implementations · 3 Jul 2019 · Wei Li, Zehuan Yuan, Dashan Guo, Lei Huang, Xiangzhong Fang, Changhu Wang

To perform action detection, we design a 3D convolution network with skip connections for tube classification and regression.

Action Detection, Action Recognition

Knowing Where to Look? Analysis on Attention of Visual Question Answering System

no code implementations · 9 Oct 2018 · Wei Li, Zehuan Yuan, Xiangzhong Fang, Changhu Wang

Attention mechanisms have been widely used in Visual Question Answering (VQA) solutions due to their capacity to model deep cross-domain interactions.

Question Answering, Visual Question Answering

Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification

no code implementations · 16 Sep 2018 · Jinlai Liu, Zehuan Yuan, Changhu Wang

Leveraging both visual frames and audio has been experimentally shown to be effective for improving large-scale video classification.

General Classification, Video Classification

Temporal Action Localization by Structured Maximal Sums

no code implementations · CVPR 2017 · Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng

We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores.

Action Detection, General Classification (+2 more tasks)
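When a temporal window is scored as the sum of frame-wise classification scores, finding the single best-scoring window of arbitrary length reduces to the maximum-sum contiguous subarray problem, solvable in linear time with Kadane's algorithm. The sketch below shows only this unstructured base case and assumes frame scores are centered so that background frames score negative; the paper's structured formulation is not reproduced here, and `max_sum_window` is a hypothetical name.

```python
def max_sum_window(scores):
    """Find the contiguous window with the largest sum of frame-wise scores.

    scores: non-empty list of per-frame scores, assumed positive for action
    frames and negative for background frames.
    Returns (start, end, total) with `end` inclusive. Kadane's algorithm, O(T).
    """
    best = (0, 0, scores[0])
    cur_start, cur_sum = 0, 0.0
    for t, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum can only hurt: restart the window here.
            cur_start, cur_sum = t, s
        else:
            cur_sum += s
        if cur_sum > best[2]:
            best = (cur_start, t, cur_sum)
    return best


frame_scores = [-0.5, 0.25, 0.5, -0.25, 0.75, -1.0]
print(max_sum_window(frame_scores))  # (1, 4, 1.25)
```

The mildly negative frame 3 stays inside the localized action because the frames on either side outweigh it, which is the behavior a sum-based window score is meant to give over simple per-frame thresholding.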
