Search Results for author: Yi Yang

Found 507 papers, 249 papers with code

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

3 code implementations • CVPR 2019 • Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang

In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small.

Image Classification

38,418

Paper
Code

Random Erasing Data Augmentation

18 code implementations • 16 Aug 2017 • Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang

In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN).

Ranked #4 on Image Classification on Fashion-MNIST

General Classification Image Augmentation +4

29,713

Paper
Code

Operation-aware Neural Networks for User Response Prediction

4 code implementations • 2 Apr 2019 • Yi Yang, Baile Xu, Furao Shen, Jian Zhao

Many deep models are proposed to automatically learn high-order feature interactions.

7,345

Paper
Code

Joint Discriminative and Generative Learning for Person Re-identification

12 code implementations • CVPR 2019 • Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz

To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end.

Ranked #1 on Person Re-Identification on UAV-Human

Image-to-Image Translation Unsupervised Domain Adaptation +1

3,948

Paper
Code

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

3 code implementations • 14 Apr 2020 • Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei

This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.

Ranked #1 on Vehicle Re-Identification on VehicleID

Representation Learning Vehicle Re-Identification

3,948

Paper
Code

Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification

2 code implementations • CVPR 2018 • Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, Jianbin Jiao

To this end, we propose to preserve two types of unsupervised similarities, 1) self-similarity of an image before and after translation, and 2) domain-dissimilarity of a translated source image and a target image.

Ranked #3 on Unsupervised Person Re-Identification on MSMT17->DukeMTMC-reID

Generative Adversarial Network Person Re-Identification +2

3,138

Paper
Code

Segment and Track Anything

1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving Object Tracking

2,432

Paper
Code

Network Pruning via Transformable Architecture Search

4 code implementations • NeurIPS 2019 • Xuanyi Dong, Yi Yang

The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks.

Ranked #1 on Network Pruning on CIFAR-10

Knowledge Distillation Network Pruning +2

1,547

Paper
Code

Searching for A Robust Neural Architecture in Four GPU Hours

6 code implementations • CVPR 2019 • Xuanyi Dong, Yi Yang

To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG.

Ranked #18 on Neural Architecture Search on CIFAR-10

Neural Architecture Search

1,547

Paper
Code

One-Shot Neural Architecture Search via Self-Evaluated Template Network

4 code implementations • ICCV 2019 • Xuanyi Dong, Yi Yang

In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates.

Ranked #18 on Neural Architecture Search on NAS-Bench-201, ImageNet-16-120 (Accuracy (Val) metric)

Neural Architecture Search

1,547

Paper
Code

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search

4 code implementations • ICLR 2020 • Xuanyi Dong, Yi Yang

A variety of algorithms search architectures under different search spaces.

Data Augmentation Neural Architecture Search

1,547

Paper
Code

Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification

3 code implementations • ICCV 2019 • Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang

We propose to automatically search for a CNN architecture that is specifically suitable for the reID task.

Ranked #9 on Person Re-Identification on CUHK03 detected

Classification General Classification +3

1,546

Paper
Code

Collaborative Video Object Segmentation by Foreground-Background Integration

2 code implementations • ECCV 2020 • Zongxin Yang, Yunchao Wei, Yi Yang

This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.

Ranked #8 on Video Object Segmentation on YouTube-VOS 2019

Object One-shot visual object segmentation +3

1,413

Paper
Code

ActBERT: Learning Global-Local Video-Text Representations

1 code implementation • CVPR 2020 • Linchao Zhu, Yi Yang

In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.

Ranked #8 on Action Segmentation on COIN

Action Segmentation Question Answering +5

1,413

Paper
Code

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

1 code implementation • CVPR 2021 • Xiaohan Wang, Linchao Zhu, Yi Yang

Moreover, a global alignment method is proposed to provide a global cross-modal measurement that is complementary to the local perspective.

Retrieval Video Retrieval

1,413

Paper
Code

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations • 7 Nov 2022 • Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation Point Tracking

1,043

Paper
Code

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

1 code implementation • ICCV 2023 • Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.

Ranked #1 on Visual Tracking on Kinetics

Motion Estimation Visual Tracking

1,043

Paper
Code

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations • 1 Feb 2024 • Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

1,043

Paper
Code

Self-Correction for Human Parsing

2 code implementations • 22 Oct 2019 • Peike Li, Yunqiu Xu, Yunchao Wei, Yi Yang

To tackle the problem of learning with label noises, this work introduces a purification strategy, called Self-Correction for Human Parsing (SCHP), to progressively promote the reliability of the supervised labels as well as the learned models.

Ranked #2 on Human Part Segmentation on PASCAL-Part

Human Parsing Human Part Segmentation +1

935

Paper
Code

Style Aggregated Network for Facial Landmark Detection

1 code implementation • CVPR 2018 • Xuanyi Dong, Yan Yan, Wanli Ouyang, Yi Yang

In this work, we propose a style-aggregated approach to deal with the large intrinsic variance of image styles for facial landmark detection.

Ranked #2 on Facial Landmark Detection on AFLW-Front (Mean NME metric)

Face Alignment Facial Landmark Detection

917

Paper
Code

Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection

2 code implementations • ICCV 2019 • Xuanyi Dong, Yi Yang

A typical approach is to (1) train a detector on the labeled images; (2) generate new training samples using this detector's prediction as pseudo labels of unlabeled images; (3) retrain the detector on the labeled samples and partial pseudo labeled samples.

Ranked #1 on Facial Landmark Detection on 300W (Full) (using extra training data)

Facial Landmark Detection

917

Paper
Code

Supervision by Registration and Triangulation for Landmark Detection

1 code implementation • 25 Jan 2021 • Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu

End-to-end training is made possible by differentiable registration and 3D triangulation modules.

Optical Flow Estimation

917

Paper
Code

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

1 code implementation • CVPR 2018 • Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh

In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.

Ranked #1 on Facial Landmark Detection on 300-VW (C)

Facial Landmark Detection Optical Flow Estimation

757

Paper
Code

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification

1 code implementation • ICCV 2021 • Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi, Yi Yang

Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains.

Classification Domain Generalization

740

Paper
Code

DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments

2 code implementations • 22 Sep 2018 • Chao Yu, Zuxin Liu, Xinjun Liu, Fugui Xie, Yi Yang, Qi Wei, Qiao Fei

It is one of the state-of-the-art SLAM systems in high-dynamic environments.

Robotics

655

Paper
Code

A Simple Episodic Linear Probe Improves Visual Recognition in the Wild

2 code implementations • CVPR 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner.

Ranked #13 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Long-tail Learning +1

582

Paper
Code

Associating Objects with Transformers for Video Object Segmentation

2 code implementations • NeurIPS 2021 • Zongxin Yang, Yunchao Wei, Yi Yang

The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources.

Ranked #2 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)

Object One-shot visual object segmentation +2

561

Paper
Code

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations • 22 Mar 2022 • Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Ranked #3 on Semi-Supervised Video Object Segmentation on YouTube-VOS 2019

Object Segmentation +3

561

Paper
Code

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

2 code implementations • 18 Oct 2022 • Zongxin Yang, Yi Yang

To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

Ranked #1 on Semi-Supervised Video Object Segmentation on VOT2020

Object Semantic Segmentation +2

561

Paper
Code

Video Object Segmentation in Panoptic Wild Scenes

2 code implementations • 8 May 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang

Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.

Object Semantic Segmentation +2

561

Paper
Code

FinBERT: A Pretrained Language Model for Financial Communications

1 code implementation • 15 Jun 2020 • Yi Yang, Mark Christopher Siy UY, Allen Huang

Contextual pretrained language models, such as BERT (Devlin et al., 2019), have made significant breakthroughs in various NLP tasks by training on large-scale unlabeled text resources. The financial sector also accumulates a large amount of financial communication text. However, there are no pretrained finance-specific language models available.

Language Modelling Sentiment Analysis +1

519

Paper
Code

Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline)

29 code implementations • ECCV 2018 • Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang

RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency.

Ranked #3 on Person Re-Identification on UAV-Human

Person Re-Identification Person Retrieval +1

482

Paper
Code

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

3 code implementations • 27 Feb 2020 • Zhedong Zheng, Yunchao Wei, Yi Yang

To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation.

Ranked #6 on Drone navigation on University-1652

Drone navigation Drone-view target localization +2

447

Paper
Code

Multiple-environment Self-adaptive Network for Aerial-view Geo-localization

2 code implementations • 18 Apr 2022 • Tingyu Wang, Zhedong Zheng, Yaoqi Sun, Chenggang Yan, Yi Yang, Tat-Seng Chua

This task is mostly regarded as an image retrieval problem.

Image Retrieval Retrieval

420

Paper
Code

Unsupervised Scene Adaptation with Memory Regularization in vivo

2 code implementations • 24 Dec 2019 • Zhedong Zheng, Yi Yang

We consider the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data.

Ranked #1 on Domain Adaptation on SYNTHIA-to-Cityscapes Labels

Semantic Segmentation Synthetic-to-Real Translation +1

379

Paper
Code

Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation

3 code implementations • 8 Mar 2020 • Zhedong Zheng, Yi Yang

This paper focuses on the unsupervised domain adaptation of transferring the knowledge from the source domain to the target domain in the context of semantic segmentation.

Ranked #2 on Unsupervised Domain Adaptation on Cityscapes-to-OxfordCar

Pseudo Label Segmentation +3

379

Paper
Code

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

6 code implementations • 21 Aug 2018 • Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang

Therefore, the network trained by our method has a larger model capacity to learn from the training data.

374

Paper
Code

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

1 code implementation • 8 Feb 2024 • Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).

Ranked #1 on Conditional Text-to-Image Synthesis on COCO-MIG

Attribute Conditional Text-to-Image Synthesis +1

356

Paper
Code

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration

1 code implementation • 13 Oct 2020 • Zongxin Yang, Yunchao Wei, Yi Yang

This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.

Ranked #26 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (test-dev)

Object One-shot visual object segmentation +3

323

Paper
Code

Deep Hierarchical Semantic Segmentation

3 code implementations • CVPR 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.

Multi-Label Classification Segmentation +1

322

Paper
Code

Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro

8 code implementations • ICCV 2017 • Zhedong Zheng, Liang Zheng, Yi Yang

We verify the proposed method on a practical problem: person re-identification (re-ID).

Ranked #4 on Person Re-Identification on CUHK03

Fine-Grained Image Classification Generative Adversarial Network +2

321

Paper
Code

Contrastive Adaptation Network for Unsupervised Domain Adaptation

2 code implementations • CVPR 2019 • Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.

Ranked #7 on Domain Adaptation on Office-31

Unsupervised Domain Adaptation

313

Paper
Code

Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-identification

2 code implementations • CVPR 2019 • Zhun Zhong, Liang Zheng, Zhiming Luo, Shaozi Li, Yi Yang

To achieve this goal, an exemplar memory is introduced to store features of the target domain and accommodate the three invariance properties.

Ranked #3 on Unsupervised Person Re-Identification on DukeMTMC-reID->Market-1501

Domain Adaptive Person Re-Identification Person Re-Identification +1

306

Paper
Code

Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

1 code implementation • CVPR 2019 • Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang

We consider the problem of unsupervised domain adaptation in semantic segmentation.

Ranked #8 on Semantic Segmentation on DADA-seg

Semantic Segmentation Synthetic-to-Real Translation +1

290

Paper
Code

Camera Style Adaptation for Person Re-identification

10 code implementations • CVPR 2018 • Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang

In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation.

Ranked #71 on Person Re-Identification on DukeMTMC-reID

Data Augmentation Person Re-Identification +1

284

Paper
Code

Dual-Path Convolutional Image-Text Embeddings with Instance Loss

2 code implementations • 15 Nov 2017 • Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, Yi-Dong Shen

In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space.

Ranked #1 on Cross-Modal Retrieval on CUHK-PEDES

Content-Based Image Retrieval Cross-Modal Retrieval +4

280

Paper
Code

A Discriminatively Learned CNN Embedding for Person Re-identification

4 code implementations • 17 Nov 2016 • Zhedong Zheng, Liang Zheng, Yi Yang

We revisit two popular convolutional neural networks (CNN) in person re-identification (re-ID), i.e., verification and classification models.

Ranked #1 on Person Re-Identification on Market-1501+500k

General Classification Image Retrieval +2

265

Paper
Code

Parameter-Efficient Person Re-identification in the 3D Space

1 code implementation • 8 Jun 2020 • Zhedong Zheng, Nenggan Zheng, Yi Yang

To our knowledge, we are among the first attempts to conduct person re-identification in the 3D space.

Ranked #1 on Person Re-Identification on DukeMTMC-reID->Market-1501

3D Point Cloud Classification Point Cloud Classification +3

260

Paper
Code

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models

3 code implementations • 5 Oct 2022 • Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).

Segmentation Semantic Segmentation

253

Paper
Code

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

2 code implementations • ICLR 2019 • Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, Yi Yang

The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class.

Ranked #5 on Few-Shot Image Classification on Mini-Imagenet 10-way (1-shot)

Few-Shot Image Classification Few-Shot Learning +2

240

Paper
Code

Pedestrian Alignment Network for Large-scale Person Re-identification

1 code implementation • 3 Jul 2017 • Zhedong Zheng, Liang Zheng, Yi Yang

This task aims to search a query person in a large image pool.

Ranked #1 on Person Re-Identification on CUHK03 (detected)

Image Retrieval Large-Scale Person Re-Identification +1

234

Paper
Code

Unsupervised Person Re-identification: Clustering and Fine-tuning

1 code implementation • 30 May 2017 • Hehe Fan, Liang Zheng, Yi Yang

Progressively, pedestrian clustering and the CNN model are improved simultaneously until algorithm convergence.

Ranked #12 on Unsupervised Person Re-Identification on DukeMTMC-reID

Clustering Unsupervised Person Re-Identification

218

Paper
Code

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

2 code implementations • 29 Jan 2024 • Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving.

Ranked #1 on Scene Flow Estimation on Argoverse 2

Autonomous Driving

218

Paper
Code

Macro-Micro Adversarial Network for Human Parsing

1 code implementation • ECCV 2018 • Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang

To address the two kinds of inconsistencies, this paper proposes the Macro-Micro Adversarial Net (MMAN).

Ranked #12 on Semantic Segmentation on LIP val

Human Parsing Human Part Segmentation +1

208

Paper
Code

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

1 code implementation • ICCV 2023 • Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang

Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications.

Talking Head Generation

203

Paper
Code

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Ranked #1 on Zero-Shot Action Recognition on ActivityNet

Action Classification Action Recognition +3

201

Paper
Code

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

4 code implementations • CVPR 2019 • Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang

If the initial image is not well initialized, the following processes can hardly refine the image to a satisfactory quality.

Ranked #6 on Text-to-Image Generation on CUB (Inception score metric)

Generative Adversarial Network Text-to-Image Generation

182

Paper
Code

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

1 code implementation • 29 Nov 2022 • Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou

In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.

Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)

3D Reconstruction Neural Rendering +1

180

Paper
Code

PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures

1 code implementation • 8 Apr 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversions between different architectures.

3D Reconstruction Novel View Synthesis

180

Paper
Code

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation • NeurIPS 2023 • Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

164

Paper
Code

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

1 code implementation • CVPR 2021 • Hehe Fan, Yi Yang, Mohan Kankanhalli

To capture the dynamics in point cloud videos, point tracking is usually employed.

Ranked #4 on 3D Action Recognition on NTU RGB+D

3D Action Recognition Point Tracking +1

155

Paper
Code

Perception Test: A Diagnostic Benchmark for Multimodal Models

1 code implementation • DeepMind 2022 • Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Skanda Koppula, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira

We propose a novel multimodal benchmark – the Perception Test – that aims to extensively evaluate perception and reasoning skills of multimodal models.

Multiple-choice Question Answering +1

150

Paper
Code

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

2 code implementations • NeurIPS 2023 • Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira

We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g., Flamingo, SeViLA, or GPT-4).

counterfactual Descriptive +2

150

Paper
Code

Self-produced Guidance for Weakly-supervised Object Localization

1 code implementation • ECCV 2018 • Xiaolin Zhang, Yunchao Wei, Guoliang Kang, Yi Yang, Thomas Huang

A stagewise approach is proposed to incorporate high confident object regions to learn the SPG masks.

Ranked #1 on Weakly-Supervised Object Localization on ILSVRC 2015

Classification General Classification +2

147

Paper
Code

GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

2 code implementations • 14 May 2017 • Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran He

In this work, we propose a model that can learn object transfiguration from two unpaired sets of images: one set containing images that "have" that kind of object, and the other set being the opposite, with the mild constraint that the objects be located approximately at the same place.

Attribute Conditional Image Generation +1

144

Paper
Code

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

2 code implementations • 18 Oct 2019 • Hehe Fan, Yi Yang

We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their history movements.

Moving Point Cloud Processing

141

Paper
Code

LGSDF: Continual Global Learning of Signed Distance Fields Aided by Local Updating

2 code implementations • 8 Apr 2024 • Yufeng Yue, Yinan Deng, Jiahui Wang, Yi Yang

Implicit reconstruction of ESDF (Euclidean Signed Distance Field) involves training a neural network to regress the signed distance from any point to the nearest obstacle, which has the advantages of lightweight storage and continuous querying.

Self-Supervised Learning

140

Paper
Code

Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

1 code implementation • ICCV 2023 • Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, ShiLiang Pu

Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.

Knowledge Distillation Language Modelling +2

136

Paper
Code

D$^2$LV: A Data-Driven and Local-Verification Approach for Image Copy Detection

1 code implementation • 13 Nov 2021 • Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang

In this paper, a data-driven and local-verification (D$^2$LV) approach is proposed to compete for Image Similarity Challenge: Matching Track at NeurIPS'21.

Copy Detection Unsupervised Pre-training

135

Paper
Code

Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer

1 code implementation • 4 Feb 2020 • Tong Liu, Zhaowei Chen, Yi Yang, Zehao Wu, Haowei Li

Nowadays, deep learning techniques are widely used for lane detection, but application in low-light conditions remains a challenge to this day.

Lane Detection Multi-Task Learning +1

134

Paper
Code

Generalizing A Person Retrieval Model Hetero- and Homogeneously

1 code implementation • ECCV 2018 • Zhun Zhong, Liang Zheng, Shaozi Li, Yi Yang

Person re-identification (re-ID) poses unique challenges for unsupervised domain adaptation (UDA) in that classes in the source and target sets (domains) are entirely different and that image variations are largely caused by cameras.

Person Re-Identification Person Retrieval +2

130

Paper
Code

Pyramid Diffusion Models For Low-light Image Enhancement

1 code implementation • 17 May 2023 • Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.

Ranked #6 on Low-Light Image Enhancement on LOL

Denoising Image Generation +1

130

Paper
Code

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

1 code implementation • 8 Oct 2018 • Yang Wang, Zhenheng Yang, Peng Wang, Yi Yang, Chenxu Luo, Wei Xu

Then the whole scene is decomposed into moving foreground and static background by comparing the estimated optical flow and rigid flow derived from the depth and ego-motion.

Motion Estimation Optical Flow Estimation

128

Paper
Code

Gated Channel Transformation for Visual Recognition

3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without much increase in parameters.

General Classification Image Classification +5

125

Paper
Code

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

1 code implementation • CVPR 2022 • Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang

In contrast, our large-scale VIdeo Panoptic Segmentation in the Wild (VIPSeg) dataset provides 3,536 videos and 84,750 frames with pixel-level panoptic annotations, covering a wide range of real-world scenarios and categories.

Segmentation Video Panoptic Segmentation

120

Paper
Code

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

1 code implementation • 2 May 2022 • Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.

Ranked #11 on Video Retrieval on MSVD (using extra training data)

Clustering Retrieval +1

119

Paper
Code

What You Say and How You Say It Matters: Predicting Stock Volatility Using Verbal and Vocal Cues

1 code implementation • ACL 2019 • Yu Qin, Yi Yang

Prior research has shown that textual information in a firm's financial statement can be used to predict its stock's risk level.

117

Paper
Code

Visual Abductive Reasoning

1 code implementation • CVPR 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.

Benchmarking Sentence +1

112

Paper
Code

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

1 code implementation • 22 Oct 2018 • Xiaolin Zhang, Yunchao Wei, Yi Yang, Thomas Huang

In this way, the possibilities embedded in the produced similarity maps can be adapted to guide the process of segmenting objects.

Ranked #89 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot)

Few-Shot Semantic Segmentation Segmentation +1

111

Paper
Code

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

1 code implementation • ICCV 2015 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.

Image Captioning Novel Concepts +1

109

Paper
Code

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

2 code implementations • 20 Dec 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions.

8k Image Captioning +1

109

Paper
Code

Domain Consensus Clustering for Universal Domain Adaptation

1 code implementation • CVPR 2021 • Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang

To better exploit the intrinsic structure of the target domain, we propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones.

Ranked #4 on Partial Domain Adaptation on Office-31

Clustering domain classification +3

107

Paper
Code

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

1 code implementation • 6 Mar 2023 • Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.

Image Captioning Text Generation

107

Paper
Code

Bag of Tricks and A Strong baseline for Image Copy Detection

1 code implementation • 13 Nov 2021 • Wenhao Wang, Weipu Zhang, Yifan Sun, Yi Yang

In this paper, a bag of tricks and a strong baseline are proposed for image copy detection.

Copy Detection Unsupervised Pre-training

103

Paper
Code

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

1 code implementation • 3 Mar 2022 • Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Yi Yang, Ling-Yu Duan

Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer.

Domain Generalization Person Re-Identification +1

Paper
Code

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

1 code implementation • 23 Dec 2023 • MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang

In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.

Paper
Code

Content-Consistent Matching for Domain Adaptive Semantic Segmentation

1 code implementation • ECCV 2020 • Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, Yi Yang

The target of CCM is to acquire those synthetic images that share similar distribution with the real ones in the target domain, so that the domain gap can be naturally alleviated by employing the content-consistent synthetic images for training.

Ranked #12 on Semantic Segmentation on GTAV-to-Cityscapes Labels

Domain Adaptation Semantic Segmentation +1

Paper
Code

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

1 code implementation • 31 May 2021 • Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang

In this paper, we apply one new modality, i.e., the language description, to search the vehicle of interest and explore the potential of this task in the real-world scenario.

Language Modelling Management +2

Paper
Code

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

1 code implementation • CVPR 2022 • Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang

To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.

3D-Aware Image Synthesis

Paper
Code

Few-Example Object Detection with Model Communication

1 code implementation • 26 Jun 2017 • Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.

Ranked #1 on Weakly Supervised Object Detection on MS COCO

Object object-detection

Paper
Code

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

1 code implementation • NeurIPS 2020 • Guoliang Kang, Yunchao Wei, Yi Yang, Yueting Zhuang, Alexander G. Hauptmann

The conventional solution to this task is to minimize the discrepancy between source and target to enable effective knowledge transfer.

Ranked #25 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes

Domain Adaptation Semantic Segmentation +2

Paper
Code

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

1 code implementation • NeurIPS 2023 • Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.

Paper
Code

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

1 code implementation • ICLR 2021 • Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.

Ranked #3 on 3D Action Recognition on NTU RGB+D

3D Action Recognition Semantic Segmentation

Paper
Code

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation • CVPR 2023 • Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN) -- entailing robot agents to follow navigation instructions -- has shown great advance.

Instruction Following Text Generation

Paper
Code

Very Long Natural Scenery Image Prediction by Outpainting

1 code implementation • ICCV 2019 • Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input.

Image Inpainting Image Outpainting

Paper
Code

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

1 code implementation • 14 Nov 2022 • Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua

In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.

Ranked #1 on Semantic Segmentation on SYNTHIA-to-Cityscapes

Self-Supervised Learning Semantic Segmentation +2

Paper
Code

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

1 code implementation • 14 Dec 2020 • Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang

We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph.

Ranked #1 on Image Retrieval on Oxford5k

Drone-view target localization Image Retrieval +4

Paper
Code

Nanophotonic Particle Simulation and Inverse Design Using Artificial Neural Networks

1 code implementation • 18 Oct 2017 • John Peurifoy, Yichen Shen, Li Jing, Yi Yang, Fidel Cano-Renteria, Brendan Delacy, Max Tegmark, John D. Joannopoulos, Marin Soljacic

We propose a method to use artificial neural networks to approximate light scattering by multilayer nanoparticles.

Computational Physics Applied Physics Optics

Paper
Code

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Paper
Code

SF-Net: Single-Frame Supervision for Temporal Action Localization

1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou

To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.

Ranked #5 on Weakly Supervised Action Localization on BEOID

Weakly Supervised Action Localization

Paper
Code

Few-Shot Segmentation via Cycle-Consistent Transformer

2 code implementations • NeurIPS 2021 • Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

Directly performing cross-attention may aggregate these features from support to query and bias the query features.

Ranked #52 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)

Few-Shot Semantic Segmentation Segmentation +1

Paper
Code

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

1 code implementation • NeurIPS 2020 • Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang

We aim at the problem named One-Shot Unsupervised Domain Adaptation.

Ranked #2 on One-shot Unsupervised Domain Adaptation on GTA5 to Cityscapes

domain classification One-shot Unsupervised Domain Adaptation +2

Paper
Code

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization

1 code implementation • 26 Aug 2020 • Tingyu Wang, Zhedong Zheng, Chenggang Yan, Jiyong Zhang, Yaoqi Sun, Bolun Zheng, Yi Yang

Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center, but underestimate the contextual information in neighbor areas.

Ranked #3 on Drone navigation on University-1652

Drone navigation Drone-view target localization +2

Paper
Code

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

1 code implementation • 10 Mar 2024 • Wenhao Wang, Yifan Sun, Yi Yang

However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts.

Copy Detection Image Generation +3

Paper
Code

Dialog Intent Induction with Deep Multi-View Clustering

1 code implementation • IJCNLP 2019 • Hugh Perkins, Yi Yang

We introduce the dialog intent induction task and present a novel deep multi-view clustering approach to tackle the problem.

Clustering Representation Learning

Paper
Code

CapHuman: Capture Your Moments in Parallel Universes

1 code implementation • 1 Feb 2024 • Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.

Image Generation

Paper
Code

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

1 code implementation • 27 Apr 2022 • Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.

Ranked #1 on Single-View 3D Reconstruction on CUB-200-2011

3D Reconstruction Person Re-Identification +2

Paper
Code

Dynamic Computational Time for Visual Attention

1 code implementation • 30 Mar 2017 • Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, Wei Xu

We propose a dynamic computational time model to accelerate the average processing time for recurrent visual attention (RAM).

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Personalized Video Recommendation Using Rich Contents from Videos

1 code implementation • 21 Dec 2016 • Xingzhong Du, Hongzhi Yin, Ling Chen, Yang Wang, Yi Yang, Xiaofang Zhou

In the existing video recommender systems, the models make the recommendations based on the user-video interactions and single specific content features.

Recommendation Systems

Paper
Code

Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems

2 code implementations • NAACL 2021 • Derek Chen, Howard Chen, Yi Yang, Alex Lin, Zhou Yu

Existing goal-oriented dialogue datasets focus mainly on identifying slots and values.

Task-Oriented Dialogue Systems

Paper
Code

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

1 code implementation • 23 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Ruijie Quan, Yi Yang

With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.

Ranked #1 on Scene Text Recognition on WOST (using extra training data)

Language Modelling Scene Text Recognition

Paper
Code

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning

1 code implementation • 15 Sep 2023 • Yi Yang, Yixuan Tang, Kar Yan Tam

We present a new financial domain large language model, InvestLM, tuned on LLaMA-65B (Touvron et al., 2023), using a carefully curated instruction dataset related to financial investment.

Language Modelling Large Language Model

Paper
Code

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, with a single kind of modeling structure it is difficult to balance the learning of short-term and long-term temporal correlations; the network may become biased toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details.

Ranked #46 on 3D Human Pose Estimation on 3DPW

3D human pose and shape estimation

Paper
Code

Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

1 code implementation • EMNLP 2020 • Yi Yang, Arzoo Katiyar

We present a simple few-shot named entity recognition (NER) system based on nearest neighbor learning and structured inference.

Few-shot NER Meta-Learning +1

Paper
Code

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

2 code implementations • CVPR 2023 • Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo

Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Contrastive Learning Deblurring +2

Paper
Code

Convolutional Neural Networks with Recurrent Neural Filters

2 code implementations • EMNLP 2018 • Yi Yang

We introduce a class of convolutional neural networks (CNNs) that utilize recurrent neural networks (RNNs) as convolution filters.

Ranked #11 on Sentiment Analysis on SST-5 Fine-grained classification

Sentence Sentiment Analysis

Paper
Code

Unified Transformer Tracker for Object Tracking

1 code implementation • CVPR 2022 • Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking Object

Paper
Code

Overcoming Language Variation in Sentiment Analysis with Social Attention

1 code implementation • TACL 2017 • Yi Yang, Jacob Eisenstein

Variation in language is ubiquitous, particularly in newer forms of writing such as social media.

Sentiment Analysis

Paper
Code

Query Attack via Opposite-Direction Feature: Towards Robust Image Retrieval

2 code implementations • 7 Sep 2018 • Zhedong Zheng, Liang Zheng, Yi Yang, Fei Wu

Opposite-Direction Feature Attack (ODFA) effectively exploits feature-level adversarial gradients and takes advantage of feature distance in the representation space.

Adversarial Attack General Classification +3

Paper
Code

MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation • 8 Jan 2024 • Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Object object-detection +1

Paper
Code

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

1 code implementation • 16 Oct 2015 • Linnan Wang, Wei Wu, Jianxiong Xiao, Yi Yang

Basic Linear Algebra Subprograms (BLAS) are a set of low-level linear algebra kernels widely adopted by applications in deep learning and scientific computing.

Distributed, Parallel, and Cluster Computing

Paper
Code

Local-Global Context Aware Transformer for Language-Guided Video Segmentation

1 code implementation • 18 Mar 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang

We explore the task of language-guided video segmentation (LVS).

Ranked #7 on Referring Expression Segmentation on A2D Sentences

Referring Expression Segmentation Referring Video Object Segmentation +5

Paper
Code

Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation

2 code implementations • 29 Mar 2021 • Zhedong Zheng, Yi Yang

Domain adaptation is to transfer the shared knowledge learned from the source domain to a new environment, i.e., the target domain.

Ranked #1 on Unsupervised Domain Adaptation on Cityscapes-to-OxfordCar

Scene Segmentation Semi-Supervised Image Classification +2

Paper
Code

Query-efficient Meta Attack to Deep Neural Networks

1 code implementation • ICLR 2020 • Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, Jiashi Feng

Black-box attack methods aim to infer suitable attack patterns to targeted DNN models by only using output feedback of the models and the corresponding input queries.

Adversarial Attack Meta-Learning

Paper
Code

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

1 code implementation • 13 Sep 2021 • Yi Yang, Daoye Zhu, Tengteng Qu, Qiangyu Wang, Fuhu Ren, Chengqi Cheng

In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark data sets (i.e., Houston2018 data, Berlin data, and MUUFL data).

Paper
Code

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

1 code implementation • 27 Jan 2024 • Yixuan Tang, Yi Yang

We hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.

Benchmarking Retrieval

Paper
Code

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

1 code implementation • 14 Mar 2024 • Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments.

Zero-Shot Learning

Paper
Code

DerainCycleGAN: Rain Attentive CycleGAN for Single Image Deraining and Rainmaking

1 code implementation • 15 Dec 2019 • Yanyan Wei, Zhao Zhang, Yang Wang, Mingliang Xu, Yi Yang, Shuicheng Yan, Meng Wang

However, in practice it is rather common to have no paired images in a real deraining task. In such cases, removing the rain streaks in an unsupervised way is very challenging: the lack of constraints between images leads to low-quality recovery results.

Single Image Deraining

Paper
Code

Feature-Proxy Transformer for Few-Shot Segmentation

2 code implementations • 13 Oct 2022 • Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head.

Ranked #1 on Few-Shot Semantic Segmentation on COCO-20i -> Pascal VOC (5-shot)

Few-Shot Semantic Segmentation Segmentation +1

Paper
Code

DenseBox: Unifying Landmark Localization with End to End Object Detection

2 code implementations • 16 Sep 2015 • Lichao Huang, Yi Yang, Yafeng Deng, Yinan Yu

How well can a single fully convolutional neural network (FCN) perform on object detection?

Face Detection Multi-Task Learning +3

Paper
Code

RFNet: Region-Aware Fusion Network for Incomplete Multi-Modal Brain Tumor Segmentation

1 code implementation • ICCV 2021 • Yuhang Ding, Xin Yu, Yi Yang

In this work, we propose a Region-aware Fusion Network (RFNet) that is able to exploit different combinations of multi-modal data adaptively and effectively for tumor segmentation.

Ranked #69 on Semantic Segmentation on NYU Depth v2

Brain Tumor Segmentation Segmentation +1

Paper
Code

Attract or Distract: Exploit the Margin of Open Set

1 code implementation • ICCV 2019 • Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang

In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.

Domain Adaptation

Paper
Code

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation • 16 Jan 2024 • Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes, making it far from real-life applications like guiding students in laboratory experiments and identifying their mistakes.

Scheduling

Paper
Code

Unsupervised Domain Adaptation with Feature Embeddings

1 code implementation • 14 Dec 2014 • Yi Yang, Jacob Eisenstein

Representation learning is the dominant technique for unsupervised domain adaptation, but existing approaches often require the specification of "pivot features" that generalize across domains, which are selected by task-specific heuristics.

Representation Learning Unsupervised Domain Adaptation

Paper
Code

Unsupervised Multi-Domain Adaptation with Feature Embeddings

1 code implementation • HLT 2015 • Jacob Eisenstein, Yi Yang

Representation Learning Unsupervised Domain Adaptation

Paper
Code

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

1 code implementation • ICCV 2021 • Yikai Wang, Yi Yang, Fuchun Sun, Anbang Yao

In the low-bit quantization field, training Binary Neural Networks (BNNs) is the extreme solution to ease the deployment of deep models on resource-constrained devices, having the lowest storage cost and significantly cheaper bit-wise operations compared to 32-bit floating-point counterparts.

Quantization

Paper
Code

Adaptive Exploration for Unsupervised Person Re-Identification

1 code implementation • 9 Jul 2019 • Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang

However, a problem of the adaptive selection is that, when an image has too many neighborhoods, it is more likely to attract other images as its neighborhoods.

Unsupervised Person Re-Identification

Paper
Code

Inter-Image Communication for Weakly Supervised Localization

1 code implementation • ECCV 2020 • Xiaolin Zhang, Yunchao Wei, Yi Yang

We learn a feature center for each category and realize the global feature consistency by forcing the object features to approach class-specific centers.

Object

Paper
Code

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

1 code implementation • 1 Jul 2022 • Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.

Ranked #3 on Natural Language Queries on Ego4D

Data Augmentation Natural Language Queries

Paper
Code

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D & 3D aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that adds explicit global supervision for the 3D feature space.

Contrastive Learning Human Mesh Recovery

Paper
Code

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation • CVPR 2022 • Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Paper
Code

SEEG: Semantic Energized Co-Speech Gesture Generation

1 code implementation • CVPR 2022 • Yuanzhi Liang, Qianyu Feng, Linchao Zhu, Li Hu, Pan Pan, Yi Yang

Talking gesture generation is a practical yet challenging task which aims to synthesize gestures in line with speech.

Ranked #6 on Gesture Generation on TED Gesture Dataset

Gesture Generation

Paper
Code

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

1 code implementation • 29 May 2023 • Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.

Image Captioning Image Classification +5

Paper
Code

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation • 18 Sep 2023 • Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation Video Semantic Segmentation

Paper
Code

Faster Meta Update Strategy for Noise-Robust Deep Learning

1 code implementation • 30 Apr 2021 • Youjiang Xu, Linchao Zhu, Lu Jiang, Yi Yang

It has been shown that deep neural networks are prone to overfitting on biased training data.

Ranked #1 on Image Classification on CIFAR-10, 40% Symmetric Noise

Learning with noisy labels Meta-Learning

Paper
Code

Removing Raindrops and Rain Streaks in One Go

1 code implementation • CVPR 2021 • Ruijie Quan, Xin Yu, Yuanzhi Liang, Yi Yang

First, we propose a complementary cascaded network architecture, namely CCN, to remove rain streaks and raindrops in a unified framework.

Neural Architecture Search Rain Removal

Paper
Code

Vector-Decomposed Disentanglement for Domain-Invariant Object Detection

1 code implementation • ICCV 2021 • Aming Wu, Rui Liu, Yahong Han, Linchao Zhu, Yi Yang

Secondly, domain-specific representations are introduced as the differences between the input and domain-invariant representations.

Disentanglement Object +2

Paper
Code

V²L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang

While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority.

Metric Learning Retrieval

Paper
Code

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

1 code implementation • CVPR 2023 • Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.

Ranked #2 on Video Question Answering on AGQA 2.0 balanced

Question Answering Video Question Answering +2

Paper
Code

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

1 code implementation • 16 Apr 2022 • Yulei Lu, Yawei Luo, Li Zhang, Zheyang Li, Yi Yang, Jun Xiao

A thriving trend for domain adaptive segmentation endeavors to generate the high-quality pseudo labels for target domain and retrain the segmentor on them.

Ranked #12 on Unsupervised Domain Adaptation on GTAV-to-Cityscapes Labels

Pseudo Label Semantic Segmentation +2

Paper
Code

Collective Entity Disambiguation with Structured Gradient Tree Boosting

1 code implementation • NAACL 2018 • Yi Yang, Ozan İrsoy, Kazi Shefaet Rahman

To the best of our knowledge, our work is the first one that employs the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation.

Entity Disambiguation

Paper
Code

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

1 code implementation • 26 Aug 2021 • Wuyang Chen, Xinyu Gong, Junru Wu, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.

Neural Architecture Search

Paper
Code

H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection

1 code implementation • CVPR 2022 • Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang

How to align the source and target domains is critical to the CDWSOD accuracy.

Ranked #1 on Weakly Supervised Object Detection on Clipart1k

Domain Adaptation object-detection +1

Paper
Code

Context-Aware Pretraining for Efficient Blind Image Decomposition

1 code implementation • CVPR 2023 • Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang

(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.

Attribute Image Reconstruction +1

Paper
Code

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

1 code implementation • 5 Feb 2024 • Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu

Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems.

Continual Learning Multi-Task Learning +1

Paper
Code

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

1 code implementation • ACL 2021 • James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale, Jordan Swartz, T. Greg McKelvey, Hui Dai, Yi Yang, David Sontag

In this work, we describe our creation of a dataset of clinical action items annotated over MIMIC-III, the largest publicly available dataset of real clinical notes.

Extractive Summarization Language Modelling +1

Paper
Code

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Paper
Code

Tele-Knowledge Pre-training for Fault Analysis

1 code implementation • 20 Oct 2022 • Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

Paper
Code

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

1 code implementation • 4 Sep 2023 • Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.

Ranked #3 on Motion Synthesis on HumanML3D (using extra training data)

Language Modelling Motion Synthesis

Paper
Code

Universal-Prototype Enhancing for Few-Shot Object Detection

1 code implementation • ICCV 2021 • Aming Wu, Yahong Han, Linchao Zhu, Yi Yang

Thus, we develop a new framework of few-shot object detection with universal prototypes (${FSOD}^{up}$) that has the merit of feature generalization towards novel objects.

Ranked #23 on Few-Shot Object Detection on MS-COCO (10-shot)

Few-Shot Object Detection Meta-Learning +3

Paper
Code

Joint Representation Learning and Keypoint Detection for Cross-view Geo-localization

1 code implementation • IEEE Transactions on Image Processing (TIP) 2022 • Jinliang Lin, Zhedong Zheng, Zhun Zhong, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe

Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network.

Ranked #2 on Drone navigation on University-1652

Drone navigation Drone-view target localization +3

Paper
Code

Gloss-Free End-to-End Sign Language Translation

1 code implementation • 22 May 2023 • Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.

Sign Language Translation Translation

Paper
Code

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation • ICCV 2023 • Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Paper
Code

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

1 code implementation • 1 Nov 2021 • Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou

For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy.

Image Classification Machine Translation +2

Paper
Code

Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification

1 code implementation • IEEE Transactions on Neural Networks and Learning Systems 2022 • Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

Second, we instantiate the loss function and provide a strong baseline for FGVC, where the performance of a naive backbone can be boosted to be comparable with recent methods.

Ranked #27 on Fine-Grained Image Classification on CUB-200-2011

Fine-Grained Image Classification Fine-Grained Visual Recognition

Paper
Code

Improving Person Re-identification by Attribute and Identity Learning

2 code implementations • 21 Mar 2017 • Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Zhilan Hu, Chenggang Yan, Yi Yang

Person re-identification (re-ID) and attribute recognition share the common goal of learning pedestrian descriptions.

Ranked #75 on Person Re-Identification on DukeMTMC-reID

Attribute Person Recognition +2

Paper
Code

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

1 code implementation • 29 Mar 2022 • Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

By contrast, pixel-level optimization is more explicit; however, it is sensitive to the visual quality of training data and is not robust to object deformation.

Contrastive Learning Semantic Segmentation +3

Paper
Code

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection

1 code implementation • 24 May 2022 • Wenhao Wang, Yifan Sun, Yi Yang

Moreover, this paper further reveals a unique difficulty for solving the hard negative problem in ICD, i.e., there is a fundamental conflict between current metric learning and ICD.

Copy Detection Metric Learning

Paper
Code

Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks

2 code implementations • 22 Aug 2018 • Yang He, Xuanyi Dong, Guoliang Kang, Yanwei Fu, Chenggang Yan, Yi Yang

With asymptotic pruning, the information of the training set would be gradually concentrated in the remaining filters, so the subsequent training and pruning process would be stable.

Image Classification

Paper
Code

Self-supervised Point Cloud Representation Learning via Separating Mixed Shapes

1 code implementation • 1 Sep 2021 • Chao Sun, Zhedong Zheng, Xiaohan Wang, Mingliang Xu, Yi Yang

Albeit simple, the pre-trained encoder can capture the key points of an unseen point cloud and surpasses the encoder trained from scratch on downstream tasks.

Ranked #43 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +3

Paper
Code

Disentangling Structured Components: Towards Adaptive, Interpretable and Scalable Time Series Forecasting

1 code implementation • 22 May 2023 • Jinliang Deng, Xiusi Chen, Renhe Jiang, Du Yin, Yi Yang, Xuan Song, Ivor W. Tsang

The core issue in MTS forecasting is how to effectively model complex spatial-temporal patterns.

Ranked #1 on Time Series Forecasting on Weather (96)

Multivariate Time Series Forecasting Time Series

Paper
Code

3D Pose Estimation for Fine-Grained Object Categories

2 code implementations • 12 Jun 2018 • Yaming Wang, Xiao Tan, Yi Yang, Xiao Liu, Errui Ding, Feng Zhou, Larry S. Davis

The new dataset is available at www.umiacs.umd.edu/~wym/3dpose.html

3D Pose Estimation Object

Paper
Code

Improving Annotation for 3D Pose Dataset of Fine-Grained Object Categories

2 code implementations • 19 Oct 2018 • Yaming Wang, Xiao Tan, Yi Yang, Ziyu Li, Xiao Liu, Feng Zhou, Larry S. Davis

Existing 3D pose datasets of object categories are limited to generic object types and lack fine-grained information.

3D Pose Estimation Object +1

Paper
Code

Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

1 code implementation • 27 Mar 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang

Our target is to learn visual correspondence from unlabeled videos.

Position Representation Learning +1

Paper
Code

GIF: A General Graph Unlearning Strategy via Influence Function

1 code implementation • 6 Apr 2023 • Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He

Then, we identify the crux of why the traditional influence function fails for graph unlearning, and devise the Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.

Machine Unlearning

Paper
Code

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang

In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process.

Medical Visual Question Answering Question Answering +1

Paper
Code

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

1 code implementation • 19 Jan 2024 • Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

(2) Equipping the visual and text encoders with separate prompts fails to mitigate the visual-text modality gap.

Retrieval Video Retrieval

Paper
Code

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

1 code implementation • 22 Mar 2024 • Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.

Paper
Code

UTS submission to Google YouTube-8M Challenge 2017

1 code implementation • 13 Jul 2017 • Linchao Zhu, Yanbin Liu, Yi Yang

In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017.

Classification General Classification +1

Paper
Code

Connective Cognition Network for Directional Visual Commonsense Reasoning

1 code implementation • NeurIPS 2019 • Aming Wu, Linchao Zhu, Yahong Han, Yi Yang

Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers.

Sentence Visual Commonsense Reasoning

Paper
Code

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

1 code implementation • 5 Aug 2022 • Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei

In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.

Instance Segmentation Semantic Segmentation +1

Paper
Code

Feature-compatible Progressive Learning for Video Copy Detection

2 code implementations • 20 Apr 2023 • Wenhao Wang, Yifan Sun, Yi Yang

Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.

Copy Detection Video Similarity

Paper
Code

Whitening-based Contrastive Learning of Sentence Embeddings

1 code implementation • 28 May 2023 • Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.

Contrastive Learning Semantic Textual Similarity +4

Paper
Code

RMP: A Random Mask Pretrain Framework for Motion Prediction

1 code implementation • 16 Sep 2023 • Yi Yang, Qingwen Zhang, Thomas Gilles, Nazre Batool, John Folkesson

Although pretraining techniques are growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving.

Autonomous Driving motion prediction +1

Paper
Code

Fast and Accurate Factual Inconsistency Detection Over Long Documents

1 code implementation • 19 Oct 2023 • Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang

We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.

Chunking Natural Language Inference +2

Paper
Code

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport

1 code implementation • 5 Sep 2019 • Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang

To estimate the mutual information from data, a common practice is preparing a set of paired samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n \stackrel{\mathrm{i.i.d.}}{\sim} p(\mathbf{x},\mathbf{y})$.

BIG-bench Machine Learning Mutual Information Estimation

Paper
Code

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

2 code implementations • NAACL 2022 • Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan

To deal with this issue, in this paper, we propose a new strategy to perform textual backdoor attacks which do not require an external trigger, and the poisoned samples are correctly labeled.

Backdoor Attack Sentence

Paper
Code

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

1 code implementation • ICCV 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).

Ranked #10 on Visual Object Tracking on LaSOT

Object Representation Learning +6

Paper
Code

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

1 code implementation • 10 Jul 2023 • Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang

In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.

Image Registration Semantic Segmentation

Paper
Code

Compositional Feature Augmentation for Unbiased Scene Graph Generation

1 code implementation • ICCV 2023 • Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.

Graph Generation Relation +1

Paper
Code

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

1 code implementation • 20 Nov 2023 • Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang

Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting the scene-invariant appearance continuity and geometry consistency priors to the aggregation process.

Ranked #1 on Generalizable Novel View Synthesis on Shiny dataset

Generalizable Novel View Synthesis

Paper
Code

Adversarial Complementary Learning for Weakly Supervised Object Localization

2 code implementations • CVPR 2018 • Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas Huang

With such adversarial learning, the two parallel classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together.

Ranked #2 on Weakly-Supervised Object Localization on ILSVRC 2016

General Classification Object +1

Paper
Code

Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification

1 code implementation • 30 Jan 2018 • Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, Yi Yang

This paper considers the task of thorax disease classification on chest X-ray images.

General Classification

Paper
Code

CNN-RNN: A Unified Framework for Multi-label Image Classification

1 code implementation • CVPR 2016 • Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu

While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image.

Classification General Classification +2

Paper
Code

Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior

1 code implementation • ECCV 2020 • Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang

Most previous work on adversarial attacks focuses on image models, while the vulnerability of video models is less explored.

Adversarial Attack Video Classification

Paper
Code

VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots

1 code implementation • 31 May 2021 • Yuan Gan, Yawei Luo, Xin Yu, Bang Zhang, Yi Yang

In this paper, we investigate the task of hallucinating an authentic high-resolution (HR) human face from multiple low-resolution (LR) video snapshots.

Face Hallucination Hallucination

Paper
Code

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

1 code implementation • 9 Jun 2022 • Yi Yang, Yanqiao Zhu, Hejie Cui, Xuan Kan, Lifang He, Ying Guo, Carl Yang

Specifically, we propose to meta-train the model on datasets of large sample sizes and transfer the knowledge to small datasets.

Meta-Learning

Paper
Code

TransHP: Image Classification with Hierarchical Prompting

1 code implementation • NeurIPS 2023 • Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification Image Classification

Paper
Code

PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis

1 code implementation • 20 May 2023 • Yi Yang, Hejie Cui, Carl Yang

The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways.

Transfer Learning Unsupervised Pre-training

Paper
Code

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.

Learning with noisy labels Multi-Label Classification +1

Paper
Code
