Search Results for author: Yi Yang

Found 507 papers, 249 papers with code

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

3 code implementations CVPR 2019 Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang

In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small.

Image Classification
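The two requirements named in the abstract above (large deviation of the filter norms, small minimum norm) can be measured directly. The following is a minimal sketch, not the authors' code: it assumes each filter is flattened to a list of weights, uses the L2 norm, and the helper name `norm_statistics` is hypothetical.

```python
import math
from statistics import pstdev


def norm_statistics(filters):
    """Return (deviation of L2 norms, smallest L2 norm) over a set of filters.

    Each filter is assumed to be a flat list of weights (illustrative layout).
    """
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    return pstdev(norms), min(norms)


# Toy filters with L2 norms 5, 1 and 10.
filters = [[3.0, 4.0], [0.0, 1.0], [6.0, 8.0]]
deviation, smallest = norm_statistics(filters)
```

A norm-based pruning criterion is more plausible when `deviation` is large and `smallest` is close to zero; what counts as "large" or "small" is left open here because the snippet above gives no thresholds.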

Random Erasing Data Augmentation

18 code implementations 16 Aug 2017 Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang

In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN).

General Classification Image Augmentation +4

Operation-aware Neural Networks for User Response Prediction

4 code implementations 2 Apr 2019 Yi Yang, Baile Xu, Furao Shen, Jian Zhao

Many deep models are proposed to automatically learn high-order feature interactions.

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

3 code implementations 14 Apr 2020 Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei

This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.

Representation Learning Vehicle Re-Identification

Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification

2 code implementations CVPR 2018 Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, Jianbin Jiao

To this end, we propose to preserve two types of unsupervised similarities, 1) self-similarity of an image before and after translation, and 2) domain-dissimilarity of a translated source image and a target image.

Generative Adversarial Network Person Re-Identification +2

Segment and Track Anything

1 code implementation 11 May 2023 Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving Object Tracking

Network Pruning via Transformable Architecture Search

4 code implementations NeurIPS 2019 Xuanyi Dong, Yi Yang

The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks.

Knowledge Distillation Network Pruning +2

Searching for A Robust Neural Architecture in Four GPU Hours

6 code implementations CVPR 2019 Xuanyi Dong, Yi Yang

To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG.

Neural Architecture Search

One-Shot Neural Architecture Search via Self-Evaluated Template Network

4 code implementations ICCV 2019 Xuanyi Dong, Yi Yang

In this paper, we propose a Self-Evaluated Template Network (SETN) to improve the quality of the architecture candidates for evaluation so that it is more likely to cover competitive candidates.

Neural Architecture Search

ActBERT: Learning Global-Local Video-Text Representations

1 code implementation CVPR 2020 Linchao Zhu, Yi Yang

In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.

Action Segmentation Question Answering +5

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

1 code implementation CVPR 2021 Xiaohan Wang, Linchao Zhu, Yi Yang

Moreover, a global alignment method is proposed to provide a global cross-modal measurement that is complementary to the local perspective.

Retrieval Video Retrieval

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations 7 Nov 2022 Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation Point Tracking

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations 1 Feb 2024 Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

Self-Correction for Human Parsing

2 code implementations 22 Oct 2019 Peike Li, Yunqiu Xu, Yunchao Wei, Yi Yang

To tackle the problem of learning with label noises, this work introduces a purification strategy, called Self-Correction for Human Parsing (SCHP), to progressively promote the reliability of the supervised labels as well as the learned models.

Human Parsing Human Part Segmentation +1

Style Aggregated Network for Facial Landmark Detection

1 code implementation CVPR 2018 Xuanyi Dong, Yan Yan, Wanli Ouyang, Yi Yang

In this work, we propose a style-aggregated approach to deal with the large intrinsic variance of image styles for facial landmark detection.

Ranked #2 on Facial Landmark Detection on AFLW-Front (Mean NME metric)

Face Alignment Facial Landmark Detection

Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection

2 code implementations ICCV 2019 Xuanyi Dong, Yi Yang

A typical approach is to (1) train a detector on the labeled images; (2) generate new training samples using this detector's prediction as pseudo labels of unlabeled images; (3) retrain the detector on the labeled samples and partial pseudo labeled samples.

Ranked #1 on Facial Landmark Detection on 300W (Full) (using extra training data)

Facial Landmark Detection
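The three-step "typical approach" quoted in the abstract above can be sketched on toy data. This is an illustration only, not the paper's method: the "detector" is a hypothetical 1-D nearest-centroid classifier, and keeping pseudo labels by a margin threshold (`min_margin`) is an assumed way of selecting the "partial" pseudo-labeled samples.

```python
from statistics import mean


def fit_centroids(samples):
    """Train the toy detector: one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return (label, margin); margin is how much closer the best centroid is
    than the runner-up. Requires at least two labels."""
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(value - centroids[runner_up]) - abs(value - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin):
    # (1) Train a detector on the labeled samples.
    model = fit_centroids(labeled)
    # (2) Use its predictions as pseudo labels for the unlabeled samples.
    pseudo = [(v, *predict(model, v)) for v in unlabeled]
    # (3) Retrain on the labeled samples plus the confident pseudo-labeled subset.
    kept = [(v, label) for v, label, margin in pseudo if margin >= min_margin]
    return fit_centroids(labeled + kept), kept


labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 2.0, 5.2, 8.0, 11.0]
model, kept = self_train(labeled, unlabeled, min_margin=3.0)
```

With these inputs the ambiguous sample `5.2` is discarded, while the other four unlabeled values are pseudo-labeled and used in the retraining step.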

Supervision by Registration and Triangulation for Landmark Detection

1 code implementation 25 Jan 2021 Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu

End-to-end training is made possible by differentiable registration and 3D triangulation modules.

Optical Flow Estimation

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification

1 code implementation ICCV 2021 Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi, Yi Yang

Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains.

Classification Domain Generalization

DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments

2 code implementations 22 Sep 2018 Chao Yu, Zuxin Liu, Xinjun Liu, Fugui Xie, Yi Yang, Qi Wei, Qiao Fei

It is one of the state-of-the-art SLAM systems in high-dynamic environments.

Robotics

Associating Objects with Transformers for Video Object Segmentation

2 code implementations NeurIPS 2021 Zongxin Yang, Yunchao Wei, Yi Yang

The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming multiple times computing resources.

Ranked #2 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)

Object One-shot visual object segmentation +2

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations 22 Mar 2022 Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Object Segmentation +3

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

2 code implementations 18 Oct 2022 Zongxin Yang, Yi Yang

To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

Object Semantic Segmentation +2

Video Object Segmentation in Panoptic Wild Scenes

2 code implementations 8 May 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.

Object Semantic Segmentation +2

FinBERT: A Pretrained Language Model for Financial Communications

1 code implementation 15 Jun 2020 Yi Yang, Mark Christopher Siy UY, Allen Huang

Contextual pretrained language models, such as BERT (Devlin et al., 2019), have made significant breakthrough in various NLP tasks by training on large scale of unlabeled text resources. Financial sector also accumulates large amount of financial communication text. However, there is no pretrained finance specific language models available.

Language Modelling Sentiment Analysis +1

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

3 code implementations 27 Feb 2020 Zhedong Zheng, Yunchao Wei, Yi Yang

To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation.

Drone navigation Drone-view target localization +2

Unsupervised Scene Adaptation with Memory Regularization in vivo

2 code implementations 24 Dec 2019 Zhedong Zheng, Yi Yang

We consider the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data.

Semantic Segmentation Synthetic-to-Real Translation +1

Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation

3 code implementations 8 Mar 2020 Zhedong Zheng, Yi Yang

This paper focuses on the unsupervised domain adaptation of transferring the knowledge from the source domain to the target domain in the context of semantic segmentation.

Pseudo Label Segmentation +3

Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

6 code implementations 21 Aug 2018 Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang

Therefore, the network trained by our method has a larger model capacity to learn from the training data.

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

1 code implementation 8 Feb 2024 Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).

Attribute Conditional Text-to-Image Synthesis +1

Deep Hierarchical Semantic Segmentation

3 code implementations CVPR 2022 Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.

Multi-Label Classification Segmentation +1

Contrastive Adaptation Network for Unsupervised Domain Adaptation

2 code implementations CVPR 2019 Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.

Unsupervised Domain Adaptation

A Discriminatively Learned CNN Embedding for Person Re-identification

4 code implementations 17 Nov 2016 Zhedong Zheng, Liang Zheng, Yi Yang

We revisit two popular convolutional neural networks (CNN) in person re-identification (re-ID), i.e., verification and classification models.

General Classification Image Retrieval +2

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models

3 code implementations 5 Oct 2022 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).

Segmentation Semantic Segmentation

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

2 code implementations 29 Jan 2024 Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving.

Autonomous Driving

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

4 code implementations CVPR 2019 Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang

If the initial image is not well initialized, the following processes can hardly refine the image to a satisfactory quality.

Ranked #6 on Text-to-Image Generation on CUB (Inception score metric)

Generative Adversarial Network Text-to-Image Generation

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

1 code implementation 29 Nov 2022 Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou

In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.

Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)

3D Reconstruction Neural Rendering +1

PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures

1 code implementation 8 Apr 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversions between different architectures.

3D Reconstruction Novel View Synthesis

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation NeurIPS 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

GeneGAN: Learning Object Transfiguration and Attribute Subspace from Unpaired Data

2 code implementations 14 May 2017 Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran He

In this work, we propose a model that can learn object transfiguration from two unpaired sets of images: one set containing images that "have" that kind of object, and the other set being the opposite, with the mild constraint that the objects be located approximately at the same place.

Attribute Conditional Image Generation +1

PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing

2 code implementations 18 Oct 2019 Hehe Fan, Yi Yang

We apply PointRNN, PointGRU and PointLSTM to moving point cloud prediction, which aims to predict the future trajectories of points in a set given their history movements.

Moving Point Cloud Processing

LGSDF: Continual Global Learning of Signed Distance Fields Aided by Local Updating

2 code implementations 8 Apr 2024 Yufeng Yue, Yinan Deng, Jiahui Wang, Yi Yang

Implicit reconstruction of ESDF (Euclidean Signed Distance Field) involves training a neural network to regress the signed distance from any point to the nearest obstacle, which has the advantages of lightweight storage and continuous querying.

Self-Supervised Learning

Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

1 code implementation ICCV 2023 Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, ShiLiang Pu

Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.

Knowledge Distillation Language Modelling +2

D$^2$LV: A Data-Driven and Local-Verification Approach for Image Copy Detection

1 code implementation 13 Nov 2021 Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang

In this paper, a data-driven and local-verification (D$^2$LV) approach is proposed to compete for Image Similarity Challenge: Matching Track at NeurIPS'21.

Copy Detection Unsupervised Pre-training

Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer

1 code implementation 4 Feb 2020 Tong Liu, Zhaowei Chen, Yi Yang, Zehao Wu, Haowei Li

Nowadays, deep learning techniques are widely used for lane detection, but application in low-light conditions remains a challenge until this day.

Lane Detection Multi-Task Learning +1

Generalizing A Person Retrieval Model Hetero- and Homogeneously

1 code implementation ECCV 2018 Zhun Zhong, Liang Zheng, Shaozi Li, Yi Yang

Person re-identification (re-ID) poses unique challenges for unsupervised domain adaptation (UDA) in that classes in the source and target sets (domains) are entirely different and that image variations are largely caused by cameras.

Person Re-Identification Person Retrieval +2

Pyramid Diffusion Models For Low-light Image Enhancement

1 code implementation 17 May 2023 Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.

Denoising Image Generation +1

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

1 code implementation 8 Oct 2018 Yang Wang, Zhenheng Yang, Peng Wang, Yi Yang, Chenxu Luo, Wei Xu

Then the whole scene is decomposed into moving foreground and static background by comparing the estimated optical flow and rigid flow derived from the depth and ego-motion.

Motion Estimation Optical Flow Estimation

Gated Channel Transformation for Visual Recognition

3 code implementations CVPR 2020 Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

This lightweight layer incorporates a simple l2 normalization, enabling our transformation unit applicable to operator-level without much increase of additional parameters.

General Classification Image Classification +5

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

1 code implementation CVPR 2022 Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang

In contrast, our large-scale VIdeo Panoptic Segmentation in the Wild (VIPSeg) dataset provides 3,536 videos and 84,750 frames with pixel-level panoptic annotations, covering a wide range of real-world scenarios and categories.

Segmentation Video Panoptic Segmentation

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

1 code implementation 2 May 2022 Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.

Ranked #11 on Video Retrieval on MSVD (using extra training data)

Clustering Retrieval +1

What You Say and How You Say It Matters: Predicting Stock Volatility Using Verbal and Vocal Cues

1 code implementation ACL 2019 Yu Qin, Yi Yang

Prior research has shown that textual information in a firm's financial statement can be used to predict its stock's risk level.

Visual Abductive Reasoning

1 code implementation CVPR 2022 Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.

Benchmarking Sentence +1

SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

1 code implementation 22 Oct 2018 Xiaolin Zhang, Yunchao Wei, Yi Yang, Thomas Huang

In this way, the possibilities embedded in the produced similarity maps can be adapted to guide the process of segmenting objects.

Few-Shot Semantic Segmentation Segmentation +1

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

1 code implementation ICCV 2015 Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.

Image Captioning Novel Concepts +1

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

2 code implementations 20 Dec 2014 Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions.

8k Image Captioning +1

Domain Consensus Clustering for Universal Domain Adaptation

1 code implementation CVPR 2021 Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang

To better exploit the intrinsic structure of the target domain, we propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones.

Clustering domain classification +3

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

1 code implementation 6 Mar 2023 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.

Image Captioning Text Generation

Bag of Tricks and A Strong baseline for Image Copy Detection

1 code implementation 13 Nov 2021 Wenhao Wang, Weipu Zhang, Yifan Sun, Yi Yang

In this paper, a bag of tricks and a strong baseline are proposed for image copy detection.

Copy Detection Unsupervised Pre-training

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

1 code implementation 3 Mar 2022 Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Yi Yang, Ling-Yu Duan

Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer.

Domain Generalization Person Re-Identification +1

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

1 code implementation 23 Dec 2023 MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang

In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.

Content-Consistent Matching for Domain Adaptive Semantic Segmentation

1 code implementation ECCV 2020 Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, Yi Yang

The target of CCM is to acquire those synthetic images that share similar distribution with the real ones in the target domain, so that the domain gap can be naturally alleviated by employing the content-consistent synthetic images for training.

Domain Adaptation Semantic Segmentation +1

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

1 code implementation 31 May 2021 Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang

In this paper, we apply one new modality, i.e., the language description, to search the vehicle of interest and explore the potential of this task in the real-world scenario.

Language Modelling Management +2

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

1 code implementation CVPR 2022 Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang

To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.

3D-Aware Image Synthesis

Few-Example Object Detection with Model Communication

1 code implementation 26 Jun 2017 Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

Experiments on PASCAL VOC'07, MS COCO'14, and ILSVRC'13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to the state-of-the-art weakly-supervised approaches using a large number of image-level labels.

Object object-detection

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

1 code implementation NeurIPS 2023 Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

1 code implementation ICLR 2021 Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.

3D Action Recognition Semantic Segmentation

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation CVPR 2023 Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN) -- entailing robot agents to follow navigation instructions -- has shown great advance.

Instruction Following Text Generation

Very Long Natural Scenery Image Prediction by Outpainting

1 code implementation ICCV 2019 Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input.

Image Inpainting Image Outpainting

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

1 code implementation 14 Nov 2022 Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua

In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.

Self-Supervised Learning Semantic Segmentation +2

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

1 code implementation 14 Dec 2020 Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang

We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph.

Drone-view target localization Image Retrieval +4
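The two phases described in the abstract above (building a k-nearest neighbor graph, then spreading messages within it) can be sketched in plain Python. This is a hedged illustration rather than the paper's re-ranking algorithm: squared Euclidean distance and mean aggregation over a node and its neighbours are assumptions, and the function names are hypothetical.

```python
def knn_graph(features, k):
    """Phase 1: for each item, list the indices of its k nearest neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    graph = []
    for i, f in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sq_dist(f, features[j]),
        )
        graph.append(others[:k])
    return graph


def propagate(features, graph):
    """Phase 2: one round of message passing; each feature becomes the mean
    of itself and its neighbours' features."""
    refined = []
    for i, neighbours in enumerate(graph):
        group = [features[i]] + [features[j] for j in neighbours]
        refined.append([sum(col) / len(group) for col in zip(*group)])
    return refined


features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
graph = knn_graph(features, k=2)
refined = propagate(features, graph)
```

After one round, the three clustered items move toward their shared mean, and the outlier is pulled toward its nearest neighbours; ranking by distance on `refined` instead of `features` is the re-ranking step in this toy setting.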

Nanophotonic Particle Simulation and Inverse Design Using Artificial Neural Networks

1 code implementation 18 Oct 2017 John Peurifoy, Yichen Shen, Li Jing, Yi Yang, Fidel Cano-Renteria, Brendan Delacy, Max Tegmark, John D. Joannopoulos, Marin Soljacic

We propose a method to use artificial neural networks to approximate light scattering by multilayer nanoparticles.

Computational Physics Applied Physics Optics

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization

1 code implementation 26 Aug 2020 Tingyu Wang, Zhedong Zheng, Chenggang Yan, Jiyong Zhang, Yaoqi Sun, Bolun Zheng, Yi Yang

Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center, but underestimate the contextual information in neighbor areas.

Drone navigation Drone-view target localization +2

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

1 code implementation 10 Mar 2024 Wenhao Wang, Yifan Sun, Yi Yang

However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts.

Copy Detection Image Generation +3

Dialog Intent Induction with Deep Multi-View Clustering

1 code implementation IJCNLP 2019 Hugh Perkins, Yi Yang

We introduce the dialog intent induction task and present a novel deep multi-view clustering approach to tackle the problem.

Clustering Representation Learning

CapHuman: Capture Your Moments in Parallel Universes

1 code implementation 1 Feb 2024 Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.

Image Generation

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

1 code implementation 27 Apr 2022 Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.

3D Reconstruction Person Re-Identification +2

Dynamic Computational Time for Visual Attention

1 code implementation 30 Mar 2017 Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, Wei Xu

We propose a dynamic computational time model to accelerate the average processing time for recurrent visual attention (RAM).

reinforcement-learning Reinforcement Learning (RL)

Personalized Video Recommendation Using Rich Contents from Videos

1 code implementation 21 Dec 2016 Xingzhong Du, Hongzhi Yin, Ling Chen, Yang Wang, Yi Yang, Xiaofang Zhou

In the existing video recommender systems, the models make the recommendations based on the user-video interactions and single specific content features.

Recommendation Systems

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

1 code implementation 23 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Ruijie Quan, Yi Yang

With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.

Ranked #1 on Scene Text Recognition on WOST (using extra training data)

Language Modelling Scene Text Recognition

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning

1 code implementation 15 Sep 2023 Yi Yang, Yixuan Tang, Kar Yan Tam

We present a new financial domain large language model, InvestLM, tuned on LLaMA-65B (Touvron et al., 2023), using a carefully curated instruction dataset related to financial investment.

Language Modelling Large Language Model

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation CVPR 2023 Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, using a single kind of modeling structure is difficult to balance the learning of short-term and long-term temporal correlations, and may bias the network to one of them, leading to undesirable predictions like global location shift, temporal inconsistency, and insufficient local details.

3D human pose and shape estimation

Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

1 code implementation EMNLP 2020 Yi Yang, Arzoo Katiyar

We present a simple few-shot named entity recognition (NER) system based on nearest neighbor learning and structured inference.

Few-shot NER Meta-Learning +1

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

2 code implementations CVPR 2023 Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo

Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Contrastive Learning Deblurring +2

Convolutional Neural Networks with Recurrent Neural Filters

2 code implementations EMNLP 2018 Yi Yang

We introduce a class of convolutional neural networks (CNNs) that utilize recurrent neural networks (RNNs) as convolution filters.

Sentence Sentiment Analysis

Unified Transformer Tracker for Object Tracking

1 code implementation CVPR 2022 Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack (Wang et al., 2021) demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking Object

Overcoming Language Variation in Sentiment Analysis with Social Attention

1 code implementation TACL 2017 Yi Yang, Jacob Eisenstein

Variation in language is ubiquitous, particularly in newer forms of writing such as social media.

Sentiment Analysis

Query Attack via Opposite-Direction Feature: Towards Robust Image Retrieval

2 code implementations 7 Sep 2018 Zhedong Zheng, Liang Zheng, Yi Yang, Fei Wu

Opposite-Direction Feature Attack (ODFA) effectively exploits feature-level adversarial gradients and takes advantage of feature distance in the representation space.

Adversarial Attack General Classification +3

MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation 8 Jan 2024 Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Object object-detection +1

BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

1 code implementation 16 Oct 2015 Linnan Wang, Wei Wu, Jianxiong Xiao, Yi Yang

Basic Linear Algebra Subprograms (BLAS) are a set of low-level linear algebra kernels widely adopted by applications in deep learning and scientific computing.

Distributed, Parallel, and Cluster Computing

Query-efficient Meta Attack to Deep Neural Networks

1 code implementation ICLR 2020 Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, Jiashi Feng

Black-box attack methods aim to infer suitable attack patterns to targeted DNN models by only using output feedback of the models and the corresponding input queries.

Adversarial Attack Meta-Learning

Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

1 code implementation 13 Sep 2021 Yi Yang, Daoye Zhu, Tengteng Qu, Qiangyu Wang, Fuhu Ren, Chengqi Cheng

In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark data sets (i.e., Houston2018 data, Berlin data, and MUUFL data).

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

1 code implementation 27 Jan 2024 Yixuan Tang, Yi Yang

We hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.

Benchmarking Retrieval

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

1 code implementation 14 Mar 2024 Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments.

Zero-Shot Learning

DerainCycleGAN: Rain Attentive CycleGAN for Single Image Deraining and Rainmaking

1 code implementation 15 Dec 2019 Yanyan Wei, Zhao Zhang, Yang Wang, Mingliang Xu, Yi Yang, Shuicheng Yan, Meng Wang

However, in practice it is rather common to have no paired images in real deraining tasks; in such cases, removing the rain streaks in an unsupervised way is very challenging because the lack of constraints between images leads to low-quality recovery results.

Single Image Deraining

Feature-Proxy Transformer for Few-Shot Segmentation

2 code implementations 13 Oct 2022 Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head.

Few-Shot Semantic Segmentation Segmentation +1

RFNet: Region-Aware Fusion Network for Incomplete Multi-Modal Brain Tumor Segmentation

1 code implementation ICCV 2021 Yuhang Ding, Xin Yu, Yi Yang

In this work, we propose a Region-aware Fusion Network (RFNet) that is able to exploit different combinations of multi-modal data adaptively and effectively for tumor segmentation.

Brain Tumor Segmentation Segmentation +1

Attract or Distract: Exploit the Margin of Open Set

1 code implementation ICCV 2019 Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang

In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.

Domain Adaptation

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation 16 Jan 2024 Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes and keeps them far from real-life applications such as guiding students in laboratory experiments and identifying their mistakes.

Scheduling

Unsupervised Domain Adaptation with Feature Embeddings

1 code implementation 14 Dec 2014 Yi Yang, Jacob Eisenstein

Representation learning is the dominant technique for unsupervised domain adaptation, but existing approaches often require the specification of "pivot features" that generalize across domains, which are selected by task-specific heuristics.

Representation Learning Unsupervised Domain Adaptation

Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks

1 code implementation ICCV 2021 Yikai Wang, Yi Yang, Fuchun Sun, Anbang Yao

In the low-bit quantization field, training Binary Neural Networks (BNNs) is the extreme solution to ease the deployment of deep models on resource-constrained devices, having the lowest storage cost and significantly cheaper bit-wise operations compared to 32-bit floating-point counterparts.

Quantization

Adaptive Exploration for Unsupervised Person Re-Identification

1 code implementation 9 Jul 2019 Yuhang Ding, Hehe Fan, Mingliang Xu, Yi Yang

However, a problem of the adaptive selection is that, when an image has too many neighborhoods, it is more likely to attract other images as its neighborhoods.

Unsupervised Person Re-Identification

Inter-Image Communication for Weakly Supervised Localization

1 code implementation ECCV 2020 Xiaolin Zhang, Yunchao Wei, Yi Yang

We learn a feature center for each category and realize the global feature consistency by forcing the object features to approach class-specific centers.

Object

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

1 code implementation 1 Jul 2022 Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.

Data Augmentation Natural Language Queries

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation ICCV 2023 Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D & 3D aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that adds explicit global supervision for the 3D feature space.

Contrastive Learning Human Mesh Recovery

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation CVPR 2022 Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

SEEG: Semantic Energized Co-Speech Gesture Generation

1 code implementation CVPR 2022 Yuanzhi Liang, Qianyu Feng, Linchao Zhu, Li Hu, Pan Pan, Yi Yang

Talking gesture generation is a practical yet challenging task which aims to synthesize gestures in line with speech.

Gesture Generation

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

1 code implementation 29 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.

Image Captioning Image Classification +5

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation 18 Sep 2023 Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation Video Semantic Segmentation

Removing Raindrops and Rain Streaks in One Go

1 code implementation CVPR 2021 Ruijie Quan, Xin Yu, Yuanzhi Liang, Yi Yang

First, we propose a complementary cascaded network architecture, namely CCN, to remove rain streaks and raindrops in a unified framework.

Neural Architecture Search Rain Removal

Vector-Decomposed Disentanglement for Domain-Invariant Object Detection

1 code implementation ICCV 2021 Aming Wu, Rui Liu, Yahong Han, Linchao Zhu, Yi Yang

Secondly, domain-specific representations are introduced as the differences between the input and domain-invariant representations.

Disentanglement Object +2

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

1 code implementation 26 Jul 2022 Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang

While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority.

Metric Learning Retrieval

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

1 code implementation CVPR 2023 Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.

Question Answering Video Question Answering +2

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

1 code implementation 16 Apr 2022 Yulei Lu, Yawei Luo, Li Zhang, Zheyang Li, Yi Yang, Jun Xiao

A thriving trend in domain adaptive segmentation is to generate high-quality pseudo labels for the target domain and retrain the segmentor on them.

Pseudo Label Semantic Segmentation +2

Collective Entity Disambiguation with Structured Gradient Tree Boosting

1 code implementation NAACL 2018 Yi Yang, Ozan İrsoy, Kazi Shefaet Rahman

To the best of our knowledge, our work is the first one that employs the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation.

Entity Disambiguation

Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics

1 code implementation 26 Aug 2021 Wuyang Chen, Xinyu Gong, Junru Wu, Yunchao Wei, Humphrey Shi, Zhicheng Yan, Yi Yang, Zhangyang Wang

This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.

Neural Architecture Search

Context-Aware Pretraining for Efficient Blind Image Decomposition

1 code implementation CVPR 2023 Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang

(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.

Attribute Image Reconstruction +1

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation CVPR 2022 Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Tele-Knowledge Pre-training for Fault Analysis

1 code implementation 20 Oct 2022 Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

1 code implementation 4 Sep 2023 Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.

Ranked #3 on Motion Synthesis on HumanML3D (using extra training data)

Language Modelling Motion Synthesis

Universal-Prototype Enhancing for Few-Shot Object Detection

1 code implementation ICCV 2021 Aming Wu, Yahong Han, Linchao Zhu, Yi Yang

Thus, we develop a new framework of few-shot object detection with universal prototypes ($FSOD^{up}$) that owns the merit of feature generalization towards novel objects.

Few-Shot Object Detection Meta-Learning +3

Joint Representation Learning and Keypoint Detection for Cross-view Geo-localization

1 code implementation IEEE Transactions on Image Processing (TIP) 2022 Jinliang Lin, Zhedong Zheng, Zhun Zhong, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe

Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network.

Drone navigation Drone-view target localization +3

Gloss-Free End-to-End Sign Language Translation

1 code implementation 22 May 2023 Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.

Sign Language Translation Translation

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation ICCV 2023 Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

1 code implementation 1 Nov 2021 Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou

For example, Transformer Networks do not have native support on many popular chips, and hence are difficult to deploy.

Image Classification Machine Translation +2

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

1 code implementation 29 Mar 2022 Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

By contrast, pixel-level optimization is more explicit, however, it is sensitive to the visual quality of training data and is not robust to object deformation.

Contrastive Learning Semantic Segmentation +3

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection

1 code implementation 24 May 2022 Wenhao Wang, Yifan Sun, Yi Yang

Moreover, this paper further reveals a unique difficulty for solving the hard negative problem in ICD, i.e., there is a fundamental conflict between current metric learning and ICD.

Copy Detection Metric Learning

Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks

2 code implementations 22 Aug 2018 Yang He, Xuanyi Dong, Guoliang Kang, Yanwei Fu, Chenggang Yan, Yi Yang

With asymptotic pruning, the information of the training set would be gradually concentrated in the remaining filters, so the subsequent training and pruning process would be stable.

Image Classification

Self-supervised Point Cloud Representation Learning via Separating Mixed Shapes

1 code implementation 1 Sep 2021 Chao Sun, Zhedong Zheng, Xiaohan Wang, Mingliang Xu, Yi Yang

Albeit simple, the pre-trained encoder can capture the key points of an unseen point cloud and surpasses the encoder trained from scratch on downstream tasks.

3D Part Segmentation 3D Point Cloud Classification +3

Improving Annotation for 3D Pose Dataset of Fine-Grained Object Categories

2 code implementations 19 Oct 2018 Yaming Wang, Xiao Tan, Yi Yang, Ziyu Li, Xiao Liu, Feng Zhou, Larry S. Davis

Existing 3D pose datasets of object categories are limited to generic object types and lack fine-grained information.

3D Pose Estimation Object +1

GIF: A General Graph Unlearning Strategy via Influence Function

1 code implementation 6 Apr 2023 Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He

Then, we identify why the traditional influence function fails for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.

Machine Unlearning

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

1 code implementation 19 Jan 2024 Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

(2) Equipping the visual and text encoder with separated prompts failed to mitigate the visual-text modality gap.

Retrieval Video Retrieval

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

1 code implementation 22 Mar 2024 Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.

UTS submission to Google YouTube-8M Challenge 2017

1 code implementation 13 Jul 2017 Linchao Zhu, Yanbin Liu, Yi Yang

In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017.

Classification General Classification +1

Connective Cognition Network for Directional Visual Commonsense Reasoning

1 code implementation NeurIPS 2019 Aming Wu, Linchao Zhu, Yahong Han, Yi Yang

Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers.

Sentence Visual Commonsense Reasoning

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

1 code implementation 5 Aug 2022 Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei

In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.

Instance Segmentation Semantic Segmentation +1

Feature-compatible Progressive Learning for Video Copy Detection

2 code implementations 20 Apr 2023 Wenhao Wang, Yifan Sun, Yi Yang

Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.

Copy Detection Video Similarity

Whitening-based Contrastive Learning of Sentence Embeddings

1 code implementation 28 May 2023 Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.

Contrastive Learning Semantic Textual Similarity +4

RMP: A Random Mask Pretrain Framework for Motion Prediction

1 code implementation 16 Sep 2023 Yi Yang, Qingwen Zhang, Thomas Gilles, Nazre Batool, John Folkesson

As the pretraining technique is growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving.

Autonomous Driving motion prediction +1

Fast and Accurate Factual Inconsistency Detection Over Long Documents

1 code implementation 19 Oct 2023 Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang

We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.

Chunking Natural Language Inference +2

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport

1 code implementation 5 Sep 2019 Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang

To estimate the mutual information from data, a common practice is preparing a set of paired i.i.d. samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n$.

BIG-bench Machine Learning Mutual Information Estimation

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

2 code implementations NAACL 2022 Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan

To deal with this issue, in this paper, we propose a new strategy to perform textual backdoor attacks which do not require an external trigger, and the poisoned samples are correctly labeled.

Backdoor Attack Sentence

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

1 code implementation ICCV 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).

Object Representation Learning +6

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

1 code implementation 10 Jul 2023 Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang

In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.

Image Registration Semantic Segmentation

Compositional Feature Augmentation for Unbiased Scene Graph Generation

1 code implementation ICCV 2023 Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.

Graph Generation Relation +1

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

1 code implementation 20 Nov 2023 Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang

Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting the scene-invariant appearance continuity and geometry consistency priors to the aggregation process.

Generalizable Novel View Synthesis

Adversarial Complementary Learning for Weakly Supervised Object Localization

2 code implementations CVPR 2018 Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas Huang

With such adversarial learning, the two parallel classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together.

General Classification Object +1

CNN-RNN: A Unified Framework for Multi-label Image Classification

1 code implementation CVPR 2016 Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu

While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image.

Classification General Classification +2

Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior

1 code implementation ECCV 2020 Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang

Most previous work on adversarial attacks focuses mainly on image models, while the vulnerability of video models is less explored.

Adversarial Attack Video Classification

VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots

1 code implementation 31 May 2021 Yuan Gan, Yawei Luo, Xin Yu, Bang Zhang, Yi Yang

In this paper, we investigate the task of hallucinating an authentic high-resolution (HR) human face from multiple low-resolution (LR) video snapshots.

Face Hallucination Hallucination

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

1 code implementation 9 Jun 2022 Yi Yang, Yanqiao Zhu, Hejie Cui, Xuan Kan, Lifang He, Ying Guo, Carl Yang

Specifically, we propose to meta-train the model on datasets of large sample sizes and transfer the knowledge to small datasets.

Meta-Learning

TransHP: Image Classification with Hierarchical Prompting

1 code implementation NeurIPS 2023 Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification Image Classification

PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis

1 code implementation 20 May 2023 Yi Yang, Hejie Cui, Carl Yang

The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways.

Transfer Learning Unsupervised Pre-training

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

1 code implementation 3 Jul 2023 Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.

Learning with noisy labels Multi-Label Classification +1
