To address these issues, we introduce a simple yet effective retrieval-based video language model (R-VLM) for efficient and interpretable long video QA.
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor.
We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.
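The product-quantization idea referenced above can be illustrated generically: a vector is split into sub-vectors, and each sub-vector is quantized against its own small codebook, so each sub-space contributes one independent, disentangled code. This is a minimal sketch of standard product quantization, not the paper's decomposition method; the function names and codebook layout are illustrative assumptions.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantization: split x into M sub-vectors and quantize each
    against its own codebook, yielding M independent codes."""
    M = len(codebooks)
    sub = np.split(x, M)
    codes = []
    for m, c in enumerate(codebooks):
        # nearest centroid in this sub-space
        dists = np.linalg.norm(c - sub[m], axis=1)
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximation of x from the per-sub-space codes."""
    return np.concatenate([codebooks[m][k] for m, k in enumerate(codes)])
```

In practice the codebooks are learned (e.g., by k-means per sub-space); here they are random only to keep the sketch self-contained.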
To further understand the in-context learning mechanism and the importance of the in-weights component, we prove by construction that a simple Transformer, which uses pattern matching and a copy-paste mechanism to perform in-context learning, can match the in-context learning performance of a more complex, best-tuned Transformer under the perfect in-weights component assumption.
The framework consists of pre-transfer, transfer, and post-transfer steps to accomplish knowledge transfer.
Specifically, we use a small network similar to NeRF while preserving the rendering speed of NeLF, with a single network forward pass per pixel.
In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.
1 code implementation • 31 Jul 2023 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Recently, integrating video foundation models and large language models to build video understanding systems has shown promise in overcoming the limitations of specific pre-defined vision tasks.
Generating realistic human motion from given action descriptions has seen significant advances, driven by the emerging demand for digital humans.
Transformers are popular in recent 3D human pose estimation, where long-term modeling is used to lift 2D keypoints into 3D space.
Ranked #84 on 3D Human Pose Estimation on Human3.6M
Specifically, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation, with three empowered capabilities: 1) predicting the feasibility of commands for executors; 2) verifying the completeness of executors; 3) enhancing security (e.g., the protection of users' privacy).
In this paper, we study several typical disentangled representation learning works in terms of both disentanglement and compositional generalization abilities, and we provide an important insight: vector-based representation (using a vector instead of a scalar to represent a concept) is the key to empower both good disentanglement and strong compositional generalization.
To further enhance the reliability of our noise decision results, ReSup uses two networks to jointly achieve noise suppression.
To begin with, we are among the first to comprehensively investigate mainstream KD techniques on DNS models to resolve the two challenges.
Clothes-invariant feature extraction is critical to clothes-changing person re-identification (CC-ReID).
Our method leverages both self-supervised learned landmarks and 3D face model-based landmarks to model the motion.
Most Neural Radiance Fields (NeRFs) have poor generalization ability, limiting their application when representing multiple scenes by a single model.
The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.
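The pseudo-labeling idea described above is commonly implemented by keeping only confident model predictions on unlabeled data and treating them as labels. A minimal, generic sketch — the function name and threshold are illustrative assumptions, not details from the paper:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Keep only confident predictions as pseudo labels.
    probs: (N, C) softmax outputs on unlabeled frames.
    Returns (indices, labels) of the samples whose maximum class
    probability reaches the confidence threshold."""
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)
```

The selected (index, label) pairs would then be mixed with the labeled data for the next training round.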
The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.
Since different attributes have their individual semantics and characteristics, we propose to decouple the diffusion processes for them to improve the diversity of training samples and learn the reverse process jointly to exploit global-scope contexts for facilitating generation.
Better yet, our codec surpasses ECM, the next-generation traditional codec still under development, in both RGB and YUV420 color spaces in terms of PSNR.
For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge.
Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder.
In this paper, we propose an efficient NP framework dubbed Versatile Neural Processes (VNP), which largely increases the capability of approximating functions.
In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.
Our model achieves state-of-the-art performance on the R-VOS benchmarks Ref-DAVIS17 and Ref-Youtube-VOS, as well as on our RRYTVOS dataset.
Meanwhile, besides assisting frame coding at the current time step, the feature from context generation will be propagated as a motion condition when coding the subsequent motion latent.
Existing Siamese tracking methods, which are built on pair-wise matching between two single frames, rely heavily on additional sophisticated mechanisms to exploit temporal information among successive video frames, hindering their efficiency and industrial deployment.
Recently, end-to-end neural audio/speech coding has shown great potential to outperform traditional signal-analysis-based audio codecs.
A high-quality NeRF decomposition relies on good geometry information extraction as well as good prior terms to properly resolve ambiguities between different components.
This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions.
In part two of this review, the role of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
But we find that existing graph-based methods in the visible-infrared person re-identification (VI-ReID) task suffer from poor generalization because of two issues: 1) the train-test modality balance gap, which is a property of the VI-ReID task.
Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods.
Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
Ranked #2 on Video Frame Interpolation on X4K1000FPS
Besides estimating the probability distribution, our entropy model also generates quantization steps in a spatial-channel-wise manner.
We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.
In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement.
In this paper, we propose a multi-modal multi-correlation learning framework targeting the task of audio-visual speech separation.
Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression.
We consolidate this conditional mask calibration process in a progressive manner, where the object representations and proto-masks evolve to be discriminative iteratively.
Ranked #1 on Visual Object Tracking on YouTube-VOS
Deep neural networks often suffer from data distribution shift between training and testing, and the batch statistics are observed to reflect the shift.
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical use and has been a longstanding challenge.
The temporal features usually contain noisy and uncorrelated information, which may interfere with the restoration of the current frame.
It significantly improves the performance of several classic contrastive learning models in downstream tasks.
In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate flexible contextual information distributed across different channels from other tokens into the given query token.
Ranked #62 on Object Detection on COCO minival
Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities.
For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance.
Deep-learning-based methods have shown their advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC).
In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.
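A recursive least-squares estimator of the kind mentioned above updates a linear model one sample at a time without revisiting past data, which is what makes it attractive for online adaptation. The sketch below shows the textbook RLS recursion for linear regression; the class name and parameter defaults are illustrative choices, not the paper's design.

```python
import numpy as np

class RLS:
    """Minimal recursive least-squares estimator for online linear
    regression: updates the weights per sample without storing history."""
    def __init__(self, dim, lam=0.99, delta=1e2):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) * delta   # inverse correlation matrix estimate
        self.lam = lam                 # forgetting factor (1.0 = no forgetting)

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)   # gain vector
        err = y - self.w @ x           # a priori prediction error
        self.w += k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err
```

A forgetting factor below 1.0 discounts old samples, which is what lets the estimator track time-varying targets online.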
Human action detection is a popular topic, widely used in video surveillance, human-machine interfaces, healthcare monitoring, gaming, dance training, and musical instrument teaching.
We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively.
Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames.
From the stored propagated features, we propose to learn multi-scale temporal contexts, and re-fill the learned temporal contexts into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder.
Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes.
Therefore, we assume that the task-relevant information that is not shared between views cannot be ignored, and we theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation.
Deep-learning-based video compression is a challenging task; many previous state-of-the-art learned video codecs use optical flow to exploit the temporal correlation between successive frames and then compress the residual error.
By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.
Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.
Ranked #27 on Self-Supervised Action Recognition on UCF101
In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
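As a hedged illustration of the kind of geometry projection such uncertainty propagates through: under a simple pinhole model, depth follows d = f·H/h from focal length f, 3D object height H, and 2D box height h, and first-order propagation of a height uncertainty σ_h gives σ_d = f·H/h²·σ_h. The function below is a generic sketch of this relation, not the GUP Net formulation.

```python
def depth_from_height(f, H3d, h2d, sigma_h):
    """Pinhole projection: depth d = f * H / h.
    First-order (delta-method) uncertainty propagation through the
    projection gives sigma_d = f * H / h**2 * sigma_h, so a small
    2D-height error is amplified at large depths."""
    d = f * H3d / h2d
    sigma_d = f * H3d / h2d ** 2 * sigma_h
    return d, sigma_d
```

This makes the error-amplification problem concrete: the depth uncertainty grows with d/h, so distant (small) objects inherit large depth errors from small 2D estimation errors.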
However, spatial and temporal correlations capture different contextual information: scene structure and temporal reasoning, respectively.
Detecting and localizing objects in the real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a monocular image due to the geometric information loss during imagery projection.
Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into the speech enhancement network in a conditional manner.
This paper proposes MCSSL, a self-supervised learning approach for building custom object detection models in multi-camera networks.
We develop a conceptually simple, flexible, and effective framework (named T-Net) for two-view correspondence learning.
In this paper, we propose a novel idea to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net.
Ranked #1 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge (SI-SDR-NB metric)
A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
In this paper, we consider the problem of the scattering of in-plane waves at an interface between a homogeneous medium and a metamaterial.
In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.
no code implementations • 4 Dec 2019 • Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke
Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remain a difficult problem, despite many years of research.
In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information.
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller.
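Knowledge distillation as described above is typically driven by a KL divergence between temperature-softened teacher and student distributions (Hinton-style soft targets). A minimal NumPy sketch; the temperature default is an illustrative choice, and real training code would combine this term with the usual hard-label loss.

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
    p = softmax(teacher_logits / T)  # soft teacher targets
    q = softmax(student_logits / T)  # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))) / len(p))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly, and positive otherwise.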
Specifically, for each training image, we first generate attention maps to represent the object's discriminative parts by weakly supervised learning.
Ranked #12 on Fine-Grained Image Classification on CUB-200-2011
We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance.
We propose MonoGRNet for amodal 3D object detection from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension.
Ranked #24 on Monocular 3D Object Detection on KITTI Cars Moderate
In this paper, we address the problem of reconstructing an object's surface from a single image using generative networks.
Besides, we propose attention regularization and attention dropout to weakly supervise the generating process of attention maps.
In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, a common stage that follows descriptor extraction in local-feature-based pipelines and can be formulated as nearest-neighbor retrieval.
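Nearest-neighbor descriptor retrieval of the kind mentioned above is often made more robust with a mutual nearest-neighbor check. A generic sketch, assuming L2-normalized descriptors so dot products act as cosine similarity; the function name and the mutual-check design are illustrative assumptions, not the paper's matcher.

```python
import numpy as np

def mutual_nn_match(desc_a, desc_b):
    """Match two sets of L2-normalized descriptors by mutual nearest
    neighbors: (i, j) is kept iff j is i's best match in desc_b AND
    i is j's best match in desc_a."""
    sim = desc_a @ desc_b.T             # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)          # best b-index for each a
    nn_ba = sim.argmax(axis=0)          # best a-index for each b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

The mutual check discards one-sided matches, which is a cheap way to suppress ambiguous correspondences before geometric verification.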
The RoI-based sub-region attention map and aspect ratio attention map are selectively pooled from the banks, and then used to refine the original RoI features for RoI classification.
This paper proposes an efficient content-adaptive screen image scaling scheme for real-time screen applications such as remote desktop and screen sharing.