Search Results for author: Yan Lu

Found 100 papers, 33 papers with code

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

no code implementations 19 Mar 2024 Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved.

Language Modelling

Neural Video Compression with Feature Modulation

1 code implementation 27 Feb 2024 Jiahao Li, Bin Li, Yan Lu

This results in a better learning of the quantization scaler and helps our NVC support about 11.4 dB PSNR range.

Blocking Quantization +1

Slot-VLM: SlowFast Slots for Video-Language Modeling

no code implementations 20 Feb 2024 Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs.

Language Modelling Object +3

Retrieval-based Video Language Model for Efficient Long Video Question Answering

no code implementations 8 Dec 2023 Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

To address these issues, we introduce a simple yet effective retrieval-based video language model (R-VLM) for efficient and interpretable long video QA.

Language Modelling Natural Language Understanding +4

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation 24 Oct 2023 Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

no code implementations 7 Oct 2023 Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu

In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor.

document understanding Reinforcement Learning (RL)

Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

3 code implementations 29 Sep 2023 Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.

Quantization

Breaking through the learning plateaus of in-context learning in Transformer

no code implementations 12 Sep 2023 Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

To study the mechanism behind the learning plateaus, we conceptually separate a component within the model's internal representation that is exclusively affected by the model's weights.

In-Context Learning Representation Learning

Efficient View Synthesis with Neural Radiance Distribution Field

no code implementations ICCV 2023 Yushuang Wu, Xiao Li, Jinglu Wang, Xiaoguang Han, Shuguang Cui, Yan Lu

Specifically, we use a small network similar to NeRF while preserving the rendering speed with a single network forwarding per pixel as in NeLF.

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation ICCV 2023 Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.

Video Editing

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

no code implementations 19 Jun 2023 Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans.

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

no code implementations 16 Jun 2023 Yaqi Zhang, Yan Lu, Bin Liu, Zhiwei Zhao, Qi Chu, Nenghai Yu

Transformer is popular in recent 3D human pose estimation, which utilizes long-term modeling to lift 2D keypoints into the 3D space.

3D Human Pose Estimation

Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators

no code implementations 2 Jun 2023 Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yan Lu

In specific, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying the completeness of executors; 3) enhancing the security (e.g., the protection of users' privacy).

Prompt Engineering

ReSup: Reliable Label Noise Suppression for Facial Expression Recognition

1 code implementation 29 May 2023 Xiang Zhang, Yan Lu, Huan Yan, Jingyang Huang, Yusheng Ji, Yu Gu

To further enhance the reliability of our noise decision results, ReSup uses two networks to jointly achieve noise suppression.

Facial Expression Recognition Facial Expression Recognition (FER)

Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization

no code implementations 29 May 2023 Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

In this paper, we study several typical disentangled representation learning works in terms of both disentanglement and compositional generalization abilities, and we provide an important insight: vector-based representation (using a vector instead of a scalar to represent a concept) is the key to empower both good disentanglement and strong compositional generalization.

Disentanglement

ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression

no code implementations 26 May 2023 Yixin Wan, Yuan Zhou, Xiulian Peng, Kai-Wei Chang, Yan Lu

To begin with, we are among the first to comprehensively investigate mainstream KD techniques on DNS models to resolve the two challenges.

Knowledge Distillation

Mask-Based Modeling for Neural Radiance Fields

1 code implementation 11 Apr 2023 Ganlin Yang, Guoqiang Wei, Zhizheng Zhang, Yan Lu, Dong Liu

Most Neural Radiance Fields (NeRFs) exhibit limited generalization capabilities, which restrict their applicability in representing multiple scenes using a single model.

Representation Learning

Two-shot Video Object Segmentation

1 code implementation CVPR 2023 Kun Yan, Xiao Li, Fangyun Wei, Jinglu Wang, Chenbin Zhang, Ping Wang, Yan Lu

The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.

Object Pseudo Label +5

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

no code implementations CVPR 2023 Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei HUANG, Yoichi Sato, Yan Lu

The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.

3D Reconstruction

Unifying Layout Generation with a Decoupled Diffusion Model

no code implementations CVPR 2023 Mude Hui, Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yuwang Wang, Yan Lu

Since different attributes have their individual semantics and characteristics, we propose to decouple the diffusion processes for them to improve the diversity of training samples and learn the reverse process jointly to exploit global-scope contexts for facilitating generation.

Neural Video Compression with Diverse Contexts

2 code implementations CVPR 2023 Jiahao Li, Bin Li, Yan Lu

Better yet, our codec has surpassed the under-developing next generation traditional codec/ECM in both RGB and YUV420 colorspaces, in terms of PSNR.

Optical Flow Estimation Video Compression

Real-time speech enhancement with dynamic attention span

no code implementations 21 Feb 2023 Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge.

Acoustic echo cancellation Speech Enhancement

EVC: Towards Real-Time Neural Image Compression with Mask Decay

1 code implementation 10 Feb 2023 Guo-Hua Wang, Jiahao Li, Bin Li, Yan Lu

Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder.

Image Compression Representation Learning

Versatile Neural Processes for Learning Implicit Neural Representations

1 code implementation 21 Jan 2023 Zongyu Guo, Cuiling Lan, Zhizheng Zhang, Yan Lu, Zhibo Chen

In this paper, we propose an efficient NP framework dubbed Versatile Neural Processes (VNP), which largely increases the capability of approximating functions.

Motion Information Propagation for Neural Video Compression

no code implementations CVPR 2023 Linfeng Qi, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

Meanwhile, besides assisting frame coding at the current time step, the feature from context generation will be propagated as motion condition when coding the subsequent motion latent.

Video Compression

VideoTrack: Learning To Track Objects via Video Transformer

no code implementations CVPR 2023 Fei Xie, Lei Chu, Jiahao Li, Yan Lu, Chao Ma

Existing Siamese tracking methods, which are built on pair-wise matching between two single frames, heavily rely on additional sophisticated mechanism to exploit temporal information among successive video frames, hindering them from high efficiency and industrial deployments.

Visual Tracking

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations CVPR 2023 Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Disentangled Feature Learning for Real-Time Neural Speech Coding

no code implementations 22 Nov 2022 Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs.

Disentanglement Voice Conversion

Estimating Neural Reflectance Field from Radiance Field using Tree Structures

no code implementations 9 Oct 2022 Xiu Li, Xiao Li, Yan Lu

A high-quality NeRF decomposition relies on good geometry information extraction as well as good prior terms to properly resolve ambiguities between different components.

A Comprehensive Review of Digital Twin -- Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives

no code implementations 27 Aug 2022 Adam Thelen, Xiaoge Zhang, Olga Fink, Yan Lu, Sayan Ghosh, Byeng D. Youn, Michael D. Todd, Sankaran Mahadevan, Chao Hu, Zhen Hu

This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions.

Uncertainty Quantification

A Comprehensive Review of Digital Twin -- Part 1: Modeling and Twinning Enabling Technologies

no code implementations 26 Aug 2022 Adam Thelen, Xiaoge Zhang, Olga Fink, Yan Lu, Sayan Ghosh, Byeng D. Youn, Michael D. Todd, Sankaran Mahadevan, Chao Hu, Zhen Hu

In part two of this review, the role of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.

Uncertainty Quantification

Neural Capture of Animatable 3D Human from Monocular Video

no code implementations 18 Aug 2022 Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, Yan Lu

We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.

Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification

no code implementations 1 Aug 2022 Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu

But we find existing graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from bad generalization because of two issues: 1) train-test modality balance gap, which is a property of VI-ReID task.

counterfactual Person Re-Identification

Latent-Domain Predictive Neural Speech Coding

no code implementations 18 Jul 2022 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods.

Quantization

Neighbor Correspondence Matching for Flow-based Video Frame Synthesis

no code implementations 14 Jul 2022 Zhaoyang Jia, Yan Lu, Houqiang Li

Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.

Video Compression Video Frame Interpolation

Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression

1 code implementation 13 Jul 2022 Jiahao Li, Bin Li, Yan Lu

Besides estimating the probability distribution, our entropy model also generates the quantization step at spatial-channel-wise.

Quantization Video Compression

Online Video Instance Segmentation via Robust Context Fusion

no code implementations 12 Jul 2022 Xiang Li, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, Yan Lu

We propose a robust context fusion network to tackle VIS in an online fashion, which predicts instance segmentation frame-by-frame with a few preceding frames.

Instance Segmentation Segmentation +2

Cross-Scale Vector Quantization for Scalable Neural Speech Coding

no code implementations 7 Jul 2022 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement.

Quantization

Multi-Modal Multi-Correlation Learning for Audio-Visual Speech Separation

no code implementations 4 Jul 2022 Xiaoyu Wang, Xiangyu Kong, Xiulian Peng, Yan Lu

In this paper we propose a multi-modal multi-correlation learning framework targeting at the task of audio-visual speech separation.

Contrastive Learning Speech Separation

Towards Robust Video Object Segmentation with Adaptive Object Calibration

1 code implementation 2 Jul 2022 Xiaohao Xu, Jinglu Wang, Xiang Ming, Yan Lu

We consolidate this conditional mask calibration process in a progressive manner, where the object representations and proto-masks evolve to be discriminative iteratively.

Object Segmentation +5

Test-time Batch Normalization

no code implementations 20 May 2022 Tao Yang, Shenglong Zhou, Yuwang Wang, Yan Lu, Nanning Zheng

Deep neural networks often suffer the data distribution shift between training and testing, and the batch statistics are observed to reflect the shift.

Domain Generalization

Visual Concepts Tokenization

2 code implementations 20 May 2022 Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

We further propose a Concept Disentangling Loss to facilitate that different concept tokens represent independent visual concepts.

Representation Learning

Deep Frequency Filtering for Domain Generalization

no code implementations CVPR 2023 Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge.

Domain Generalization Retrieval

Semantic-aligned Fusion Transformer for One-shot Object Detection

no code implementations CVPR 2022 Yizhou Zhao, Xun Guo, Yan Lu

One-shot object detection aims at detecting novel objects according to merely one given instance.

Attribute Object +2

Neural Compression-Based Feature Learning for Video Restoration

no code implementations CVPR 2022 Cong Huang, Jiahao Li, Bin Li, Dong Liu, Yan Lu

The temporal features usually contain various noisy and uncorrelated information, and they may interfere with the restoration of the current frame.

Denoising Quantization +3

Active Token Mixer

2 code implementations 11 Mar 2022 Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate flexible contextual information distributed across different channels from other tokens into the given query token.

Image Classification Instance Segmentation +2

Robust Nonparametric Distribution Forecast with Backtest-based Bootstrap and Adaptive Residual Selection

no code implementations 16 Feb 2022 Longshaokan Wang, Lingda Wang, Mina Georgieva, Paulo Machado, Abinaya Ulagappa, Safwan Ahmed, Yan Lu, Arjun Bakshi, Farhad Ghassemi

Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities.

Mask-based Latent Reconstruction for Reinforcement Learning

1 code implementation 28 Jan 2022 Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance.

reinforcement-learning Reinforcement Learning (RL) +1

End-to-End Neural Speech Coding for Real-Time Communications

no code implementations 24 Jan 2022 Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

Deep-learning based methods have shown their advantages in audio coding over traditional ones but limited attention has been paid on real-time communications (RTC).

Packet Loss Concealment

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

2 code implementations CVPR 2020 Jin Gao, Yan Lu, Xiaojuan Qi, Yutong Kou, Bing Li, Liang Li, Shan Yu, Weiming Hu

In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.

Continual Learning One-Shot Learning +1

Continuous Human Action Detection Based on Wearable Inertial Data

no code implementations 11 Dec 2021 Xia Gong, Yan Lu, Haoran Wei

Human action detection is a hot topic, which is widely used in video surveillance, human machine interface, healthcare monitoring, gaming, dancing training and musical instrument teaching.

Action Detection Gesture Recognition

Reliable Propagation-Correction Modulation for Video Object Segmentation

1 code implementation 6 Dec 2021 Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu

We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively.

Object Semantic Segmentation +2

Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation

no code implementations 3 Dec 2021 Xiang Li, Jinglu Wang, Xiao Li, Yan Lu

Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames.

Image Segmentation Instance Segmentation +2

Temporal Context Mining for Learned Video Compression

1 code implementation 27 Nov 2021 Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, Yan Lu

From the stored propagated features, we propose to learn multi-scale temporal contexts, and re-fill the learned temporal contexts into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder.

MS-SSIM SSIM +1

Video Instance Segmentation by Instance Flow Assembly

no code implementations 20 Oct 2021 Xiang Li, Jinglu Wang, Xiao Li, Yan Lu

Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes.

Instance Segmentation Object +3

Deep Contextual Video Compression

1 code implementation NeurIPS 2021 Jiahao Li, Bin Li, Yan Lu

In this paper, we propose a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding.

Video Compression

Cross-Stage Transformer for Video Learning

no code implementations 29 Sep 2021 Yuanze Lin, Xun Guo, Yan Lu

By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.

Action Recognition Temporal Action Localization

What Makes for Good Representations for Contrastive Learning

no code implementations 29 Sep 2021 Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu

Therefore, we assume the task-relevant information that is not shared between views can not be ignored and theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation.

Contrastive Learning Representation Learning

Uncertainty-Aware Deep Video Compression with Ensembles

no code implementations 29 Sep 2021 Wufei Ma, Jiahao Li, Bin Li, Yan Lu

Deep learning-based video compression is a challenging task and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error.

Video Compression

Self-Supervised Video Representation Learning with Meta-Contrastive Network

no code implementations ICCV 2021 Yuanze Lin, Xun Guo, Yan Lu

Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.

Contrastive Learning Meta-Learning +6

SSAN: Separable Self-Attention Network for Video Representation Learning

no code implementations CVPR 2021 Xudong Guo, Xun Guo, Yan Lu

However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning.

Action Recognition Representation Learning +3

MonoGRNet: A General Framework for Monocular 3D Object Detection

no code implementations 18 Apr 2021 Zengyi Qin, Jinglu Wang, Yan Lu

Detecting and localizing objects in the real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a monocular image due to the geometric information loss during imagery projection.

Depth Estimation Monocular 3D Object Detection +4

Phoneme-based Distribution Regularization for Speech Enhancement

no code implementations 8 Apr 2021 Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu

Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into speech enhancement network in a conditional manner.

Speech Enhancement

Custom Object Detection via Multi-Camera Self-Supervised Learning

no code implementations 5 Feb 2021 Yan Lu, Yuanchao Shu

This paper proposes MCSSL, a self-supervised learning approach for building custom object detection models in multi-camera networks.

Object object-detection +2

Interactive Speech and Noise Modeling for Speech Enhancement

no code implementations 17 Dec 2020 Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu

In this paper, we propose a novel idea to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net.

Speaker Separation Speech Enhancement

Weakly Supervised 3D Object Detection from Point Clouds

1 code implementation 28 Jul 2020 Zengyi Qin, Jinglu Wang, Yan Lu

A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.

3D Object Detection Knowledge Distillation +4

Weakly-supervised Temporal Action Localization by Uncertainty Modeling

2 code implementations 12 Jun 2020 Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun

Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.

Action Classification Multiple Instance Learning +4

Scattering under Linear Non Self-Adjoint Operators: Case of in-Plane Elastic Waves

no code implementations 6 Mar 2020 Amir Ashkan Mokhtari, Yan Lu, Qiyuan Zhou, Alireza V. Amirkhizi, Ankit Srivastava

In this paper, we consider the problem of the scattering of in-plane waves at an interface between a homogeneous medium and a metamaterial.

Applied Physics

Cross-modality Person re-identification with Shared-Specific Feature Transfer

no code implementations CVPR 2020 Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu

In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.

Cross-Modality Person Re-identification Person Re-Identification

Triangulation Learning Network: from Monocular to Stereo 3D Object Detection

1 code implementation CVPR 2019 Zengyi Qin, Jinglu Wang, Yan Lu

In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information.

3D Object Detection From Stereo Images Object +1

Relational Knowledge Distillation

3 code implementations CVPR 2019 Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller.

Knowledge Distillation Metric Learning

Real-Time Anomaly Detection With HMOF Feature

no code implementations 12 Dec 2018 Huihui Zhu, Bin Liu, Guojun Yin, Yan Lu, Weihai Li, Nenghai Yu

Most existing methods are computation consuming, which cannot satisfy the real-time requirement.

Anomaly Detection Clustering +1

Affinity Derivation and Graph Merge for Instance Segmentation

1 code implementation ECCV 2018 Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu

We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to a same instance.

Instance Segmentation Semantic Segmentation

MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization

1 code implementation 26 Nov 2018 Zengyi Qin, Jinglu Wang, Yan Lu

We propose MonoGRNet for the amodal 3D object detection from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension.

Depth Estimation Monocular 3D Object Detection +3

MVPNet: Multi-View Point Regression Networks for 3D Object Reconstruction from A Single Image

no code implementations 23 Nov 2018 Jinglu Wang, Bo Sun, Yan Lu

In this paper, we address the problem of reconstructing an object's surface from a single image using generative networks.

3D Object Reconstruction From A Single Image regression

Weakly Supervised Bilinear Attention Network for Fine-Grained Visual Classification

no code implementations 6 Aug 2018 Tao Hu, Jizheng Xu, Cong Huang, Honggang Qi, Qingming Huang, Yan Lu

Besides, we propose attention regularization and attention dropout to weakly supervise the generating process of attention maps.

Classification Fine-Grained Image Classification +1

Local Descriptors Optimized for Average Precision

no code implementations CVPR 2018 Kun He, Yan Lu, Stan Sclaroff

In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, which is a common stage that follows descriptor extraction in local feature based pipelines, and can be formulated as nearest neighbor retrieval.

Learning-To-Rank Retrieval

Feature Selective Networks for Object Detection

no code implementations CVPR 2018 Yao Zhai, Jingjing Fu, Yan Lu, Houqiang Li

The RoI-based sub-region attention map and aspect ratio attention map are selectively pooled from the banks, and then used to refine the original RoI features for RoI classification.

Object object-detection +2

Robust RGB-D Odometry Using Point and Line Features

no code implementations ICCV 2015 Yan Lu, Dezhen Song

To meet the challenges, we fuse point and line features to form a robust odometry algorithm.

Visual Odometry

Content adaptive screen image scaling

no code implementations 21 Oct 2015 Yao Zhai, Qifei Wang, Yan Lu, Shipeng Li

This paper proposes an efficient content adaptive screen image scaling scheme for the real-time screen applications like remote desktop and screen sharing.

General Classification

Human Activity Recognition using Smartphone

no code implementations 30 Jan 2014 Amin Rasekh, Chien-An Chen, Yan Lu

In this project, we design a robust activity recognition system based on a smartphone.

Active Learning Dimensionality Reduction +3
