This second paper presents a literature review of key enabling technologies of digital twins, with an emphasis on uncertainty quantification, optimization methods, open source datasets and tools, major findings, challenges, and future directions.
In part two of this review, the roles of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and further perspectives on the future of digital twins are shared.
We present a novel paradigm for building an animatable 3D human representation from a monocular video input, such that it can be rendered in unseen poses and from unseen views.
However, we find that existing graph-based methods for visible-infrared person re-identification (VI-ReID) suffer from poor generalization because of two issues: 1) the train-test modality balance gap, which is an inherent property of the VI-ReID task.
Since the current frame is not available in video frame synthesis, NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
Ranked #1 on Video Frame Interpolation on X4K1000FPS
Besides estimating the probability distribution, our entropy model also generates the quantization step sizes in a spatial-channel-wise manner.
We propose a robust context fusion network to tackle video instance segmentation (VIS) in an online fashion, predicting instance segmentation frame by frame using a few preceding frames.
In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement.
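As background for the snippet above, CSVQ builds on plain vector quantization; a minimal generic VQ step (an illustrative sketch, not the paper's progressive multi-scale scheme) looks like this:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Generic vector quantization: map each feature vector to its nearest
    codeword and return the indices plus the quantized reconstruction."""
    d = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    idx = np.argmin(d, axis=1)
    return idx, codebook[idx]

# Toy 2-D features and a 3-codeword codebook (illustrative values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.9, 1.2], [0.1, -0.2]])
idx, zq = vector_quantize(z, codebook)
print(idx.tolist())  # → [1, 0]
```

Cross-scale schemes like CSVQ apply this idea repeatedly across feature scales, refining the reconstruction at each step.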
In this paper we propose a multi-modal multi-correlation learning framework targeting at the task of audio-visual speech separation.
We leverage the cycle consistency to discriminate the semantic consensus, thus advancing the primary task.
We consolidate this conditional mask calibration process in a progressive manner, where the object representations and proto-masks evolve to be discriminative iteratively.
Deep neural networks often suffer from a data distribution shift between training and testing, and the batch statistics are observed to reflect this shift.
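One common way this observation is exploited (a generic sketch, not necessarily this paper's method) is to recompute normalization statistics from the test batch itself instead of reusing stale training statistics:

```python
import numpy as np

def batchnorm_forward(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """Standard BatchNorm inference using stored (training) statistics."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

def adapted_batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Test-time adaptation: recompute mean/variance from the test batch,
    so the normalization tracks the shifted test distribution."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Simulated covariate shift: test features are scaled and offset.
rng = np.random.default_rng(0)
train_stats = (np.zeros(4), np.ones(4))  # mean/var seen during training
x_test = rng.normal(loc=3.0, scale=2.0, size=(256, 4))

gamma, beta = np.ones(4), np.zeros(4)
y_stale = batchnorm_forward(x_test, *train_stats, gamma, beta)
y_adapted = adapted_batchnorm_forward(x_test, gamma, beta)

# Adapted outputs are re-centred near zero; stale statistics leave the shift in.
print(abs(y_adapted.mean()) < 0.1, abs(y_stale.mean()) > 1.0)
```

The gap between the two outputs is exactly the shift the batch statistics "reflect" in the snippet above.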
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Improving the generalization capability of Deep Neural Networks (DNNs) is critical for their practical use and has been a longstanding challenge.
The temporal features often contain noisy and uncorrelated information that may interfere with the restoration of the current frame.
It significantly improves the performance of several classic contrastive learning models in downstream tasks.
This paper presents ActiveMLP, a general MLP-like backbone for computer vision.
Ranked #36 on Object Detection on COCO minival
Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities.
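A standard way to obtain such a distribution forecast (a generic sketch, not necessarily this paper's model) is to fit one predictor per quantile with the pinball loss; each predicted quantile is then a scenario with a known probability:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss; its minimizer is the q-quantile of y_true."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Synthetic forecast target (illustrative: Normal(10, 2)).
rng = np.random.default_rng(1)
y = rng.normal(10.0, 2.0, size=20000)

qhat = {}
for q in (0.1, 0.5, 0.9):
    # Brute-force the best constant forecast for this quantile.
    candidates = np.linspace(0.0, 20.0, 2001)
    losses = [pinball_loss(y, c, q) for c in candidates]
    qhat[q] = candidates[np.argmin(losses)]

# The recovered quantiles bracket the median and quantify the spread.
print(qhat[0.1] < qhat[0.5] < qhat[0.9], abs(qhat[0.5] - 10.0) < 0.2)
```

In practice the constant forecast is replaced by a learned model minimizing the same loss per quantile.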
For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance.
Deep-learning-based methods have shown advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC).
In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.
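The recursive least-squares (RLS) estimator named above has a classic closed-form update; a minimal generic sketch of one RLS step (illustrative, not the paper's full adaptation pipeline):

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.99):
    """One recursive least-squares step: update parameters theta and the
    inverse-covariance matrix P with a new sample (x, y).
    lam is the forgetting factor (1.0 = no forgetting)."""
    x = x.reshape(-1, 1)
    Px = P @ x
    k = Px / (lam + (x.T @ Px).item())   # gain vector
    err = y - (x.T @ theta).item()       # prediction error on the new sample
    theta = theta + k * err
    P = (P - k @ Px.T) / lam
    return theta, P

# Few-shot online adaptation toy: recover w from a handful of streaming samples.
rng = np.random.default_rng(2)
w_true = np.array([[2.0], [-1.0], [0.5]])
theta = np.zeros((3, 1))
P = np.eye(3) * 100.0                    # large initial P = weak prior
for _ in range(20):
    x = rng.normal(size=3)
    y = (x @ w_true).item()
    theta, P = rls_update(theta, P, x, y)
print(np.allclose(theta, w_true, atol=1e-2))
```

Because each update is a rank-one correction, no offline training or stored dataset is needed, which is what makes RLS attractive for online few-shot adaptation.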
Human action detection is an active research topic with wide applications in video surveillance, human-machine interfaces, healthcare monitoring, gaming, dance training, and musical-instrument teaching.
We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively.
Based on this representation, we introduce a cropping-free temporal fusion approach to model the temporal consistency between video frames.
From the stored propagated features, we propose to learn multi-scale temporal contexts, and re-fill the learned temporal contexts into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder.
Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes.
By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.
Deep-learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flow to exploit the temporal correlation between successive frames and then compress the residual error.
Therefore, we argue that the task-relevant information that is not shared between views cannot be ignored, and we theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation.
Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.
Ranked #20 on Self-Supervised Action Recognition on UCF101
In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
However, spatial correlations and temporal correlations capture different contextual information, corresponding to scene content and temporal reasoning, respectively.
Detecting and localizing objects in the real 3D space, which plays a crucial role in scene understanding, is particularly challenging given only a monocular image due to the geometric information loss during imagery projection.
Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into speech enhancement network in a conditional manner.
This paper proposes MCSSL, a self-supervised learning approach for building custom object detection models in multi-camera networks.
We develop a conceptually simple, flexible, and effective framework (named T-Net) for two-view correspondence learning.
In this paper, we propose a novel idea to model speech and noise simultaneously in a two-branch convolutional neural network, namely SN-Net.
Ranked #1 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge (PESQ-NB metric)
A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles.
In this paper, we consider the problem of the scattering of in-plane waves at an interface between a homogeneous medium and a metamaterial.
In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost the re-identification performance.
no code implementations • 4 Dec 2019 • Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke
Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remain a difficult problem, despite many years of research.
In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information.
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller.
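The classic distillation objective (Hinton et al.'s formulation, shown here as general background rather than this paper's specific loss) combines hard-label cross-entropy with a KL term between temperature-softened teacher and student distributions:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus KL(teacher || student) at temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T^2 rescales the soft-target gradients, as in the original formulation.
    return np.mean(alpha * ce + (1 - alpha) * (T ** 2) * kl)

# A student that matches the teacher scores lower loss than one that disagrees.
teacher = np.array([[5.0, 1.0, -2.0]])
labels = np.array([0])
aligned = distillation_loss(np.array([[5.0, 1.0, -2.0]]), teacher, labels)
misaligned = distillation_loss(np.array([[-2.0, 1.0, 5.0]]), teacher, labels)
print(aligned < misaligned)  # → True
```

The temperature T controls how much of the teacher's "dark knowledge" about non-target classes the student sees.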
Specifically, for each training image, we first generate attention maps to represent the object's discriminative parts by weakly supervised learning.
Ranked #10 on Fine-Grained Image Classification on CUB-200-2011
Most existing methods are computationally expensive and cannot satisfy real-time requirements.
We present an instance segmentation scheme based on pixel affinity information, which indicates whether two pixels belong to the same instance.
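One generic way to turn such pairwise affinities into instance labels (an illustrative sketch, not necessarily this paper's grouping procedure) is to threshold them and take connected components with union-find:

```python
def make_instances(h, w, affinities, thresh=0.5):
    """Group pixels into instances from pairwise affinities.
    affinities: dict mapping pixel-index pairs (i, j) to affinity in [0, 1]."""
    parent = list(range(h * w))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pixel pair whose affinity clears the threshold.
    for (i, j), a in affinities.items():
        if a >= thresh:
            parent[find(i)] = find(j)
    return [find(i) for i in range(h * w)]

# Toy 1x4 "image": pixels 0-1 form one instance, pixels 2-3 another.
aff = {(0, 1): 0.9, (1, 2): 0.1, (2, 3): 0.8}
labels = make_instances(1, 4, aff)
print(labels[0] == labels[1], labels[1] != labels[2], labels[2] == labels[3])
```

Real systems predict dense affinities with a network and typically use more robust graph-partitioning than a single hard threshold.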
We propose MonoGRNet for amodal 3D object detection from a monocular RGB image via geometric reasoning in both the observed 2D projection and the unobserved depth dimension.
Ranked #21 on Monocular 3D Object Detection on KITTI Cars Moderate
In this paper, we address the problem of reconstructing an object's surface from a single image using generative networks.
In addition, we propose attention regularization and attention dropout to weakly supervise the generation of attention maps.
In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, which is a common stage that follows descriptor extraction in local feature based pipelines, and can be formulated as nearest neighbor retrieval.
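The nearest-neighbor retrieval stage named above is commonly implemented with mutual distance comparison plus Lowe's ratio test; a minimal generic sketch (illustrative, not this paper's matching pipeline):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor retrieval with Lowe's ratio test: accept a match only
    when the best neighbor is clearly closer than the second best."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :2]  # two nearest neighbors in desc_b
    matches = []
    for i, (j1, j2) in enumerate(nn):
        if d[i, j1] < ratio * d[i, j2]:
            matches.append((i, int(j1)))
    return matches

# Toy 2-D descriptors: each row of `a` has one clearly closest row in `b`.
a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[0.1, 0.0], [5.0, 5.0], [1.0, 0.9]])
print(match_descriptors(a, b))  # → [(0, 0), (1, 2)]
```

Optimizing descriptors directly for this retrieval criterion, rather than for pairwise distances alone, is the gap the snippet above targets.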
The RoI-based sub-region attention map and aspect ratio attention map are selectively pooled from the banks, and then used to refine the original RoI features for RoI classification.
This paper proposes an efficient content-adaptive screen image scaling scheme for real-time screen applications such as remote desktop and screen sharing.