Search Results for author: Yiran Zhong

Found 55 papers, 29 papers with code

Audio-Visual Segmentation

1 code implementation 11 Jul 2022 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

Audio-Visual Segmentation with Semantics

1 code implementation 30 Jan 2023 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

Hierarchical Neural Architecture Search for Deep Stereo Matching

1 code implementation NeurIPS 2020 Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Tom Drummond, Hongdong Li, ZongYuan Ge

To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation.

Neural Architecture Search Semantic Segmentation +3

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations 27 Jul 2023 Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling Large Language Model

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

2 code implementations 9 Jan 2024 Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

With its ability to process tokens in linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption.

cosFormer: Rethinking Softmax in Attention

3 code implementations ICLR 2022 Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity with respect to the sequence length.

D4RL Language Modelling +1

Positive Sample Propagation along the Audio-Visual Event Line

2 code implementations CVPR 2021 Jinxing Zhou, Liang Zheng, Yiran Zhong, Shijie Hao, Meng Wang

To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.

audio-visual event localization

Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation

3 code implementations NeurIPS 2020 Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li

Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume.

Optical Flow Estimation Stereo Matching

Toeplitz Neural Network for Sequence Modeling

2 code implementations 8 May 2023 Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling Position

Deblurring by Realistic Blurring

1 code implementation CVPR 2020 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, Hongdong Li

To address this problem, we propose a new method which combines two GAN models, i.e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.

Deblurring Image Deblurring

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

1 code implementation ICCV 2021 Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.

Saliency Detection Thermal Image Segmentation

The Devil in Linear Transformer

1 code implementation 19 Oct 2022 Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modelling Text Classification

Noise-Aware Unsupervised Deep Lidar-Stereo Fusion

3 code implementations CVPR 2019 Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li

In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps.

Depth Completion Stereo Matching +1

Implicit Motion Handling for Video Camouflaged Object Detection

1 code implementation CVPR 2022 Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, ZongYuan Ge

We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames.

Camouflaged Object Segmentation Motion Estimation +4

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation CVPR 2023 Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.

Contrastive Learning

Adversarial Spatio-Temporal Learning for Video Deblurring

1 code implementation 28 Mar 2018 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Wei Liu, Hongdong Li

To tackle the second challenge, we leverage the developed DBLRNet as a generator in the GAN (generative adversarial network) architecture, and employ a content loss in addition to an adversarial loss for efficient adversarial training.

Deblurring Generative Adversarial Network

Invertible Attention

1 code implementation 16 Jun 2021 Jiajun Zha, Yiran Zhong, Jing Zhang, Richard Hartley, Liang Zheng

Attention has been proved to be an efficient mechanism to capture long-range dependencies.

Image Reconstruction

Vicinity Vision Transformer

1 code implementation 21 Jun 2022 Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.

Image Classification

Accelerating Toeplitz Neural Network with Constant-time Inference Complexity

1 code implementation 15 Nov 2023 Zhen Qin, Yiran Zhong

On the other hand, State Space Models (SSMs) achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity.

Language Modelling

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

1 code implementation ICCV 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation.

Attribute Representation Learning

Deep Laparoscopic Stereo Matching with Transformers

1 code implementation 25 Jul 2022 Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, ZongYuan Ge

The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks, including image recognition and object detection.

object-detection Object Detection +2

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation 8 Aug 2023 Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent.

Object Localization Weakly supervised Semantic Segmentation +1

Memory-Free Generative Replay For Class-Incremental Learning

1 code implementation 1 Sep 2021 Xiaomeng Xin, Yiran Zhong, Yunzhong Hou, Jinjun Wang, Liang Zheng

With the absence of old task images, they often assume that old knowledge is well preserved if the classifier produces similar output on new images.

Class Incremental Learning Incremental Learning

Image-based Geolocalization by Ground-to-2.5D Map Matching

1 code implementation 11 Aug 2023 Mengjie Zhou, Liu Liu, Yiran Zhong, Andrew Calway

In this paper, we lift cross-view matching to a 2.5D space, where heights of structures (e.g., trees and buildings) provide geometric information to guide the cross-view matching.

Image-Based Localization

Transcribing Natural Languages for The Deaf via Neural Editing Programs

1 code implementation 17 Dec 2021 Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li

This work studies the task of glossification, whose aim is to transcribe natural spoken language sentences for the Deaf (hard-of-hearing) community into ordered sign language glosses.

Sentence

Self-Supervised Learning for Stereo Matching with Self-Improving Ability

no code implementations 4 Sep 2017 Yiran Zhong, Yuchao Dai, Hongdong Li

Existing deep-learning based dense stereo matching methods often rely on ground-truth disparity maps as the training signals, which are however not always available in many situations.

Self-Supervised Learning Stereo Matching +1

Robust Multi-body Feature Tracker: A Segmentation-free Approach

no code implementations CVPR 2016 Pan Ji, Hongdong Li, Mathieu Salzmann, Yiran Zhong

Feature tracking is a fundamental problem in computer vision, with applications in many computer vision tasks, such as visual SLAM and action recognition.

Action Recognition Motion Segmentation +2

3D Geometry-Aware Semantic Labeling of Outdoor Street Scenes

no code implementations 13 Aug 2018 Yiran Zhong, Yuchao Dai, Hongdong Li

This paper is concerned with the problem of how to better exploit 3D geometric information for dense semantic image labeling.

Open-World Stereo Video Matching with Deep RNN

no code implementations ECCV 2018 Yiran Zhong, Hongdong Li, Yuchao Dai

Deep Learning based stereo matching methods have shown great successes and achieved top scores across different benchmarks.

Stereo Matching Stereo Matching Hand

Stereo Computation for a Single Mixture Image

no code implementations ECCV 2018 Yiran Zhong, Yuchao Dai, Hongdong Li

This paper proposes an original problem of stereo computation from a single mixture image, a challenging problem that had not been researched before.

Stereo Matching Stereo Matching Hand +1

Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes

no code implementations CVPR 2019 Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li

In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning.

Benchmarking Optical Flow Estimation

Efficient Depth Completion Using Learned Bases

no code implementations 2 Dec 2020 Yiran Zhong, Yuchao Dai, Hongdong Li

The given sparse depth points serve as a data term to constrain the weighting process.

Depth Completion

Depth Completion using Piecewise Planar Model

no code implementations 6 Dec 2020 Yiran Zhong, Yuchao Dai, Hongdong Li

More specifically, we represent the desired depth map as a collection of 3D planes, and the reconstruction problem is formulated as the optimization of planar parameters.

Depth Completion Visual Odometry

Exploring Depth Contribution for Camouflaged Object Detection

no code implementations 24 Jun 2021 Mochu Xiang, Jing Zhang, Yunqiu Lv, Aixuan Li, Yiran Zhong, Yuchao Dai

In this paper, we study the depth contribution for camouflaged object detection, where the depth maps are generated with existing monocular depth estimation (MDE) methods.

Generative Adversarial Network Monocular Depth Estimation +5

IDENTIFYING CONCEALED OBJECTS FROM VIDEOS

no code implementations 29 Sep 2021 Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, ZongYuan Ge

The proposed SLT-Net leverages both short-term dynamics and long-term temporal consistency to detect concealed objects in continuous video frames.

object-detection Object Detection

Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model

no code implementations 22 Nov 2021 Jing Zhang, Yuchao Dai, Mehrtash Harandi, Yiran Zhong, Nick Barnes, Richard Hartley

Uncertainty estimation has been extensively studied in recent literature, where uncertainty can usually be classified as aleatoric uncertainty and epistemic uncertainty.

Attribute object-detection +1

MUNet: Motion Uncertainty-aware Semi-supervised Video Object Segmentation

no code implementations 29 Nov 2021 Jiadai Sun, Yuxin Mao, Yuchao Dai, Yiran Zhong, Jianyuan Wang

The task of semi-supervised video object segmentation (VOS) has been greatly advanced, and state-of-the-art performance has been achieved by dense matching-based methods.

Object Semantic Segmentation +2

Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective

no code implementations 10 Apr 2022 Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li

In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence.

3D Reconstruction Translation

Neural Architecture Search on Efficient Transformers and Beyond

no code implementations 28 Jul 2022 Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong

To address this issue, we propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.

Computational Efficiency Image Classification +2

Linear Video Transformer with Feature Fixation

no code implementations 15 Oct 2022 Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.

Feature Importance Video Classification

Improving Audio-Visual Video Parsing with Pseudo Visual Labels

no code implementations 4 Mar 2023 Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

We perform extensive experiments on the LLP dataset and demonstrate that our method can generate high-quality segment-level pseudo labels with the help of our newly proposed loss and the label denoising strategy.

Denoising Pseudo Label

Linearized Relative Positional Encoding

no code implementations 18 Jul 2023 Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it emphasizes a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers.

Image Classification Language Modelling +2

Exploring Transformer Extrapolation

no code implementations 19 Jul 2023 Zhen Qin, Yiran Zhong, Hui Deng

While these methods perform well on a variety of corpora, the conditions for length extrapolation have yet to be investigated.

Language Modelling

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

no code implementations 31 Jul 2023 Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.

Contrastive Learning Denoising +2

Improving Audio-Visual Segmentation with Bidirectional Generation

no code implementations 16 Aug 2023 Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework.

Motion Estimation Object +2

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

no code implementations 29 Jan 2024 Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 is able to attain a high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth.
