Search Results for author: Weixuan Sun

Found 24 papers, 17 papers with code

HGRN2: Gated Linear RNNs with State Expansion

3 code implementations • 11 Apr 2024 • Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference.
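As a reference point for the kind of model this line of work builds on, here is a minimal sketch of an elementwise gated linear recurrence, the basic building block behind HGRN-style layers; the single-layer form, the tying of the input gate to the forget gate, and the tensor shapes are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def gated_linear_rnn(x, forget_gate):
    """Elementwise gated linear recurrence: h_t = g_t * h_{t-1} + (1 - g_t) * x_t.

    x, forget_gate: (batch, seq_len, dim), with forget_gate values in (0, 1).
    Returns the hidden state at every time step.
    """
    batch, seq_len, dim = x.shape
    h = torch.zeros(batch, dim)
    outputs = []
    for t in range(seq_len):
        g = forget_gate[:, t]
        h = g * h + (1.0 - g) * x[:, t]  # input gate tied to the forget gate
        outputs.append(h)
    return torch.stack(outputs, dim=1)

# Illustrative usage with random inputs.
x = torch.randn(2, 16, 64)
g = torch.sigmoid(torch.randn(2, 16, 64))  # gates squashed into (0, 1)
hidden = gated_linear_rnn(x, g)
```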

Image Classification • Language Modelling

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

no code implementations • 24 Mar 2024 • Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass.

Denoising

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

no code implementations • 30 Jan 2024 • Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji

A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed.
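For context, the pattern described here (compress tri-planes with a VAE, run the denoising diffusion process in the latent tri-plane space, then decode) can be sketched roughly as below; the toy conv encoder/decoder, the shapes, and the placeholder denoising loop are assumptions for illustration, not BlockFusion's actual architecture or API.

```python
import torch
import torch.nn as nn

class TriplaneVAE(nn.Module):
    """Toy stand-in for a tri-plane VAE: conv encoder/decoder over stacked planes."""
    def __init__(self, channels=32, latent=8):
        super().__init__()
        self.enc = nn.Conv2d(channels, latent, 3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(latent, channels, 4, stride=2, padding=1)

    def encode(self, triplane):  # (B, C, H, W): three feature planes stacked on C
        return self.enc(triplane)

    def decode(self, z):
        return self.dec(z)

def denoise_latent(z_noisy, steps=4):
    """Placeholder for the denoising diffusion loop run in latent tri-plane space."""
    z = z_noisy
    for _ in range(steps):
        z = z - 0.1 * z  # a real model would predict and remove noise here
    return z

vae = TriplaneVAE()
triplane = torch.randn(1, 32, 64, 64)          # toy tri-plane features
z = vae.encode(triplane)                       # compress into the latent space
z_clean = denoise_latent(torch.randn_like(z))  # diffusion acts on latents
recon = vae.decode(z_clean)                    # decode back to a tri-plane
```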

Denoising • Scene Generation

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation • 29 Jan 2024 • Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 attains high scalability even on large multi-node clusters constrained by very limited communication bandwidth.

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

2 code implementations • 9 Jan 2024 • Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

Because it processes tokens with linear computational complexity, linear attention can, in theory, handle sequences of unlimited length without sacrificing speed, i.e., it maintains a constant training speed across varying sequence lengths with fixed memory consumption.
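The "linear complexity" claim comes from the standard kernel-trick reordering (QKᵀ)V → Q(KᵀV); a minimal non-causal sketch of that pattern is below. It illustrates the generic idea only, not Lightning Attention-2's tiled, IO-aware implementation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention via the Q(K^T V) reordering.

    q, k: (B, N, d) after a positive feature map (here elu + 1); v: (B, N, d_v).
    Cost is O(N * d * d_v) instead of the O(N^2 * d) of softmax attention.
    """
    kv = torch.einsum("bnd,bne->bde", k, v)                        # d x d_v summary
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normaliser
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

q = F.elu(torch.randn(2, 1024, 64)) + 1
k = F.elu(torch.randn(2, 1024, 64)) + 1
v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)  # time and memory grow linearly in the length 1024
```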

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation • 8 Aug 2023 • Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent.
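A rough sketch of the two consistency terms described above: matching activation (CAM) intensities across the augmented views, and matching region-to-region affinities within each view. The use of plain MSE losses and the equal weighting are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def all_pairs_consistency(cam_a, cam_b, feat_a, feat_b):
    """cam_*: (B, C, H, W) class activation maps from two augmented views,
    already warped back to a common geometry; feat_*: (B, D, H, W) features."""
    # 1) Cross-view activation-intensity consistency.
    loss_cam = F.mse_loss(cam_a, cam_b)

    # 2) Within-view region-affinity consistency between the two views.
    def affinity(feat):
        f = F.normalize(feat.flatten(2), dim=1)    # (B, D, HW)
        return torch.einsum("bdi,bdj->bij", f, f)  # (B, HW, HW) similarities

    loss_aff = F.mse_loss(affinity(feat_a), affinity(feat_b))
    return loss_cam + loss_aff
```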

Object Localization • Weakly supervised Semantic Segmentation +1

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

2 code implementations • 27 Jul 2023 • Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.

Language Modelling • Large Language Model

Linearized Relative Positional Encoding

no code implementations • 18 Jul 2023 • Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it establishes a general paradigm for designing a broader family of relative positional encoding methods that are applicable to linear transformers.

Image Classification • Language Modelling +2

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

2 code implementations • 25 May 2023 • Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould

An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set.

Composed Image Retrieval (CoIR) • Re-Ranking +1

Toeplitz Neural Network for Sequence Modeling

2 code implementations • 8 May 2023 • Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling • Position

Bi-directional Training for Composed Image Retrieval via Text Prompt Learning

1 code implementation • 29 Mar 2023 • Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould

Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes.

Composed Image Retrieval (CoIR) • Retrieval

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation • CVPR 2023 • Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.
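As a toy illustration of the general idea (not FNAC's actual strategy), the sketch below uses an InfoNCE-style audio-visual loss and simply excludes off-diagonal pairs whose visual similarity exceeds a threshold, treating them as likely false negatives; the flagging heuristic and the threshold value are hypothetical.

```python
import torch
import torch.nn.functional as F

def false_negative_aware_nce(audio_emb, visual_emb, temperature=0.07, fn_threshold=0.8):
    """audio_emb, visual_emb: (B, D) L2-normalised embeddings of paired clips."""
    logits = audio_emb @ visual_emb.t() / temperature  # (B, B) pair scores
    labels = torch.arange(audio_emb.size(0), device=logits.device)

    # Hypothetical flag: off-diagonal pairs that look very similar in the visual
    # space may actually be semantically matched, i.e. false negatives.
    with torch.no_grad():
        sim = visual_emb @ visual_emb.t()
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        fn_mask = (sim > fn_threshold) & ~eye

    # Drop suspected false negatives from the denominator instead of pushing
    # them apart.
    logits = logits.masked_fill(fn_mask, float("-inf"))
    return F.cross_entropy(logits, labels)
```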

Contrastive Learning

Audio-Visual Segmentation with Semantics

1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation • Semantic Segmentation +1

The Devil in Linear Transformer

1 code implementation • 19 Oct 2022 • Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modelling • Text Classification

Linear Video Transformer with Feature Fixation

no code implementations • 15 Oct 2022 • Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.
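A minimal sketch of the idea as stated, i.e. learning weights that rescale query and key channels before the linear-attention product; the sigmoid-gate form used here is an assumption, not necessarily the paper's exact module.

```python
import torch
import torch.nn as nn

class FeatureFixation(nn.Module):
    """Reweights query/key channels before they enter linear attention."""
    def __init__(self, dim):
        super().__init__()
        self.q_gate = nn.Linear(dim, dim)
        self.k_gate = nn.Linear(dim, dim)

    def forward(self, q, k):
        # Per-token, per-channel importance weights in (0, 1).
        q = q * torch.sigmoid(self.q_gate(q))
        k = k * torch.sigmoid(self.k_gate(k))
        return q, k

fixation = FeatureFixation(64)
q, k = fixation(torch.randn(2, 128, 64), torch.randn(2, 128, 64))
# q and k would then feed the usual kernelised linear-attention computation.
```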

Feature Importance • Video Classification

Neural Architecture Search on Efficient Transformers and Beyond

no code implementations • 28 Jul 2022 • Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong

To address this issue, we propose a new framework to find optimal architectures for efficient Transformers with the neural architecture search (NAS) technique.

Computational Efficiency • Image Classification +2

Audio-Visual Segmentation

1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

Vicinity Vision Transformer

1 code implementation • 21 Jun 2022 • Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.

Image Classification

cosFormer: Rethinking Softmax in Attention

3 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, softmax attention helps capture long-range dependencies, yet it prohibits scaling up due to its quadratic space and time complexity with respect to sequence length.
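For reference, the quadratic cost referred to here comes from materialising an N x N score matrix; the sketch below shows only that baseline. cosFormer and other linear-attention methods avoid it by replacing the softmax with decomposable similarity functions.

```python
import torch

def softmax_attention(q, k, v):
    """Vanilla attention: the (N, N) score matrix is what makes time and
    memory grow quadratically with the sequence length N."""
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bnd,bmd->bnm", q, k) * scale  # (B, N, N)
    return torch.einsum("bnm,bmd->bnd", scores.softmax(dim=-1), v)

q = k = v = torch.randn(1, 2048, 64)
out = softmax_attention(q, k, v)  # doubling N quadruples the score-matrix cost
```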

D4RL • Language Modelling +1

Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

1 code implementation • 27 Oct 2021 • Weixuan Sun, Jing Zhang, Nick Barnes

To solve this, most existing approaches follow a multi-training pipeline to refine CAMs for better pseudo-labels, which includes: 1) re-training the classification model to generate CAMs; 2) post-processing CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels.
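A compact outline of the three stages enumerated above, with hypothetical helper names used purely to make the pipeline concrete.

```python
# Hypothetical helpers: stand-ins for a classifier trainer, CAM extraction,
# CAM post-processing (thresholding / affinity refinement / CRF), and a
# segmentation trainer. They are placeholders, not a real API.
def train_classifier(images, image_level_labels): ...
def extract_cams(classifier, images): ...
def postprocess_cams(cams, images): ...
def train_segmentation_model(images, pseudo_labels): ...

def wsss_pipeline(images, image_level_labels):
    """Multi-stage WSSS outline: CAMs -> pseudo labels -> segmentation model."""
    classifier = train_classifier(images, image_level_labels)  # 1) (re-)train classifier
    cams = extract_cams(classifier, images)                    #    and generate CAMs
    pseudo_labels = postprocess_cams(cams, images)             # 2) CAMs -> pseudo labels
    return train_segmentation_model(images, pseudo_labels)     # 3) train the seg model
```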

Segmentation • Weakly supervised Semantic Segmentation +1

3D Guided Weakly Supervised Semantic Segmentation

no code implementations • 1 Dec 2020 • Weixuan Sun, Jing Zhang, Nick Barnes

In this paper, we propose a weakly supervised 2D semantic segmentation model by incorporating sparse bounding box labels with available 3D information, which is much easier to obtain with advanced sensors.

2D Semantic Segmentation • Segmentation +2
