Search Results for author: Shuang Xu

Found 45 papers, 25 papers with code

Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection

1 code implementation • ECCV 2020 • Miao Zhang, Sun Xiao Fei, Jie Liu, Shuang Xu, Yongri Piao, Huchuan Lu

In this paper, we propose an asymmetric two-stream architecture taking account of the inherent differences between RGB and depth data for saliency detection.

Ranked #19 on Thermal Image Segmentation on RGB-T-Glass-Segmentation

Saliency Detection • Thermal Image Segmentation • +1 more

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation

no code implementations • EMNLP 2020 • Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou

Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.

Dialogue Generation • Knowledge Distillation

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of vision language models that significantly improves upon MobileVLM and shows that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML • Language Modelling

771

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.

AutoML • Language Modelling

771

ReFusion: Learning Image Fusion from Reconstruction with Learnable Loss via Meta-Learning

no code implementations • 13 Dec 2023 • Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Yichen Wu, Lilun Deng, Yukun Cui, Shuang Xu, Baisong Jiang

To ensure the fusion module maximally preserves the information from the source images, enabling the reconstruction of the source images from the fused image, we adopt a meta-learning strategy to train the loss proposal module using reconstruction loss.

Meta-Learning • Multi-Exposure Image Fusion

RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning

no code implementations • 2 Dec 2023 • Shuang Xu, Sifan Zhou, Zhi Tian, Jizhou Ma, Qiong Nie, Xiangxiang Chu

Traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human effort, while learning-based approaches resort to iterative refinement of calibration results, which constrains their generalization and their application in on-board systems.

Neural Gradient Regularizer

1 code implementation • 31 Aug 2023 • Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc van Gool

NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer.

Zero-Shot Learning

A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

1 code implementation • 30 Jul 2023 • Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, which demands a deeper understanding to model the relationship between terms and infer the status of each term.

Equivariant Multi-Modality Image Fusion

2 code implementations • 19 May 2023 • Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc van Gool

These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior.

Self-Supervised Learning

318

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations • 7 May 2023 • Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute • Instruction Following • +4 more

895

Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution

no code implementations • ICCV 2023 • Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun Zhang, Radu Timofte, Luc van Gool

Then, the extracted features are mapped to the spherical space to complete the separation of private features and the alignment of shared features.

Contrastive Learning • Depth Map Super-Resolution

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

2 code implementations • ICCV 2023 • Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, Luc van Gool

To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).

Denoising

318

Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

1 code implementation • 2 Mar 2023 • Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

The Medical Slot Filling (MSF) task aims to convert medical queries into structured information and plays an essential role in diagnosis dialogue systems.

slot-filling • Slot Filling

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

2 code implementations • 30 Jan 2023 • Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +4 more

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

2 code implementations • CVPR 2023 • Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, Luc van Gool

We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information.

object-detection • Object Detection • +1 more

318

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

no code implementations • 15 Apr 2022 • Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.

Contrastive Learning • Question Answering • +2 more

Two-Level Supervised Contrastive Learning for Response Selection in Multi-Turn Dialogue

no code implementations • 1 Mar 2022 • Wentao Zhang, Shuang Xu, Haoran Huang

We further develop a new method for supervised contrastive learning, referred to as two-level supervised contrastive learning, and employ the method in response selection in multi-turn dialogue.

Ranked #2 on Conversational Response Selection on E-commerce

Contrastive Learning • Conversational Response Selection • +3 more

VLP: A Survey on Vision-Language Pre-training

1 code implementation • 18 Feb 2022 • Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu

Finally, we discuss the new frontiers in VLP.

277

Counterfactual Supporting Facts Extraction for Explainable Medical Record Based Diagnosis with Graph Network

1 code implementation • NAACL 2021 • Haoran Wu, Wei Chen, Shuang Xu, Bo Xu

Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph.

counterfactual

Discrete Cosine Transform Network for Guided Depth Map Super-Resolution

2 code implementations • CVPR 2022 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin, Hanspeter Pfister

Guided depth super-resolution (GDSR) is an essential topic in multi-modal image processing: it reconstructs high-resolution (HR) depth maps from low-resolution ones collected under suboptimal conditions, with the help of HR RGB images of the same scene.

Depth Map Super-Resolution

318

Deep Convolutional Sparse Coding Network for Pansharpening with Guidance of Side Information

1 code implementation • 10 Mar 2021 • Shuang Xu, Jiangshe Zhang, Kai Sun, Zixiang Zhao, Lu Huang, Junmin Liu, Chunxia Zhang

Pansharpening is a fundamental issue in the remote sensing field.

Pansharpening • Rolling Shutter Correction

Deep Gradient Projection Networks for Pan-sharpening

1 code implementation • CVPR 2021 • Shuang Xu, Jiangshe Zhang, Zixiang Zhao, Kai Sun, Junmin Liu, Chunxia Zhang

Specifically, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models for panchromatic images and low resolution multispectral images.

FGF-GAN: A Lightweight Generative Adversarial Network for Pansharpening via Fast Guided Filter

no code implementations • 31 Dec 2020 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Kai Sun, Lu Huang, Junmin Liu, Chunxia Zhang

In addition, the latent information of features can be preserved effectively through adversarial training.

Generative Adversarial Network • Image Enhancement • +1 more

DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection

no code implementations • 30 Dec 2020 • Yongri Piao, Zhengkun Rong, Shuang Xu, Miao Zhang, Huchuan Lu

The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices.

Saliency Detection

Towards Reducing Severe Defocus Spread Effects for Multi-Focus Image Fusion via an Optimization Based Strategy

1 code implementation • 29 Dec 2020 • Shuang Xu, Lizhen Ji, Zhe Wang, Pengfei Li, Kai Sun, Chunxia Zhang, Jiangshe Zhang

According to the idea that each local region in the fused image should be similar to the sharpest one among source images, this paper presents an optimization-based approach to reduce defocus spread effects.

SSIM

Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer

no code implementations • COLING 2020 • Duzhen Zhang, Xiuyi Chen, Shuang Xu, Bo Xu

For one thing, speakers often rely on context and commonsense knowledge to express emotions; for another, most utterances in conversations carry neutral emotion, so the confusion between the few non-neutral utterances and the far more numerous neutral ones limits emotion recognition performance.

Emotion Recognition • Graph Attention • +3 more

Consecutive Decoding for Speech-to-text Translation

1 code implementation • 21 Sep 2020 • Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

The key idea is to generate the source transcript and the target translation text with a single decoder.

Machine Translation • speech-recognition • +3 more

"Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation

1 code implementation • 21 Sep 2020 • Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

Can we build a system to fully utilize signals in a parallel ST corpus?

Speech-to-Text Translation • Translation

MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

no code implementations • 21 Sep 2020 • Yicheng Wang, Shuang Xu, Junmin Liu, Zixiang Zhao, Chun-Xia Zhang, Jiangshe Zhang

Multi-Focus Image Fusion (MFIF) is a promising image enhancement technique to obtain all-in-focus images meeting visual needs and it is a precondition of other computer vision tasks.

Generative Adversarial Network • Image Enhancement

When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method

no code implementations • 2 Sep 2020 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Kai Sun, Chunxia Zhang, Junmin Liu

The core idea is that the encoder decomposes an image into base and detail feature maps with low- and high-frequency information, respectively, and that the decoder is responsible for the original image reconstruction.

Image Enhancement • Image Reconstruction • +1 more

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

no code implementations • 20 May 2020 • Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu

End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR).

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1 more

Deep Convolutional Sparse Coding Networks for Image Fusion

2 code implementations • 18 May 2020 • Shuang Xu, Zixiang Zhao, Yicheng Wang, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang

Image fusion is a significant problem in many fields including digital photography, computational imaging and remote sensing, to name but a few.

Infrared And Visible Image Fusion • Multi-Exposure Image Fusion

Bayesian Fusion for Infrared and Visible Images

2 code implementations • 12 May 2020 • Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang

In this paper, a novel Bayesian fusion model is established for infrared and visible images.

Infrared And Visible Image Fusion

Efficient and Model-Based Infrared and Visible Image Fusion Via Algorithm Unrolling

no code implementations • 12 May 2020 • Zixiang Zhao, Shuang Xu, Jiangshe Zhang, Chengyang Liang, Chunxia Zhang, Junmin Liu

The proposed AUIF model starts with the iterative formulas of two traditional optimization models, which are established to accomplish two-scale decomposition, i.e., separating low-frequency base information and high-frequency detail information from source images.

Infrared And Visible Image Fusion • Rolling Shutter Correction
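
The two-scale decomposition mentioned in the abstract above can be illustrated with a very small sketch. This is not the AUIF method: the paper unrolls the iterative formulas of two optimization models, whereas the sketch below swaps in a plain moving-average filter on a 1-D signal purely to show what "base" and "detail" mean. The function name `two_scale_decompose` and the `radius` parameter are hypothetical.

```python
from typing import List, Tuple


def two_scale_decompose(signal: List[float], radius: int = 1) -> Tuple[List[float], List[float]]:
    """Split a 1-D signal into a low-frequency base and a high-frequency detail part.

    Assumption: the base is a moving average (window clipped at the borders),
    standing in for the learned or optimized low-pass step of a real method.
    The detail is the residual, so base + detail reproduces the input.
    """
    n = len(signal)
    base = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        base.append(sum(window) / len(window))
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


# Toy signal with one sharp spike; the spike mostly ends up in the detail part.
base, detail = two_scale_decompose([1.0, 1.0, 4.0, 1.0, 1.0])
print("base:  ", base)
print("detail:", detail)
```

Because the detail is defined as the residual, summing the two parts recovers the source signal exactly, which is the property a fusion pipeline relies on when it merges base and detail layers separately.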

DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion

2 code implementations • 20 Mar 2020 • Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Pengfei Li, Jiangshe Zhang

Infrared and visible image fusion, a hot topic in the field of image processing, aims at obtaining fused images keeping the advantages of source images.

Ranked #5 on Semantic Segmentation on FMB Dataset

Infrared And Visible Image Fusion • Semantic Segmentation

MFFW: A new dataset for multi-focus image fusion

no code implementations • 12 Feb 2020 • Shuang Xu, Xiaoli Wei, Chunxia Zhang, Junmin Liu, Jiangshe Zhang

It is found that current methods are evaluated on simulated image sets or the Lytro dataset.

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

no code implementations • 9 Feb 2019 • Xiao-Hui Yang, Li Tian, Yun-Mei Chen, Li-Jun Yang, Shuang Xu, Wen-Ming Wu

In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples.

Classification • General Classification • +1 more

Adaptive Quantile Low-Rank Matrix Factorization

1 code implementation • 1 Jan 2019 • Shuang Xu, Chun-Xia Zhang, Jiangshe Zhang

By assuming noise to come from a Gaussian, Laplace or mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$-norm loss between an observed matrix and its bilinear factorization.
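
As a rough illustration of the losses named in the abstract above, the sketch below evaluates a weighted elementwise $L_1$ loss and a weighted squared $L_2$ loss between an observed matrix $X$ and a bilinear factorization $UV^\top$. It only computes the objective values; it is not the paper's adaptive quantile method or any optimization procedure. The helper names, the weight matrix `w`, and the toy numbers are all assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def bilinear_product(u: Matrix, v: Matrix) -> Matrix:
    """Return U V^T for U of shape (m x r) and V of shape (n x r)."""
    return [[sum(ui[k] * vj[k] for k in range(len(ui))) for vj in v] for ui in u]


def weighted_loss(x: Matrix, u: Matrix, v: Matrix, w: Matrix, p: int) -> float:
    """Weighted elementwise loss: sum_ij w_ij * |x_ij - (U V^T)_ij| ** p, for p in {1, 2}."""
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    approx = bilinear_product(u, v)
    return sum(
        w[i][j] * abs(x[i][j] - approx[i][j]) ** p
        for i in range(len(x))
        for j in range(len(x[0]))
    )


# Toy rank-1 example (hypothetical numbers): only the bottom-right entry deviates.
x = [[1.0, 2.0], [2.0, 6.0]]
u = [[1.0], [2.0]]
v = [[1.0], [2.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
print("L1 loss:", weighted_loss(x, u, v, ones, p=1))
print("squared L2 loss:", weighted_loss(x, u, v, ones, p=2))
```

The single outlying entry contributes linearly to the $L_1$ loss but quadratically to the squared $L_2$ loss, which is why the choice of loss is usually tied to an assumption about the noise distribution.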

Variational Bayesian Weighted Complex Network Reconstruction

1 code implementation • 11 Dec 2018 • Shuang Xu, Chun-Xia Zhang, Pei Wang, Jiangshe Zhang

Complex network reconstruction is a hot topic in many fields.

regression

Semi-Supervised Disfluency Detection

no code implementations • COLING 2018 • Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu

While disfluency detection has achieved notable success in recent years, it still suffers severely from data scarcity.

Generative Adversarial Network • Machine Translation • +1 more

Single-channel Speech Dereverberation via Generative Adversarial Training

no code implementations • 25 Jun 2018 • Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu

In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT).

Speech Dereverberation

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

no code implementations • 12 Jun 2018 • Shiyu Zhou, Shuang Xu, Bo Xu

Experiments on the CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and obtains a relative 10.5% average word error rate (WER) reduction compared to SHL-MLSTM with residual learning.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3 more
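
For readers unfamiliar with how a relative WER reduction such as the one quoted in the abstract above is computed, here is a minimal sketch of the arithmetic. The baseline and improved WER values are hypothetical and chosen only so the result lands near 10.5%; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, used only to illustrate the calculation.
baseline = 20.0  # baseline system WER in percent (assumed)
improved = 17.9  # new system WER in percent (assumed)
reduction = relative_wer_reduction(baseline, improved)
print(f"relative WER reduction: {reduction:.1%}")
```

Note that a relative reduction is measured against the baseline error rate, so it is larger than the absolute difference in percentage points (2.1 points in this toy case).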

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese

no code implementations • 16 May 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Experiments on the HKUST datasets demonstrate that lexicon-free modeling units can outperform lexicon-related modeling units in terms of character error rate (CER).

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3 more

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

1 code implementation • 28 Apr 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model using the Transformer on Mandarin Chinese.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +6 more

289

Towards Compact and Fast Neural Machine Translation Using a Combined Method

no code implementations • EMNLP 2017 • Xiaowei Zhang, Wei Chen, Feng Wang, Shuang Xu, Bo Xu

Neural Machine Translation (NMT) places a heavy burden on computation and memory.

Language Modelling • Machine Translation • +3 more
