Search Results for author: Shuang Xu

Found 45 papers, 25 papers with code

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation

no code implementations EMNLP 2020 Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou

Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.

Decoder Dialogue Generation +1

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation 6 Feb 2024 Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML Language Modelling

ReFusion: Learning Image Fusion from Reconstruction with Learnable Loss via Meta-Learning

no code implementations 13 Dec 2023 Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Yichen Wu, Lilun Deng, Yukun Cui, Shuang Xu, Baisong Jiang

To ensure the fusion module maximally preserves the information from the source images, enabling the reconstruction of the source images from the fused image, we adopt a meta-learning strategy to train the loss proposal module using reconstruction loss.

Meta-Learning Multi-Exposure Image Fusion

RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning

no code implementations 2 Dec 2023 Shuang Xu, Sifan Zhou, Zhi Tian, Jizhou Ma, Qiong Nie, Xiangxiang Chu

Current traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts, while learning-based approaches resort to iterative refinement for calibration results, posing constraints on their generalization and application in on-board systems.

Neural Gradient Regularizer

1 code implementation 31 Aug 2023 Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc van Gool

NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer.

Zero-Shot Learning

A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

1 code implementation 30 Jul 2023 Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, which demands a deeper understanding to model the relationship between terms and infer the status of each term.

Equivariant Multi-Modality Image Fusion

3 code implementations 19 May 2023 Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc van Gool

These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior.

Self-Supervised Learning

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations 7 May 2023 Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute Instruction Following +4

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

3 code implementations ICCV 2023 Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, Luc van Gool

To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).


Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

1 code implementation 2 Mar 2023 Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

Medical Slot Filling (MSF) task aims to convert medical queries into structured information, playing an essential role in diagnosis dialogue systems.

slot-filling Slot Filling

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

3 code implementations CVPR 2023 Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, Luc van Gool

We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information.

object-detection Object Detection +1

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

no code implementations 15 Apr 2022 Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.

Contrastive Learning Question Answering +2

Two-Level Supervised Contrastive Learning for Response Selection in Multi-Turn Dialogue

no code implementations 1 Mar 2022 Wentao Zhang, Shuang Xu, Haoran Huang

We further develop a new method for supervised contrastive learning, referred to as two-level supervised contrastive learning, and employ the method in response selection in multi-turn dialogue.

Contrastive Learning Conversational Response Selection +3

Counterfactual Supporting Facts Extraction for Explainable Medical Record Based Diagnosis with Graph Network

1 code implementation NAACL 2021 Haoran Wu, Wei Chen, Shuang Xu, Bo Xu

Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph.


Discrete Cosine Transform Network for Guided Depth Map Super-Resolution

2 code implementations CVPR 2022 Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin, Hanspeter Pfister

Guided depth super-resolution (GDSR) is an essential topic in multi-modal image processing, which reconstructs high-resolution (HR) depth maps from low-resolution ones collected with suboptimal conditions with the help of HR RGB images of the same scene.

Depth Map Super-Resolution

Deep Gradient Projection Networks for Pan-sharpening

1 code implementation CVPR 2021 Shuang Xu, Jiangshe Zhang, Zixiang Zhao, Kai Sun, Junmin Liu, Chunxia Zhang

Specifically, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models for panchromatic images and low resolution multispectral images.

DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency Detection

no code implementations 30 Dec 2020 Yongri Piao, Zhengkun Rong, Shuang Xu, Miao Zhang, Huchuan Lu

The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices.

Saliency Detection

Towards Reducing Severe Defocus Spread Effects for Multi-Focus Image Fusion via an Optimization Based Strategy

1 code implementation 29 Dec 2020 Shuang Xu, Lizhen Ji, Zhe Wang, Pengfei Li, Kai Sun, Chunxia Zhang, Jiangshe Zhang

According to the idea that each local region in the fused image should be similar to the sharpest one among source images, this paper presents an optimization-based approach to reduce defocus spread effects.


Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer

no code implementations COLING 2020 Duzhen Zhang, Xiuyi Chen, Shuang Xu, Bo Xu

For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances contain neutral emotion in conversations, as a result, the confusion between a few non-neutral utterances and much more neutral ones restrains the emotion recognition performance.

Emotion Recognition Graph Attention +3

MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

no code implementations 21 Sep 2020 Yicheng Wang, Shuang Xu, Junmin Liu, Zixiang Zhao, Chun-Xia Zhang, Jiangshe Zhang

Multi-Focus Image Fusion (MFIF) is a promising image enhancement technique to obtain all-in-focus images meeting visual needs and it is a precondition of other computer vision tasks.

Generative Adversarial Network Image Enhancement

Consecutive Decoding for Speech-to-text Translation

1 code implementation 21 Sep 2020 Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

The key idea is to generate source transcript and target translation text with a single decoder.

Decoder Machine Translation +4

When Image Decomposition Meets Deep Learning: A Novel Infrared and Visible Image Fusion Method

no code implementations 2 Sep 2020 Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Kai Sun, Chunxia Zhang, Junmin Liu

The core idea is that the encoder decomposes an image into base and detail feature maps with low- and high-frequency information, respectively, and that the decoder is responsible for the original image reconstruction.

Decoder Image Enhancement +2

Deep Convolutional Sparse Coding Networks for Image Fusion

2 code implementations 18 May 2020 Shuang Xu, Zixiang Zhao, Yicheng Wang, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang

Image fusion is a significant problem in many fields including digital photography, computational imaging and remote sensing, to name but a few.

Infrared And Visible Image Fusion Multi-Exposure Image Fusion

Bayesian Fusion for Infrared and Visible Images

2 code implementations 12 May 2020 Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang

In this paper, a novel Bayesian fusion model is established for infrared and visible images.

Infrared And Visible Image Fusion

Efficient and Model-Based Infrared and Visible Image Fusion Via Algorithm Unrolling

no code implementations 12 May 2020 Zixiang Zhao, Shuang Xu, Jiangshe Zhang, Chengyang Liang, Chunxia Zhang, Junmin Liu

The proposed AUIF model starts with the iterative formulas of two traditional optimization models, which are established to accomplish two-scale decomposition, i.e., separating low-frequency base information and high-frequency detail information from source images.

Decoder Infrared And Visible Image Fusion +1
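
The two-scale decomposition mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain box (mean) filter as the low-pass step, which is a stand-in chosen for illustration and not the optimization models that AUIF unrolls; the names `box_blur` and `two_scale_decompose` are hypothetical.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int = 2) -> np.ndarray:
    """Mean filter with edge padding (assumed low-pass step, not the paper's)."""
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image, dtype=float)
    # Sum every shifted copy of the image inside the (size x size) window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (size * size)


def two_scale_decompose(image, radius: int = 2):
    """Split an image into a low-frequency base and a high-frequency detail layer."""
    image = np.asarray(image, dtype=float)
    base = box_blur(image, radius)   # smooth, low-frequency content
    detail = image - base            # residual, high-frequency content
    return base, detail


rng = np.random.default_rng(0)
img = rng.random((16, 16))
base, detail = two_scale_decompose(img)
```

By construction the two layers sum back to the input, so no information is lost by the split; a fusion method is then free to merge base and detail layers with different rules.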

DIDFuse: Deep Image Decomposition for Infrared and Visible Image Fusion

2 code implementations 20 Mar 2020 Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Pengfei Li, Jiangshe Zhang

Infrared and visible image fusion, a hot topic in the field of image processing, aims at obtaining fused images keeping the advantages of source images.

Decoder Infrared And Visible Image Fusion +1

MFFW: A new dataset for multi-focus image fusion

no code implementations 12 Feb 2020 Shuang Xu, Xiaoli Wei, Chunxia Zhang, Junmin Liu, Jiangshe Zhang

It is found that current methods are evaluated on simulated image sets or the Lytro dataset.

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

no code implementations 9 Feb 2019 Xiao-Hui Yang, Li Tian, Yun-Mei Chen, Li-Jun Yang, Shuang Xu, Wen-Ming Wu

In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples.

Classification General Classification +1

Adaptive Quantile Low-Rank Matrix Factorization

1 code implementation 1 Jan 2019 Shuang Xu, Chun-Xia Zhang, Jiangshe Zhang

By assuming noise to come from a Gaussian, Laplace or mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$-norm loss between an observed matrix and its bilinear factorization.
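
As a rough illustration of the losses this abstract refers to, the sketch below evaluates a weighted $L_1$ or squared $L_2$ loss between an observed matrix `Y` and a bilinear factorization `U @ V.T`. The function name `factorization_loss`, the weight matrix `W`, and the toy data are assumptions made for the example; the sketch does not implement the paper's adaptive quantile loss.

```python
import numpy as np


def factorization_loss(Y, U, V, W=None, p=2):
    """Weighted L1 (p=1) or squared L2 (p=2) loss of Y against U @ V.T."""
    residual = Y - U @ V.T
    if W is None:
        W = np.ones_like(Y, dtype=float)  # unweighted case
    if p == 1:
        return float(np.sum(W * np.abs(residual)))
    if p == 2:
        return float(np.sum(W * residual ** 2))
    raise ValueError("p must be 1 or 2")


# Toy rank-1 example with a single corrupted entry (hypothetical data).
U = np.array([[1.0], [2.0]])
V = np.array([[1.0], [1.0], [1.0]])
Y = U @ V.T
Y[0, 0] += 3.0  # one outlier of size 3

l1 = factorization_loss(Y, U, V, p=1)  # absolute error: 3
l2 = factorization_loss(Y, U, V, p=2)  # squared error: 9
```

The squared loss penalizes the single outlier three times as heavily as the absolute loss here, which is why the choice of loss is tied to the assumed noise distribution.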

Semi-Supervised Disfluency Detection

no code implementations COLING 2018 Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu

While disfluency detection has achieved notable success in recent years, it still suffers severely from data scarcity.

Generative Adversarial Network Machine Translation +1

Single-channel Speech Dereverberation via Generative Adversarial Training

no code implementations 25 Jun 2018 Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu

In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT).

Speech Dereverberation

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

no code implementations 12 Jun 2018 Shiyu Zhou, Shuang Xu, Bo Xu

Experiments on CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and can obtain relatively 10.5% average word error rate (WER) reduction compared to SHL-MLSTM with residual learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
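
A relative WER reduction is measured against the baseline's error rate rather than in absolute points. The sketch below shows the arithmetic; the WER values are hypothetical and chosen only so that the result comes out near 10.5%, they are not figures from the paper.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a baseline at 40.0% WER and a new model at 35.8% WER.
reduction = relative_reduction(40.0, 35.8)
# 4.2 absolute points out of 40.0, i.e. roughly a 10.5% relative reduction
```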

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese

no code implementations 16 May 2018 Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Experiments on HKUST datasets demonstrate that the lexicon free modeling units can outperform lexicon related modeling units in terms of character error rate (CER).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
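
Character error rate is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below follows that common definition; the function names and the example strings are hypothetical, and the paper's exact scoring setup is not described in this listing.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical Mandarin example: one substituted character out of six.
score = cer("今天天气很好", "今天天汽很好")
```

Because Mandarin text has no whitespace word boundaries, scoring at the character level avoids depending on a word segmenter.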

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

1 code implementation 28 Apr 2018 Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Furthermore, we investigate a comparison between syllable based model and context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7
