1 code implementation • ECCV 2020 • Miao Zhang, Sun Xiao Fei, Jie Liu, Shuang Xu, Yongri Piao, Huchuan Lu
In this paper, we propose an asymmetric two-stream architecture that takes into account the inherent differences between RGB and depth data for saliency detection.
Ranked #19 on Thermal Image Segmentation on RGB-T-Glass-Segmentation
no code implementations • EMNLP 2020 • Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou
Here, we address these issues in two ways: (1) we enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) we propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.
1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen
We introduce MobileVLM V2, a family of vision language models significantly improved over MobileVLM, demonstrating that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.
1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.
no code implementations • 13 Dec 2023 • Haowen Bai, Zixiang Zhao, Jiangshe Zhang, Yichen Wu, Lilun Deng, Yukun Cui, Shuang Xu, Baisong Jiang
To ensure the fusion module maximally preserves the information from the source images, enabling the reconstruction of the source images from the fused image, we adopt a meta-learning strategy to train the loss proposal module using reconstruction loss.
no code implementations • 2 Dec 2023 • Shuang Xu, Sifan Zhou, Zhi Tian, Jizhou Ma, Qiong Nie, Xiangxiang Chu
Current traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts, while learning-based approaches resort to iterative refinement for calibration results, posing constraints on their generalization and application in on-board systems.
1 code implementation • 31 Aug 2023 • Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc van Gool
NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer.
1 code implementation • 30 Jul 2023 • Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu
However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, which demands a deeper understanding to model the relationship between terms and infer the status of each term.
2 code implementations • 19 May 2023 • Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc van Gool
These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior.
2 code implementations • 7 May 2023 • Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu
(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.
no code implementations • ICCV 2023 • Zixiang Zhao, Jiangshe Zhang, Xiang Gu, Chengli Tan, Shuang Xu, Yulun Zhang, Radu Timofte, Luc van Gool
Then, the extracted features are mapped to the spherical space to complete the separation of private features and the alignment of shared features.
2 code implementations • ICCV 2023 • Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, Luc van Gool
To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
1 code implementation • 2 Mar 2023 • Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu
The Medical Slot Filling (MSF) task aims to convert medical queries into structured information, playing an essential role in diagnosis dialogue systems.
2 code implementations • 30 Jan 2023 • Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu
Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks.
Automatic Speech Recognition (ASR) +4
2 code implementations • CVPR 2023 • Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, Luc van Gool
We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information.
no code implementations • 15 Apr 2022 • Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu
Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.
no code implementations • 1 Mar 2022 • Wentao Zhang, Shuang Xu, Haoran Huang
We further develop a new method for supervised contrastive learning, referred to as two-level supervised contrastive learning, and employ the method in response selection in multi-turn dialogue.
Ranked #2 on Conversational Response Selection on E-commerce
1 code implementation • 18 Feb 2022 • Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu
Finally, we discuss the new frontiers in VLP.
1 code implementation • NAACL 2021 • Haoran Wu, Wei Chen, Shuang Xu, Bo Xu
Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph.
2 code implementations • CVPR 2022 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Zudi Lin, Hanspeter Pfister
Guided depth super-resolution (GDSR) is an essential topic in multi-modal image processing, which reconstructs high-resolution (HR) depth maps from low-resolution ones collected under suboptimal conditions with the help of HR RGB images of the same scene.
1 code implementation • 10 Mar 2021 • Shuang Xu, Jiangshe Zhang, Kai Sun, Zixiang Zhao, Lu Huang, Junmin Liu, Chunxia Zhang
Pansharpening is a fundamental issue in the remote sensing field.
1 code implementation • CVPR 2021 • Shuang Xu, Jiangshe Zhang, Zixiang Zhao, Kai Sun, Junmin Liu, Chunxia Zhang
Specifically, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models for panchromatic images and low resolution multispectral images.
no code implementations • 31 Dec 2020 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Kai Sun, Lu Huang, Junmin Liu, Chunxia Zhang
In addition, the latent information of features can be preserved effectively through adversarial training.
no code implementations • 30 Dec 2020 • Yongri Piao, Zhengkun Rong, Shuang Xu, Miao Zhang, Huchuan Lu
The success of learning-based light field saliency detection is heavily dependent on how a comprehensive dataset can be constructed for higher generalizability of models, how high dimensional light field data can be effectively exploited, and how a flexible model can be designed to achieve versatility for desktop computers and mobile devices.
1 code implementation • 29 Dec 2020 • Shuang Xu, Lizhen Ji, Zhe Wang, Pengfei Li, Kai Sun, Chunxia Zhang, Jiangshe Zhang
According to the idea that each local region in the fused image should be similar to the sharpest one among source images, this paper presents an optimization-based approach to reduce defocus spread effects.
no code implementations • COLING 2020 • Duzhen Zhang, Xiuyi Chen, Shuang Xu, Bo Xu
For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances in conversations carry neutral emotion, so the confusion between a few non-neutral utterances and far more neutral ones limits emotion recognition performance.
1 code implementation • 21 Sep 2020 • Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li
The key idea is to generate source transcript and target translation text with a single decoder.
1 code implementation • 21 Sep 2020 • Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li
Can we build a system to fully utilize signals in a parallel ST corpus?
no code implementations • 21 Sep 2020 • Yicheng Wang, Shuang Xu, Junmin Liu, Zixiang Zhao, Chun-Xia Zhang, Jiangshe Zhang
Multi-Focus Image Fusion (MFIF) is a promising image enhancement technique to obtain all-in-focus images meeting visual needs and it is a precondition of other computer vision tasks.
no code implementations • 2 Sep 2020 • Zixiang Zhao, Jiangshe Zhang, Shuang Xu, Kai Sun, Chunxia Zhang, Junmin Liu
The core idea is that the encoder decomposes an image into base and detail feature maps with low- and high-frequency information, respectively, and that the decoder is responsible for the original image reconstruction.
no code implementations • 20 May 2020 • Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu
End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
2 code implementations • 18 May 2020 • Shuang Xu, Zixiang Zhao, Yicheng Wang, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang
Image fusion is a significant problem in many fields including digital photography, computational imaging and remote sensing, to name but a few.
Infrared And Visible Image Fusion Multi-Exposure Image Fusion
2 code implementations • 12 May 2020 • Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Jiangshe Zhang
In this paper, a novel Bayesian fusion model is established for infrared and visible images.
no code implementations • 12 May 2020 • Zixiang Zhao, Shuang Xu, Jiangshe Zhang, Chengyang Liang, Chunxia Zhang, Junmin Liu
The proposed AUIF model starts with the iterative formulas of two traditional optimization models, which are established to accomplish two-scale decomposition, i.e., separating low-frequency base information and high-frequency detail information from source images.
Infrared And Visible Image Fusion Rolling Shutter Correction
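The two-scale decomposition mentioned in the entry above can be illustrated with a minimal sketch. This is not the AUIF model or its optimization-based formulas: it swaps in a plain box (mean) filter as the low-pass step, and the function names `box_blur` and `two_scale_decompose` are hypothetical. It only shows the general idea that an image splits into a low-frequency base layer and a high-frequency detail layer that sum back to the original.

```python
def box_blur(image, radius=1):
    """Mean filter over a window that is truncated at the image borders.

    Used here as a crude low-pass filter (an assumption for illustration).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) / len(window)
    return out


def two_scale_decompose(image, radius=1):
    """Split an image into a base (low-frequency) and a detail (residual) layer."""
    base = box_blur(image, radius)
    detail = [
        [pixel - b for pixel, b in zip(row, base_row)]
        for row, base_row in zip(image, base)
    ]
    return base, detail


if __name__ == "__main__":
    # A single bright pixel: the base layer spreads it out, the detail layer keeps the spike.
    img = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
    base, detail = two_scale_decompose(img)
    print(base[1][1], detail[1][1])
```

By construction, adding the base and detail layers element-wise recovers the input image, which is the property a decoder would rely on for reconstruction.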
2 code implementations • 20 Mar 2020 • Zixiang Zhao, Shuang Xu, Chun-Xia Zhang, Junmin Liu, Pengfei Li, Jiangshe Zhang
Infrared and visible image fusion, a hot topic in the field of image processing, aims at obtaining fused images keeping the advantages of source images.
Ranked #5 on Semantic Segmentation on FMB Dataset
no code implementations • 12 Feb 2020 • Shuang Xu, Xiaoli Wei, Chunxia Zhang, Junmin Liu, Jiangshe Zhang
It is found that current methods are evaluated on simulated image sets or the Lytro dataset.
no code implementations • 9 Feb 2019 • Xiao-Hui Yang, Li Tian, Yun-Mei Chen, Li-Jun Yang, Shuang Xu, Wen-Ming Wu
In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples.
1 code implementation • 1 Jan 2019 • Shuang Xu, Chun-Xia Zhang, Jiangshe Zhang
By assuming noise to come from a Gaussian, Laplace or mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$-norm loss between an observed matrix and its bilinear factorization.
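As a rough illustration of the losses named in the entry above, the sketch below evaluates a weighted $L_1$ or squared $L_2$ loss between an observed matrix `X` and a bilinear factorization `U V^T`. It is a generic textbook formulation, not the paper's method; the function name `factorization_loss` and its arguments are hypothetical. Under the usual maximum-likelihood reading, the squared $L_2$ loss corresponds to Gaussian noise and the $L_1$ loss to Laplace noise, while the weights allow per-entry reweighting.

```python
def factorization_loss(X, U, V, W=None, norm="l2"):
    """Weighted L1 or squared-L2 loss between X and the product U V^T.

    X: n x m observed matrix, U: n x r, V: m x r (all lists of lists).
    W: optional n x m matrix of non-negative weights (defaults to all ones).
    """
    n, m, r = len(X), len(X[0]), len(U[0])
    total = 0.0
    for i in range(n):
        for j in range(m):
            approx = sum(U[i][k] * V[j][k] for k in range(r))
            residual = X[i][j] - approx
            weight = 1.0 if W is None else W[i][j]
            total += weight * (abs(residual) if norm == "l1" else residual ** 2)
    return total


if __name__ == "__main__":
    U = [[1.0], [2.0]]
    V = [[1.0], [2.0]]            # U V^T = [[1, 2], [2, 4]]
    X = [[1.0, 2.0], [2.0, 6.0]]  # one entry is off by 2
    print(factorization_loss(X, U, V, norm="l1"))
    print(factorization_loss(X, U, V, norm="l2"))
```

The single outlying entry contributes linearly to the $L_1$ loss but quadratically to the $L_2$ loss, which is why the choice of norm reflects an assumption about the noise.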
1 code implementation • 11 Dec 2018 • Shuang Xu, Chun-Xia Zhang, Pei Wang, Jiangshe Zhang
Complex network reconstruction is a hot topic in many fields.
no code implementations • COLING 2018 • Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu
While disfluency detection has achieved notable success in recent years, it still suffers severely from data scarcity.
no code implementations • 25 Jun 2018 • Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu
In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT).
no code implementations • 12 Jun 2018 • Shiyu Zhou, Shuang Xu, Bo Xu
Experiments on CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and achieves a 10.5% relative reduction in average word error rate (WER) compared to SHL-MLSTM with residual learning.
Automatic Speech Recognition (ASR) +3
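For readers unfamiliar with how a relative WER reduction such as the one quoted in the entry above is computed, here is a small sketch. The helper name `relative_reduction` and the example numbers are hypothetical and are not taken from the paper; only the definition (absolute improvement divided by the baseline value) is standard.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative reduction of an error rate with respect to a baseline.

    Both arguments must use the same unit (e.g. percent). Returns a fraction.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers: a baseline WER of 40.0% improved to 35.8%.
    reduction = relative_reduction(40.0, 35.8)
    print(f"{reduction:.1%} relative WER reduction")
```

Note the difference from an absolute reduction: in this made-up example the WER drops by 4.2 percentage points, which is a relative reduction of about 10.5%.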
no code implementations • 16 May 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Experiments on HKUST datasets demonstrate that the lexicon-free modeling units can outperform lexicon-related modeling units in terms of character error rate (CER).
Automatic Speech Recognition (ASR) +3
1 code implementation • 28 Apr 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model using the Transformer in Mandarin Chinese.
Automatic Speech Recognition (ASR) +6
no code implementations • EMNLP 2017 • Xiaowei Zhang, Wei Chen, Feng Wang, Shuang Xu, Bo Xu
Neural Machine Translation (NMT) places a heavy burden on computation and memory.