Search Results for author: Bo Xu

Found 148 papers, 50 papers with code

Multi-task punchlines recognition method combined with label transfer relationship

no code implementations CCL 2021 Tongyue Zhang, Shaowu Zhang, Bo Xu, Liang Yang, Hongfei Lin

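“Humor plays an important role in human communication and is abundant in sitcoms. The punchline is one of the forms through which sitcoms achieve humorous effect. In the sitcom punchline recognition task, each sentence's label indicates whether that sentence is a punchline, yet previous work usually identifies punchlines only by modeling contextual semantic relations and does not make full use of the labels. To fully exploit the information in the label sequence, this paper proposes a new recognition method: a word-level and sentence-level multi-task learning model combined with conditional random fields (CRF). The model improves on two fronts. First, the transition relationship between two adjacent labels in the label sequence is regarded as a manifestation of incongruity in humor theory, and a CRF is used to learn this transition relationship. Second, because learning the transition relationships between adjacent labels and learning contextual semantic relations can both capture the incongruity between setup and punchline, the two are correlated; to let the model exploit this correlation, it introduces multi-task learning to simultaneously learn the meaning of each sentence, the meaning of all the characters that make up each sentence, the word-level label transition relationships, and the sentence-level label transition relationships. Experiments on the English dataset of the CCL2020 Xiaoniu Cup (小牛杯) Humor Computation sitcom punchline recognition evaluation task show that the proposed method improves on the current best method by 3.2% and achieves the best performance on the sitcom punchline recognition task; ablation experiments demonstrate the effectiveness of both improvements.”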

Research on the Natural Language Normalness of Software Identifiers

no code implementations CCL 2021 Dongzhen Wen, Fan Zhang, Xiao Zhang, Liang Yang, Yuan Lin, Bo Xu, Hongfei Lin

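“Understanding software source code is central to collaborative software development and maintenance, and understanding identifiers, which make up more than half of source code, plays an important role in software comprehension. Traditional software engineering mainly studies how naming conventions can constrain the identifier naming process to produce identifiers that are easier to understand and communicate. Building on a review and analysis of the naming conventions of common programming languages, this paper proposes a new evaluation standard for identifier comprehensibility. Specifically, it first summarizes the naming conventions of common mainstream programming languages and, by analogy with the concept of morphemes in natural language, proposes an identifier formation process based on software morphemes: forming an identifier can be viewed as generating, ordering, and connecting software morphemes. On this basis, the paper proposes a method for evaluating the normalness of software identifiers that incorporates a natural-language corpus, to measure whether identifiers are easy to understand. Finally, validation experiments on a source code comprehension dataset and on open-source projects from the GitHub platform show that the proposed normalness score measures the comprehensibility of software projects well.”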

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation

no code implementations EMNLP 2020 Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou

Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.

Dialogue Generation Knowledge Distillation

RealMedDial: A Real Telemedical Dialogue Dataset Collected from Online Chinese Short-Video Clips

no code implementations COLING 2022 Bo Xu, Hongtong Zhang, Jian Wang, Xiaokun Zhang, Dezhi Hao, Linlin Zong, Hongfei Lin, Fenglong Ma

We collected and annotated a wide range of meta-data with respect to medical dialogue including doctor profiles, hospital departments, diseases and symptoms for fine-grained analysis on language usage pattern and clinical diagnosis.

Response Generation

Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification

no code implementations 27 Mar 2024 Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu

Energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and artificial Transformer, whereby the Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost.

An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation

no code implementations 18 Mar 2024 Zewen Xu, Yijia He, Hao Wei, Bo Xu, BinJian Xie, Yihong Wu

First, a high-precision rotation estimation method based on normal vector coplanarity constraints that consider the uncertainty of observations is proposed, which can be solved by Levenberg-Marquardt (LM) algorithm efficiently.

Pose Estimation Translation +2

URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields

no code implementations 15 Mar 2024 Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Lee

We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes the unordered rolling shutter (RS) images to obtain the implicit 3D representation.

Side Information-Driven Session-based Recommendation: A Survey

no code implementations 27 Feb 2024 Xiaokun Zhang, Bo Xu, Chenliang Li, Yao Zhou, Liangyue Li, Hongfei Lin

Emerging efforts incorporate various kinds of side information into their methods for enhancing task performance.

Session-Based Recommendations

Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making

no code implementations 18 Dec 2023 Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu

Many complicated real-world tasks can be broken down into smaller, more manageable parts, and planning with prior knowledge extracted from these simplified pieces is crucial for humans to make accurate decisions.

Decision Making Representation Learning

Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control

no code implementations 4 Dec 2023 Chao Shen, Wenkang Zhan, Jian Tang, Zhaofeng Wu, Bo Xu, Chao Zhao, Zhanguo Wang

It standardizes deoxidation temperatures across various equipment and substrate materials, advancing the standardization research process in semiconductor preparation, a significant milestone in thin film growth technology.

MetaDefa: Meta-learning based on Domain Enhancement and Feature Alignment for Single Domain Generalization

no code implementations 27 Nov 2023 Can Sun, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng, Bo Xu

Single domain generalization (SDG) based on meta-learning has emerged as an effective technique for solving the domain-shift problem.

Domain Generalization Meta-Learning

Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

no code implementations 26 Nov 2023 Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data.

Self-Knowledge Distillation

Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction

no code implementations 21 Nov 2023 Xuanle Zhao, Yue Sun, Tielin Zhang, Bo Xu

One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by the Green's function method and approximates the operator kernel directly in the frequency domain.

Bi-Preference Learning Heterogeneous Hypergraph Networks for Session-based Recommendation

1 code implementation 2 Nov 2023 Xiaokun Zhang, Bo Xu, Fenglong Ma, Chenliang Li, Yuan Lin, Hongfei Lin

Secondly, price preference and interest preference are interdependent and collectively determine user choice, necessitating that we jointly consider both price and interest preference for intent modeling.

Multi-Task Learning Session-Based Recommendations

A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations

1 code implementation 31 Oct 2023 Hui Ma, Jian Wang, Hongfei Lin, Bo Zhang, Yijia Zhang, Bo Xu

Emotion recognition in conversations (ERC), the task of recognizing the emotion of each utterance in a conversation, is crucial for building empathetic machines.

Multimodal Emotion Recognition

Beyond Co-occurrence: Multi-modal Session-based Recommendation

1 code implementation 29 Sep 2023 Xiaokun Zhang, Bo Xu, Fenglong Ma, Chenliang Li, Liang Yang, Hongfei Lin

(2) How to fuse this heterogeneous descriptive information to comprehensively infer user interests?

Contrastive Learning Descriptive +2

Fast Locality Sensitive Hashing with Theoretical Guarantee

no code implementations 27 Sep 2023 Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo, Ming Du

Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks.

Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

no code implementations 18 Aug 2023 Hongqiu Wang, Lei Zhu, Guang Yang, Yike Guo, Shichen Zhang, Bo Xu, Yueming Jin

Our method is verified on these datasets, and experimental results show that VIS-Net can significantly outperform existing state-of-the-art referring segmentation methods.

Robot Navigation Segmentation

Inherent Redundancy in Spiking Neural Networks

1 code implementation ICCV 2023 Man Yao, Jiakui Hu, Guangshe Zhao, Yaoyuan Wang, Ziyang Zhang, Bo Xu, Guoqi Li

In this work, we pose and focus on three key questions regarding the inherent redundancy in SNNs.

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

no code implementations 5 Aug 2023 Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu

The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set while relying on a validation set to prevent overfitting using early stopping and 2) average the last several checkpoints, or those with the lowest validation losses, to obtain the final model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
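
The conventional recipe described in this entry (select checkpoints, then average them) can be sketched as follows. This is a minimal illustration only, not the ApproBiVT method itself: the checkpoint format (a dict mapping parameter names to flat lists of floats) and the function names `average_checkpoints` and `select_lowest_loss` are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def select_lowest_loss(checkpoints: List[Checkpoint],
                       val_losses: List[float],
                       k: int) -> List[Checkpoint]:
    """Return the k checkpoints with the lowest validation loss."""
    order = sorted(range(len(checkpoints)), key=lambda i: val_losses[i])
    return [checkpoints[i] for i in order[:k]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of parameters; all checkpoints must share keys and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th weight of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [9.0, 9.0]}]
    best_two = select_lowest_loss(ckpts, val_losses=[0.2, 0.1, 0.9], k=2)
    print(average_checkpoints(best_two))
```

Averaging the last several checkpoints is the same call with `checkpoints[-k:]` in place of the loss-based selection.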

Attention-free Spikformer: Mixing Spike Sequences with Simple Linear Transforms

no code implementations 2 Aug 2023 Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu

The results indicate that compared to the SOTA Spikformer with SSA, Spikformer with LT achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e., CIFAR-10 and CIFAR-100).

Image Classification

ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

1 code implementation 1 Aug 2023 Bo Zhang, Jian Wang, Hui Ma, Bo Xu, Hongfei Lin

To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations.

Dialogue Generation Response Generation

A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

1 code implementation 30 Jul 2023 Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, which demands a deeper understanding to model the relationship between terms and infer the status of each term.

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

1 code implementation 22 Jul 2023 Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng Xing, Bo Xu

However, existing works often overlook the temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation.

Continuous Control Hierarchical Reinforcement Learning +2

Hate Speech Detection via Dual Contrastive Learning

no code implementations 10 Jul 2023 Junyu Lu, Hongfei Lin, Xiaokun Zhang, Zhaoqing Li, Tongyue Zhang, Linlin Zong, Fenglong Ma, Bo Xu

Our framework jointly optimizes the self-supervised and the supervised contrastive learning loss for capturing span-level information beyond the token-level emotional semantics used in existing models, particularly detecting speech containing abusive and insulting words.

Contrastive Learning Hate Speech Detection

Spike-driven Transformer

1 code implementation NeurIPS 2023 Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition.
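
Properties 1 and 2 above can be illustrated with a small sketch: when the input is a binary spike vector, a matrix product reduces to adding up the weight rows selected by the spikes, and zero inputs trigger no work. This is an illustrative toy, not the paper's implementation; the function name `spike_matvec` and the list-of-lists weight layout are assumptions for the example.

```python
from typing import List


def spike_matvec(spikes: List[int], weights: List[List[float]]) -> List[float]:
    """Compute spikes @ weights for a binary spike vector using additions only.

    spikes:  length-n list of 0/1 values.
    weights: n rows, each with m columns.
    """
    if len(spikes) != len(weights):
        raise ValueError("spike vector and weight rows must have the same length")
    n_out = len(weights[0]) if weights else 0
    out = [0.0] * n_out
    for s, row in zip(spikes, weights):
        if s:  # event-driven: rows for silent inputs are skipped entirely
            for j, w in enumerate(row):
                out[j] += w  # accumulate; no multiplication is needed
    return out


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(spike_matvec([1, 0, 1], W))  # rows 0 and 2 are summed
```

The cost scales with the number of active spikes rather than with the full input dimension, which is the source of the sparsity benefit described in the abstract.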

Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots

no code implementations 22 Jun 2023 Chao Shen, Wenkang Zhan, Kaiyao Xin, Manyang Li, Zhenyu Sun, Hui Cong, Chi Xu, Jian Tang, Zhaofeng Wu, Bo Xu, Zhongming Wei, Chunlai Xue, Chao Zhao, Zhanguo Wang

Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources.

StructuredMesh: 3D Structured Optimization of Façade Components on Photogrammetric Mesh Models using Binary Integer Programming

no code implementations 7 Jun 2023 Libin Wang, Han Hu, Qisen Shang, Bo Xu, Qing Zhu

The lack of façade structures in photogrammetric mesh models renders them inadequate for meeting the demands of intricate applications.

object-detection Object Detection

VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

no code implementations 31 May 2023 Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

In this paper, we first propose ViLaS (Vision and Language into Automatic Speech Recognition), a novel multimodal ASR model based on the continuous integrate-and-fire (CIF) mechanism, which can integrate visual and textual context simultaneously or separately, to facilitate speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification

no code implementations 24 May 2023 Xiyuan Wang, Fangyuan Wang, Bo Xu, Liang Xu, Jing Xiao

Typically, the Time-Delay Neural Network (TDNN) and Transformer can serve as a backbone for Speaker Verification (SV).

Speaker Verification

Mixture of personality improved Spiking actor network for efficient multi-agent cooperation

no code implementations 10 May 2023 Xiyun Li, Ziyi Ni, Jingqing Ruan, Linghui Meng, Jing Shi, Tielin Zhang, Bo Xu

Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for efficient reinforcement learning.

Multi-agent Reinforcement Learning reinforcement-learning

Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

1 code implementation 8 May 2023 Junyu Lu, Bo Xu, Xiaokun Zhang, Changrong Min, Liang Yang, Hongfei Lin

In addition, it is crucial to introduce lexical knowledge to detect the toxicity of posts, which has been a challenge for researchers.

Hate Speech Detection

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations 7 May 2023 Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute Instruction Following +4

Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

1 code implementation ICLR 2023 Hongming Zhang, Chenjun Xiao, Han Wang, Jun Jin, Bo Xu, Martin Müller

In this work, we further exploit the information in the replay memory by treating it as an empirical Replay Memory MDP (RM-MDP).

SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks

no code implementations 12 Apr 2023 Haojia Yu, Han Hu, Bo Xu, Qisen Shang, Zhendong Wang, Qing Zhu

Most urban applications necessitate building footprints in the form of concise vector graphics with sharp boundaries rather than pixel-wise raster images.

Segmentation Semantic Segmentation +2

Semantic Image Translation for Repairing the Texture Defects of Building Models

no code implementations 30 Mar 2023 Qisen Shang, Han Hu, Haojia Yu, Bo Xu, Libin Wang, Qing Zhu

Experimental results on publicly available façade image and 3D model datasets demonstrate that our method yields superior results and effectively addresses issues associated with flawed textures.

Image Generation Style Transfer +2

Matching-based Term Semantics Pre-training for Spoken Patient Query Understanding

1 code implementation 2 Mar 2023 Zefa Hu, Xiuyi Chen, Haoran Wu, Minglun Han, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

The Medical Slot Filling (MSF) task aims to convert medical queries into structured information, playing an essential role in diagnosis dialogue systems.

slot-filling Slot Filling

A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization

1 code implementation CVPR 2023 Yijia He, Bo Xu, Zhanpeng Ouyang, Hongdong Li

We propose a novel visual-inertial odometry (VIO) initialization method, which decouples rotation and translation estimation, and achieves higher efficiency and better robustness.

Translation

Tuning Synaptic Connections instead of Weights by Genetic Algorithm in Spiking Policy Network

1 code implementation 29 Dec 2022 Duzhen Zhang, Tielin Zhang, Shuncheng Jia, Qingyu Wang, Bo Xu

Learning from interaction is the primary way biological agents come to know the environment and themselves.

Privileged Prior Information Distillation for Image Matting

no code implementations 25 Nov 2022 Cheng Lyu, Jiake Xie, Bo Xu, Cheng Lu, Han Huang, Xin Huang, Ming Wu, Chuang Zhang, Yong Tang

Performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in the scenes where foregrounds are semantically ambiguous, chromaless, or high transmittance.

Image Matting

Motif-topology improved Spiking Neural Network for the Cocktail Party Effect and McGurk Effect

1 code implementation 12 Nov 2022 Shuncheng Jia, Tielin Zhang, Ruichen Zuo, Bo Xu

Here, we propose a Motif-topology improved SNN (M-SNN) for the efficient multi-sensory integration and cognitive phenomenon simulations.

Mitigating spectral bias for the multiscale operator learning with hierarchical attention

no code implementations 19 Oct 2022 Xinliang Liu, Bo Xu, Lei Zhang

Neural operators have emerged as a powerful tool for learning the mapping between infinite-dimensional parameter and solution spaces of partial differential equations (PDEs).

Operator learning

Attention Spiking Neural Networks

no code implementations 28 Sep 2022 Man Yao, Guangshe Zhao, Hengyu Zhang, Yifan Hu, Lei Deng, Yonghong Tian, Bo Xu, Guoqi Li

On ImageNet-1K, we achieve top-1 accuracy of 75.92% and 77.08% on single/4-step Res-SNN-104, which are state-of-the-art results in SNNs.

Action Recognition Image Classification

Situational Perception Guided Image Matting

no code implementations 20 Apr 2022 Bo Xu, Jiake Xie, Han Huang, Ziwen Li, Cheng Lu, Yong Tang, Yandong Guo

In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task.

Image Matting Object

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

no code implementations 15 Apr 2022 Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.

Contrastive Learning Question Answering +2

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR

1 code implementation 29 Mar 2022 Fangyuan Wang, Bo Xu

We integrate this scheme with the chunk-wise Transformer and Conformer, and identify them as SChunk-Transformer and SChunk-Conformer, respectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploiting Pairwise Mutual Information for Knowledge-Grounded Dialogue

1 code implementation IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022 Bo Zhang, Jian Wang, Hongfei Lin, Hui Ma, Bo Xu

Correlation integration is designed to fully exploit the pairwise mutual information among dialogue context, knowledge, and responses, while overall integration adopts an integration gate to capture global information.

Dialogue Generation

Semantic Distillation Guided Salient Object Detection

no code implementations 8 Mar 2022 Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo

Most existing CNN-based salient object detection methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information caused by the subjectiveness of the SOD task and the locality of convolution layers.

Image Captioning Object +3

Motif-topology and Reward-learning improved Spiking Neural Network for Efficient Multi-sensory Integration

1 code implementation 11 Feb 2022 Shuncheng Jia, Ruichen Zuo, Tielin Zhang, Hongxing Liu, Bo Xu

Network architectures and learning principles are key in forming complex functions in artificial neural networks (ANNs) and spiking neural networks (SNNs).

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

1 code implementation 30 Jan 2022 Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge.

speech-recognition Speech Recognition

Semi-Supervised Adversarial Recognition of Refined Window Structures for Inverse Procedural Façade Modeling

no code implementations 22 Jan 2022 Han Hu, Xinrong Liang, Yulin Ding, Qisen Shang, Bo Xu, Xuming Ge, Min Chen, Ruofei Zhong, Qing Zhu

Unfortunately, the large amount of interactive sample labeling efforts has dramatically hindered the application of deep learning methods, especially for 3D modeling tasks, which require heterogeneous samples.

Generative Adversarial Network

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem

no code implementations 17 Dec 2021 Jing Shi, Xuankai Chang, Tomoki Hayashi, Yen-Ju Lu, Shinji Watanabe, Bo Xu

Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols, and convert the paradigm of the speech separation/enhancement related tasks from regression to classification.

regression Speech Separation

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

1 code implementation 6 Dec 2021 Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu

In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.

Offline RL reinforcement-learning +4

LiMuSE: Lightweight Multi-modal Speaker Extraction

1 code implementation 7 Nov 2021 Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu

Multi-modal cues, including spatial information, facial expression and voiceprint, are introduced to the speech separation and speaker extraction tasks to serve as complementary information to achieve better performance.

Model Compression Quantization +1

Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation

no code implementations 22 Oct 2021 Ziwen Li, Bo Xu, Han Huang, Cheng Lu, Yandong Guo

In this paper, we propose a new framework Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos.

3D Human Pose Estimation Optical Flow Estimation

Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction

1 code implementation ICCV 2021 Bo Xu, Han Huang, Cheng Lu, Ziwen Li, Yandong Guo

In this paper, we propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn the human-object interactive foreground (the human and the objects he or she interacts with) from a raw RGB image.

Human-Object Interaction Detection Image Matting

Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only

no code implementations 2 Oct 2021 Kaixiang Yang, Hongya Wang, Bo Xu, Wei Wang, Yingyuan Xiao, Ming Du, Junfeng Zhou

In the middle of query execution, AdaptNN collects a number of runtime features and predicts a termination condition for each individual query, by which better end-to-end latency is attained.

feature selection Information Retrieval +2

A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning

no code implementations 21 Sep 2021 Hongming Zhang, Ke Sun, Bo Xu, Linglong Kong, Martin Müller

In this paper, we propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution (OOD) state outliers.

Anomaly Detection Atari Games +3

Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement Learning

no code implementations 15 Jun 2021 Duzhen Zhang, Tielin Zhang, Shuncheng Jia, Xiang Cheng, Bo Xu

Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding.

OpenAI Gym reinforcement-learning +1

WASE: Learning When to Attend for Speaker Extraction in Cocktail Party Environments

1 code implementation 13 Jun 2021 Yunzhe Hao, Jiaming Xu, Peng Zhang, Bo Xu

In the speaker extraction problem, it is found that additional information from the target speaker contributes to the tracking and extraction of the target speaker, which includes voiceprint, lip movement, facial expression, and spatial information.

Action Detection Activity Detection

Counterfactual Supporting Facts Extraction for Explainable Medical Record Based Diagnosis with Graph Network

1 code implementation NAACL 2021 Haoran Wu, Wei Chen, Shuang Xu, Bo Xu

Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph.

counterfactual

Graph Force Learning

no code implementations 7 Mar 2021 Ke Sun, Jiaying Liu, Shuo Yu, Bo Xu, Feng Xia

Feature representation plays a powerful role in network analysis tasks.

Graph Learning

Network Representation Learning: From Traditional Feature Learning to Deep Learning

no code implementations 7 Mar 2021 Ke Sun, Lei Wang, Bo Xu, Wenhong Zhao, Shyh Wei Teng, Feng Xia

Network representation learning (NRL) is an effective graph analytics technique that helps users deeply understand the hidden characteristics of graph data.

Recommendation Systems Representation Learning

MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition

no code implementations 25 Feb 2021 Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu

In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
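
As a rough illustration of what a mixup-style augmentation for speech could look like, the sketch below interpolates two feature sequences with a weight `lam` and combines the two per-utterance losses with the same weight. This is generic mixup under stated assumptions, not a confirmed reproduction of MixSpeech: the helper names `mix_features` and `mix_losses`, the flat-list feature format, and the choice to weight the two recognition losses by `lam` are all assumptions made for the example.

```python
from typing import List


def mix_features(x_a: List[float], x_b: List[float], lam: float) -> List[float]:
    """Linear interpolation of two equal-length feature vectors (generic mixup)."""
    if len(x_a) != len(x_b):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def mix_losses(loss_a: float, loss_b: float, lam: float) -> float:
    """Weight the losses against each utterance's own transcript by the same lam.

    Assumption: label sequences cannot be interpolated directly, so the mixed
    input is scored against both transcripts and the losses are combined.
    """
    return lam * loss_a + (1.0 - lam) * loss_b


if __name__ == "__main__":
    mixed = mix_features([1.0, 0.0], [0.0, 1.0], lam=0.25)
    print(mixed)
    print(mix_losses(2.0, 4.0, lam=0.5))
```

In practice `lam` is usually drawn at random per training pair; it is passed explicitly here so the example stays deterministic.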

1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

1 code implementation 20 Jan 2021 Fei Du, Bo Xu, Jiasheng Tang, Yuqi Zhang, Fan Wang, Hao Li

We extend the classical tracking-by-detection paradigm to this tracking-any-object task.

Ranked #7 on Multi-Object Tracking on TAO (using extra training data)

Multi-Object Tracking Object

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

no code implementations 17 Jan 2021 Cheng Yi, Shiyu Zhou, Bo Xu

In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Research on Fast Text Recognition Method for Financial Ticket Image

no code implementations 5 Jan 2021 Fukang Tian, Haiyu Wu, Bo Xu

At present, a few works have applied deep learning methods to financial ticket recognition.

Region Proposal

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

no code implementations 22 Dec 2020 Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.

speech-recognition Speech Recognition

CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition

no code implementations 17 Dec 2020 Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu

End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and shown the potential to become the mainstream.

speech-recognition Speech Recognition

Research on All-content Text Recognition Method for Financial Ticket Image

no code implementations 15 Dec 2020 Fukang Tian, Haiyu Wu, Bo Xu

With the development of the economy, the number of financial tickets increases rapidly.

Text Detection

Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer

no code implementations COLING 2020 Duzhen Zhang, Xiuyi Chen, Shuang Xu, Bo Xu

For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances contain neutral emotion in conversations, as a result, the confusion between a few non-neutral utterances and much more neutral ones restrains the emotion recognition performance.

Emotion Recognition Graph Attention +3

Audio-visual Speech Separation with Adversarially Disentangled Visual Representation

no code implementations 29 Nov 2020 Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Bo Xu

In our model, we use the face detector to detect the number of speakers in the scene and use visual information to avoid the permutation problem.

Speech Separation

Financial ticket intelligent recognition system based on deep learning

no code implementations 29 Oct 2020 Fukang Tian, Haiyu Wu, Bo Xu

Facing the rapid growth in the issuance of financial tickets (or bills, invoices etc.

Self-Learning

Tuning Convolutional Spiking Neural Network with Biologically-plausible Reward Propagation

1 code implementation 9 Oct 2020 Tielin Zhang, Shuncheng Jia, Xiang Cheng, Bo Xu

The performance of the proposed BRP-SNN is further verified on the spatial (including MNIST and Cifar-10) and temporal (including TIDigits and DvsGesture) tasks, where the SNN using BRP has reached a similar accuracy compared to other state-of-the-art BP-based SNNs and saved 50% more computational cost than ANNs.

Finite Meta-Dynamic Neurons in Spiking Neural Networks for Spatio-temporal Learning

no code implementations 7 Oct 2020 Xiang Cheng, Tielin Zhang, Shuncheng Jia, Bo Xu

Spiking Neural Networks (SNNs) have incorporated more biologically-plausible structures and learning principles, hence are playing critical roles in bridging the gap between artificial and natural neural networks.

Consecutive Decoding for Speech-to-text Translation

1 code implementation 21 Sep 2020 Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

The key idea is to generate source transcript and target translation text with a single decoder.

Machine Translation speech-recognition +3

Shifu2: A Network Representation Learning Based Model for Advisor-advisee Relationship Mining

no code implementations 17 Aug 2020 Jiaying Liu, Feng Xia, Lei Wang, Bo Xu, Xiangjie Kong, Hanghang Tong, Irwin King

The advisor-advisee relationship represents direct knowledge heritage, and such relationships may not be readily available from academic libraries and search engines.

Representation Learning

MODEL: Motif-based Deep Feature Learning for Link Prediction

no code implementations 9 Aug 2020 Lei Wang, Jing Ren, Bo Xu, Jian-Xin Li, Wei Luo, Feng Xia

Link prediction plays an important role in network analysis and applications.

Link Prediction

DINE: A Framework for Deep Incomplete Network Embedding

no code implementations 9 Aug 2020 Ke Hou, Jiaying Liu, Yin Peng, Bo Xu, Ivan Lee, Feng Xia

Empirically, we evaluate DINE over three networks on multi-label classification and link prediction tasks.

General Classification Link Prediction +3

Speaker-Conditional Chain Model for Speech Separation and Extraction

no code implementations 25 Jun 2020 Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

With the predicted speaker information from whole observation, our model is helpful to solve the problem of conventional speech separation and speaker extraction for multi-round long recordings.

Audio and Speech Processing Sound

Deep Learning Guided Building Reconstruction from Satellite Imagery-derived Point Clouds

no code implementations 19 May 2020 Bo Xu, Xu Zhang, Zhixin Li, Matt Leotta, Shih-Fu Chang, Jie Shan

For points that belong to the same roof shape, a multi-cue, hierarchical RANSAC approach is proposed for efficient and reliable segmenting and reconstructing the building point cloud.

3D Reconstruction

Discriminative Multi-modality Speech Recognition

2 code implementations CVPR 2020 Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Vision is often used as a complementary modality for audio speech recognition (ASR), especially in the noisy environment where performance of solo audio modality significantly deteriorates.

Ranked #6 on Audio-Visual Speech Recognition on LRS3-TED (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

1 code implementation 18 Dec 2019 Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image.

Multimodal Reasoning Visual Dialog

Few-Features Attack to Fool Machine Learning Models through Mask-Based GAN

no code implementations 12 Nov 2019 Feng Chen, Yunkai Shang, Bo Xu, Jincheng Hu

Compared with previous non-learning adversarial example attacks, a GAN-based attack can, once trained, quickly generate adversarial samples for each new input, but it needs to perturb the attack samples in large quantities, which makes it impractical in real applications.

Adversarial Attack BIG-bench Machine Learning

Unsupervised pre-training for sequence to sequence speech recognition

no code implementations 28 Oct 2019 Zhiyun Fan, Shiyu Zhou, Bo Xu

The unsupervised pre-training is finished on AISHELL-2 dataset and we apply the pre-trained model to multiple paired data ratios of AISHELL-1 and HKUST.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Iterative Update and Unified Representation for Multi-Agent Reinforcement Learning

no code implementations 16 Aug 2019 Jiancheng Long, Hongming Zhang, Tianyang Yu, Bo Xu

In this method, iterative update can greatly alleviate the nonstationarity of the environment, unified representation can speed up the interaction with environment and avoid the linear growth of memory usage.

Multi-agent Reinforcement Learning reinforcement-learning +1

A Working Memory Model for Task-oriented Dialog Response Generation

no code implementations ACL 2019 Xiuyi Chen, Jiaming Xu, Bo Xu

Our WMM2Seq adopts a working memory to interact with two separated long-term memories, which are the episodic memory for memorizing dialog history and the semantic memory for storing KB tuples.

Response Generation World Knowledge

The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding

no code implementations NAACL 2019 Yiqun Yao, Jiaming Xu, Bo Xu

Visual Dialog is a multi-modal task that requires a model to participate in a multi-turn human dialog grounded on an image, and generate correct, human-like responses.

General Knowledge Visual Dialog

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

2 code implementations 27 May 2019 Linhao Dong, Bo Xu

In this paper, we propose a novel soft and monotonic alignment mechanism used for sequence transduction.

Language Modelling Multi-Task Learning +2

Reference-Based Sequence Classification

no code implementations 17 May 2019 Zengyou He, Guangyao Xu, Chaohua Sheng, Bo Xu, Quan Zou

By utilizing this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions.

Classification General Classification

Material Segmentation of Multi-View Satellite Imagery

no code implementations 17 Apr 2019 Matthew Purri, Jia Xue, Kristin Dana, Matthew Leotta, Dan Lipsa, Zhixin Li, Bo Xu, Jie Shan

The residuals are computed by differencing the sparse-sampled reflectance function with a dictionary of pre-defined dense-sampled reflectance functions.

Material Recognition Segmentation +1

Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping

no code implementations 18 Feb 2019 Linhao Dong, Feng Wang, Bo Xu

Experiments on two Mandarin ASR datasets show that replacing RNNs with self-attention networks yields an 8.4%-10.2% relative character error rate (CER) reduction.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
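
As a worked illustration of what a relative error rate reduction means, here is a minimal Python sketch. The function name `relative_reduction` and the baseline and new CER values are hypothetical, chosen only to show the arithmetic; they are not figures from the paper.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return the relative reduction, in percent, of new_rate versus baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


# Hypothetical CERs (in percent), not taken from the paper.
baseline_cer = 25.0
new_cer = 22.5
print(f"{relative_reduction(baseline_cer, new_cer):.1f}% relative CER reduction")
```

A relative reduction is measured against the baseline rate, so it differs from the absolute difference in percentage points (2.5 points in this made-up case).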

A Biologically Plausible Supervised Learning Method for Spiking Neural Networks Using the Symmetric STDP Rule

1 code implementation 17 Dec 2018 Yunzhe Hao, Xuhui Huang, Meng Dong, Bo Xu

By combining the sym-STDP rule with bio-plausible synaptic scaling and intrinsic plasticity of the dynamic threshold, our SNN model implemented SL well and achieved good performance in the benchmark recognition task (MNIST dataset).

Concept Learning through Deep Reinforcement Learning with Memory-Augmented Neural Networks

no code implementations 15 Nov 2018 Jing Shi, Jiaming Xu, Yiqun Yao, Bo Xu

In this paper, we present a memory-augmented neural network which is motivated by the process of human concept learning.

One-Shot Learning Outlier Detection +2

WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition

no code implementations EMNLP 2018 Yufeng Diao, Hongfei Lin, Di Wu, Liang Yang, Kan Xu, Zhihao Yang, Jian Wang, Shaowu Zhang, Bo Xu, Dongyu Zhang

In this work, we first use WordNet to understand and expand word embeddings to resolve the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA), which is combined with context weights to recognize puns.

Semi-Supervised Disfluency Detection

no code implementations COLING 2018 Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu

While disfluency detection has achieved notable success in the past years, it still suffers severely from data scarcity.

Generative Adversarial Network Machine Translation +1

Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions

no code implementations ACL 2018 Dongyu Zhang, Hongfei Lin, Liang Yang, Shaowu Zhang, Bo Xu

However, there is little research on the construction of metaphor corpora annotated with emotion for the analysis of emotionality of metaphorical expressions.

Emotion Recognition

Single-channel Speech Dereverberation via Generative Adversarial Training

no code implementations 25 Jun 2018 Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu

In this paper, we propose a single-channel speech dereverberation system (DeReGAT) based on convolutional, bidirectional long short-term memory and deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT).

Speech Dereverberation

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

no code implementations 12 Jun 2018 Shiyu Zhou, Shuang Xu, Bo Xu

Experiments on CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and obtains a 10.5% relative average word error rate (WER) reduction compared to SHL-MLSTM with residual learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

3 code implementations 4 Jun 2018 Fenfen Sheng, Zhineng Chen, Bo Xu

Considering that scene images have large variation in text and background, we further design a modality-transform block to effectively transform 2D input images to 1D sequences, combined with the encoder to extract more discriminative features.

Optical Character Recognition (OCR) Scene Text Recognition

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese

no code implementations 16 May 2018 Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Experiments on HKUST datasets demonstrate that the lexicon free modeling units can outperform lexicon related modeling units in terms of character error rate (CER).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

1 code implementation 28 Apr 2018 Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Furthermore, we investigate a comparison between syllable based model and context-independent phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Unsupervised Neural Machine Translation with Weight Sharing

1 code implementation ACL 2018 Zhen Yang, Wei Chen, Feng Wang, Bo Xu

Unsupervised neural machine translation (NMT) is a recently proposed approach for machine translation which aims to train the model without using any labeled data.

Machine Translation NMT +2

Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

1 code implementation IJCNLP 2017 Chunqi Wang, Bo Xu

The first is that they rely heavily on manually designed bigram features, i.e., they are not good at capturing n-gram features automatically.

Chinese Word Segmentation Feature Engineering +1

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

3 code implementations NAACL 2018 Zhen Yang, Wei Chen, Feng Wang, Bo Xu

During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences and feed the evaluations back to guide the learning of the generator.

Machine Translation NMT +2

Combining Lexical and Semantic-based Features for Answer Sentence Selection

no code implementations WS 2016 Jing Shi, Jiaming Xu, Yiqun Yao, Suncong Zheng, Bo Xu

As the result of the evaluation shows, our solution provides a valuable and brief model which could be used in modelling question answering or sentence semantic relevance.

Feature Engineering Open-Domain Question Answering +1

A Character-Aware Encoder for Neural Machine Translation

no code implementations COLING 2016 Zhen Yang, Wei Chen, Feng Wang, Bo Xu

This article proposes a novel character-aware neural machine translation (NMT) model that views the input sequences as sequences of characters rather than words.

Machine Translation NMT +1

Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling

3 code implementations COLING 2016 Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu

To integrate the features on both dimensions of the matrix, this paper explores applying 2D max pooling operation to obtain a fixed-length representation of the text.

Binary Classification General Classification +2
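
To make the idea of 2D max pooling over a feature matrix concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the helper names `max_pool_2d` and `flatten`, the non-overlapping windows, and the toy matrix are all hypothetical, and the input is assumed to be already padded or truncated to a fixed number of time steps so that the pooled output has a fixed length.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_2d(matrix: Matrix, pool_h: int, pool_w: int) -> Matrix:
    """Non-overlapping 2D max pooling over a (time steps x feature dims) matrix.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    rows = len(matrix) // pool_h
    cols = len(matrix[0]) // pool_w
    pooled: Matrix = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect every value inside the current pool_h x pool_w window.
            window = [
                matrix[r * pool_h + i][c * pool_w + j]
                for i in range(pool_h)
                for j in range(pool_w)
            ]
            out_row.append(max(window))
        pooled.append(out_row)
    return pooled


def flatten(matrix: Matrix) -> List[float]:
    """Concatenate the pooled rows into a single vector."""
    return [value for row in matrix for value in row]


# Toy 4x4 matrix standing in for per-time-step feature vectors (hypothetical values).
features = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
print(flatten(max_pool_2d(features, 2, 2)))
```

Pooling along both axes, rather than only over time, is what lets the operation summarize the time-step dimension and the feature dimension together.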

Hierarchical Memory Networks for Answer Selection on Unknown Words

1 code implementation COLING 2016 Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu

Recently, end-to-end memory networks have shown promising results on the Question Answering task; they encode past facts into an explicit memory and perform reasoning by making multiple computational steps on the memory.

Answer Selection Sentence

Convolutional Neural Networks for Text Hashing

no code implementations IJCAI 2015 Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, HongWei Hao

Meanwhile word features and position features are together fed into a convolutional network to learn the implicit features which are further incorporated with the explicit features to fit the pretrained binary code.
