Search Results for author: Jian Wu

Found 150 papers, 61 papers with code

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.

Denoising Self-Supervised Learning +3

18,485

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

8 code implementations • 14 Apr 2019 • Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, Peng Jiang

To address this problem, we train the bidirectional model using the Cloze task, predicting the masked items in the sequence by jointly conditioning on their left and right context.

Ranked #2 on Recommendation Systems on MovieLens 1M (HR@10 (full corpus) metric)

Sequential Recommendation

4,116

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

7 code implementations • Interspeech 2020 • Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality.

Ranked #5 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge (PESQ-NB metric)

Speech Enhancement Audio and Speech Processing Sound

2,123

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation

6 code implementations • 19 Apr 2020 • Huimin Huang, Lanfen Lin, Ruofeng Tong, Hongjie Hu, Qiaowei Zhang, Yutaro Iwamoto, Xianhua Han, Yen-Wei Chen, Jian Wu

UNet, a deep learning network with an encoder-decoder architecture, is widely used in medical image segmentation.

Ranked #1 on Medical Image Segmentation on LiTS2017

Decoder Image Segmentation +3

623

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

3 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

We integrate the proposed methods into the HuBERT framework.

Data Augmentation Multi-Task Learning +5

392

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

2 code implementations • NeurIPS 2016 • Jian Wu, Peter I. Frazier

In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g., when evaluating the performance of several different neural network architectures in a parallel computing environment.

Bayesian Optimization

255

Bayesian Optimization with Gradients

1 code implementation • NeurIPS 2017 • Jian Wu, Matthias Poloczek, Andrew Gordon Wilson, Peter I. Frazier

Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions.

Bayesian Optimization

255

Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction

1 code implementation • 6 Apr 2022 • Zhuangwei Shi, Yang Hu, Guangliang Mo, Jian Wu

Because the stock market is highly volatile, research on predicting stock price changes can help investors avoid risk.

Stock Prediction Time Series +1

234

Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models

2 code implementations • 6 Apr 2024 • Songtao Jiang, Yan Zhang, Chenyi Zhou, Yeying Jin, Yang Feng, Jian Wu, Zuozhu Liu

In this paper, we present a novel approach, Joint Visual and Text Prompting (VTPrompt), that employs fine-grained visual information to enhance the capability of MLLMs in VQA, especially for object-oriented perception.

Object Question Answering +1

213

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

Ranked #1 on Pedestrian Image Caption on CUHK-PEDES

3D Human Pose Estimation Action Recognition +8

209

X2CT-GAN: Reconstructing CT from Biplanar X-Rays with Generative Adversarial Networks

1 code implementation • CVPR 2019 • Xingde Ying, Heng Guo, Kai Ma, Jian Wu, Zheng-Xin Weng, Yefeng Zheng

Computed tomography (CT) can provide a 3D view of the patient's internal organs, facilitating disease diagnosis, but it exposes the patient to a higher radiation dose, and a CT scanner is far more expensive than an X-ray machine.

Computed Tomography (CT) Generative Adversarial Network

139

Continuous speech separation: dataset and analysis

1 code implementation • 30 Jan 2020 • Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

129

CenterNet3D: An Anchor Free Object Detector for Point Cloud

2 code implementations • 13 Jul 2020 • Guojun Wang, Jian Wu, Bin Tian, Siyu Teng, Long Chen, Dongpu Cao

However, because of the inherent sparsity of point clouds, 3D object center points are likely to be in empty space, which makes it difficult to estimate accurate boundaries.

3D Object Detection Autonomous Driving +3

118

Continuous Speech Separation with Conformer

1 code implementation • 13 Aug 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

103

Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

1 code implementation • 12 Aug 2020 • Haohe Liu, Lei Xie, Jian Wu, Geng Yang

We aim to address the major issues in CNN-based high-resolution MSS models: high computational cost and weight sharing between distinctly different bands.

Audio and Speech Processing Sound

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation

1 code implementation • 6 May 2022 • Zui Chen, Yansen Jing, Shengcheng Yuan, Yifei Xu, Jian Wu, Hang Zhao

A synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design.

Audio Classification Audio Signal Processing

Electrocardio Panorama: Synthesizing New ECG Views with Self-supervision

1 code implementation • 12 May 2021 • Jintai Chen, Xiangshang Zheng, Hongyun Yu, Danny Z. Chen, Jian Wu

For the first time, we propose a new concept, Electrocardio Panorama, which allows visualizing ECG signals from any queried viewpoint.

Self-Supervised Learning

Large Window-based Mamba UNet for Medical Image Segmentation: Beyond Convolution and Self-attention

1 code implementation • 12 Mar 2024 • Jinhong Wang, Jintai Chen, Danny Chen, Jian Wu

In this paper, we introduce a Large Window-based Mamba U-shape Network, or LMa-UNet, for 2D and 3D medical image segmentation.

Image Segmentation Long-range modeling +2

ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data

1 code implementation • 7 Jan 2023 • Jintai Chen, Jiahuan Yan, Danny Ziyi Chen, Jian Wu

Though deep neural networks have gained enormous successes in various fields (e.g., computer vision) with supervised learning, they still trail the performance of GBDTs on tabular data.

T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction

1 code implementation • 30 Nov 2022 • Jiahuan Yan, Jintai Chen, Yixuan Wu, Danny Z. Chen, Jian Wu

Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction.

Relation

DANets: Deep Abstract Networks for Tabular Data Classification and Regression

1 code implementation • 6 Dec 2021 • Jintai Chen, Kuanlun Liao, Yao Wan, Danny Z. Chen, Jian Wu

A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.

regression

Multi-View Adaptive Fusion Network for 3D Object Detection

1 code implementation • 2 Nov 2020 • Guojun Wang, Bin Tian, Yachen Zhang, Long Chen, Dongpu Cao, Jian Wu

3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.

3D Object Detection Autonomous Driving +3

Towards Distribution-Agnostic Generalized Category Discovery

1 code implementation • NeurIPS 2023 • Jianhong Bai, Zuozhu Liu, Hualiang Wang, Ruizhe Chen, Lianrui Mu, Xiaomeng Li, Joey Tianyi Zhou, Yang Feng, Jian Wu, Haoji Hu

In this paper, we formally define a more realistic task as distribution-agnostic generalized category discovery (DA-GCD): generating fine-grained predictions for both close- and open-set classes in a long-tailed open-world setting.

Contrastive Learning Transfer Learning

A Hierarchical Recurrent Neural Network for Symbolic Melody Generation

2 code implementations • 14 Dec 2017 • Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu

In this paper, we present a hierarchical recurrent neural network for melody generation, which consists of three Long-Short-Term-Memory (LSTM) subnetworks working in a coarse-to-fine manner along time.

Sound Multimedia

IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

1 code implementation • 4 Nov 2020 • Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, DongYan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

In this challenge, we open source a sizable speech, keyword, echo and noise corpus for promoting data-driven methods, particularly deep-learning approaches on KWS and SSL.

Sound Audio and Speech Processing

TSegFormer: 3D Tooth Segmentation in Intraoral Scans with Geometry Guided Transformer

1 code implementation • 22 Nov 2023 • Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Haochao Ying, Jian Wu, Zuozhu Liu

Optical Intraoral Scanners (IOS) are widely used in digital dentistry to provide detailed 3D information of dental crowns and the gingiva.

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes a token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Sample-efficient Multi-objective Molecular Optimization with GFlowNets

1 code implementation • NeurIPS 2023 • Yiheng Zhu, Jialu Wu, Chaowen Hu, Jiahuan Yan, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

Many crucial scientific problems involve designing novel molecules with desired properties, which can be formulated as a black-box optimization problem over the discrete chemical space.

Bayesian Optimization

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation

1 code implementation • 3 Jan 2022 • Yixuan Wu, Kuanlun Liao, Jintai Chen, Jinhong Wang, Danny Z. Chen, Honghao Gao, Jian Wu

In this paper, we propose a new method called Dilated Transformer, which conducts self-attention for pair-wise patch relations captured alternately in local and global scopes.

Decoder Image Segmentation +3

DialMed: A Dataset for Dialogue-based Medication Recommendation

1 code implementation • COLING 2022 • Zhenfeng He, Yuqiang Han, Zhenqiu Ouyang, Wei Gao, Hongxu Chen, Guandong Xu, Jian Wu

Therefore, we make the first attempt to recommend medications with the conversations between doctors and patients.

Graph Attention

Robust Training of Graph Neural Networks via Noise Governance

1 code implementation • 12 Nov 2022 • Siyi Qian, Haochao Ying, Renjun Hu, Jingbo Zhou, Jintai Chen, Danny Z. Chen, Jian Wu

To address these issues, we propose a novel RTGNN (Robust Training of Graph Neural Networks via Noise Governance) framework that achieves better robustness by learning to explicitly govern label noise.

Memorization

Making Pre-trained Language Models Great on Tabular Prediction

1 code implementation • 4 Mar 2024 • Jiahuan Yan, Bo Zheng, Hongxia Xu, Yiheng Zhu, Danny Z. Chen, Jimeng Sun, Jian Wu, Jintai Chen

Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature names from various tables, potentially serving as versatile learners in transferring knowledge across distinct tables and diverse prediction tasks, but their discrete text representation space is inherently incompatible with numerical feature values in tables.

Personalized Re-ranking for Recommendation

1 code implementation • 15 Apr 2019 • Changhua Pei, Yi Zhang, Yongfeng Zhang, Fei Sun, Xiao Lin, Hanxiao Sun, Jian Wu, Peng Jiang, Wenwu Ou

Ranking is a core task in recommender systems, which aims at providing an ordered list of items to users.

Recommendation Systems Re-Ranking

MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation

1 code implementation • 15 May 2023 • Yiheng Zhu, Zhenqiu Ouyang, Ben Liao, Jialu Wu, Yixuan Wu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models.

Graph Generation Molecular Graph Generation +1

Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification

1 code implementation • 28 Nov 2023 • Jiahuan Yan, Haojun Gao, Zhang Kai, Weize Liu, Danny Chen, Jian Wu, Jintai Chen

Deep learning approaches exhibit promising performances on various text tasks.

imbalanced classification text-classification +1

Extractive Research Slide Generation Using Windowed Labeling Ranking

1 code implementation • NAACL (sdp) 2021 • Athar Sefid, Jian Wu, Prasenjit Mitra, Lee Giles

Presentation slides describing the content of scientific and technical papers are an efficient and effective way to present that work.

Extractive Summarization Sentence

Identifying Electrocardiogram Abnormalities Using a Handcrafted-Rule-Enhanced Neural Network

1 code implementation • 16 Jun 2022 • Yuexin Bian, Jintai Chen, Xiaojun Chen, Xiaoxian Yang, Danny Z. Chen, Jian Wu

Automatic ECG classification methods, especially the deep learning based ones, have been proposed to detect cardiac abnormalities using ECG records, showing good potential to improve clinical diagnosis and help early prevention of cardiovascular diseases.

Clinical Knowledge ECG Classification

Improving LLM-based Machine Translation with Systematic Self-Correction

1 code implementation • 26 Feb 2024 • Zhaopeng Feng, Yan Zhang, Hao Li, Wenqiang Liu, Jun Lang, Yang Feng, Jian Wu, Zuozhu Liu

Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT).

Machine Translation Translation

Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets

1 code implementation • 20 Jun 2019 • Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles

We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset.

Blocking Information Retrieval +2

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

1 code implementation • NeurIPS 2023 • Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu

Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks.

FedLoGe: Joint Local and Generic Federated Learning under Long-tailed Data

1 code implementation • 17 Jan 2024 • Zikai Xiao, Zihan Chen, Liyinglan Liu, Yang Feng, Jian Wu, Wanlu Liu, Joey Tianyi Zhou, Howard Hao Yang, Zuozhu Liu

Federated Long-Tailed Learning (Fed-LT), a paradigm wherein data collected from decentralized local clients manifests a globally prevalent long-tailed distribution, has garnered considerable attention in recent times.

Personalized Federated Learning Representation Learning

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

1 code implementation • 4 Feb 2024 • Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data.

Inductive Bias

Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations

2 code implementations • 1 Jul 2021 • Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox

Our experiments show that CRF with visual features outperformed both a heuristic and a CRF model with only text-based features.

Ranked #1 on Key Information Extraction on ETD500

Key Information Extraction Optical Character Recognition (OCR)

MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries

1 code implementation • 30 Mar 2023 • Muntabir Hasan Choudhury, Lamia Salsabil, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox

Metadata quality is crucial for digital objects to be discovered through digital library interfaces.

Metadata quality

ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations

1 code implementation • 7 Nov 2023 • Muntabir Hasan Choudhury, Lamia Salsabil, William A. Ingram, Edward A. Fox, Jian Wu

To overcome the challenge of imbalanced labeled samples, we augmented data for minority categories and employed a hierarchical classifier.

Navigate

A Novel Correlation-optimized Deep Learning Method for Wind Speed Forecast

1 code implementation • 3 Jun 2023 • Yang Yang, Jin Lang, Jian Wu, Yanyan Zhang, Xiang Zhao

Finally, the effectiveness of the proposed method is verified by three wind prediction cases from a wind farm in Liaoning, China.

Improving Automatic Source Code Summarization via Deep Reinforcement Learning

2 code implementations • 17 Nov 2018 • Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. Yu

To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework that encodes the code into a hidden space and then decodes it into natural language space, suffering from two major drawbacks: (a) their encoders only consider the sequential content of code, ignoring the tree structure, which is also critical for the task of code summarization, and (b) their decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth word.

Code Summarization Decoder +4

Ord2Seq: Regarding Ordinal Regression as Label Sequence Prediction

1 code implementation • ICCV 2023 • Jinhong Wang, Yi Cheng, Jintai Chen, Tingting Chen, Danny Chen, Jian Wu

In this way, we decompose an ordinal regression task into a series of recursive binary classification steps, so as to subtly distinguish adjacent categories.

Binary Classification regression

TabCaps: A Capsule Neural Network for Tabular Data Classification with BoW Routing

1 code implementation • ICLR 2023 • Jintai Chen, Kuanlun Liao, Yanwen Fang, Danny Chen, Jian Wu

In this paper, we propose to encapsulate all feature values of a record into vectorial features and process them collectively rather than have to deal with individual ones, which directly captures the representations at the data level and benefits robust performances.

Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

1 code implementation • 7 Sep 2023 • Sai Koneru, Jian Wu, Sarah Rajtmajer

Hypothesis formulation and testing are central to empirical research.

Acknowledgement Entity Recognition in CORD-19 Papers

1 code implementation • EMNLP (sdp) 2020 • Jian Wu, Pei Wang, Xin Wei, Sarah Rajtmajer, C. Lee Giles, Christopher Griffin

We built a supplementary database by linking CORD-19 papers with acknowledgement entities extracted by AckExtract including persons and organizations and find that only up to 50–60% of named entities are actually acknowledged.

Sentence

Online Deep Learning from Doubly-Streaming Data

1 code implementation • 25 Apr 2022 • Heng Lian, John Scovil Atwood, BoJian Hou, Jian Wu, Yi He

This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away.

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

1 code implementation • 7 Nov 2023 • Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, Diane Oyen

CV tasks, such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meaningful captions on sketched images often included in scientific and technical documents.

3D Reconstruction Benchmarking +4

AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks

1 code implementation • 15 Dec 2021 • Ruiwei Feng, Yufeng Xie, Minshan Lai, Danny Z. Chen, Ji Cao, Jian Wu

Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine.

Drug Response Prediction

Robust Image Ordinal Regression with Controllable Image Generation

1 code implementation • 7 May 2023 • Yi Cheng, Haochao Ying, Renjun Hu, Jinhong Wang, Wenhao Zheng, Xiao Zhang, Danny Chen, Jian Wu

Image ordinal regression has been mainly studied along the line of exploiting the order of categories.

Image Generation regression

Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models

1 code implementation • 15 Nov 2023 • Weize Liu, Guocong Li, Kai Zhang, Bang Du, Qiyuan Chen, Xuming Hu, Hongxia Xu, Jintai Chen, Jian Wu

While techniques such as chain-of-thought (CoT) distillation have displayed promise in distilling LLMs into small language models (SLMs), there is a risk that distilled SLMs may still inherit flawed reasoning and hallucinations from LLMs.

Transfer Learning

Personalized Heart Disease Detection via ECG Digital Twin Generation

1 code implementation • 17 Apr 2024 • Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention.

Management

Can citations tell us about a paper's reproducibility? A case study of machine learning papers

1 code implementation • 7 May 2024 • Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu

The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature.

Sentiment Analysis

Learned Neural Iterative Decoding for Lossy Image Compression Systems

no code implementations • 15 Mar 2018 • Alexander G. Ororbia, Ankur Mali, Jian Wu, Scott O'Connell, David Miller, C. Lee Giles

For lossy image compression systems, we develop an algorithm, iterative refinement, to improve the decoder's reconstruction compared to standard decoding techniques.

Decoder Image Compression

Discretization-free Knowledge Gradient Methods for Bayesian Optimization

no code implementations • 20 Jul 2017 • Jian Wu, Peter I. Frazier

This paper studies Bayesian ranking and selection (R&S) problems with correlated prior beliefs and continuous domains, i.e., Bayesian optimization (BO).

Bayesian Optimization

Multi-modal Fusion for Diabetes Mellitus and Impaired Glucose Regulation Detection

no code implementations • 12 Apr 2016 • Jinxing Li, David Zhang, Yongcheng Li, Jian Wu

has proved that tongue, face and sublingual diagnosis as a noninvasive method is a reasonable way for disease detection.

Multi-modal Classification

Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training

no code implementations • 12 Nov 2018 • Yao Wan, Wenqiang Yan, Jianwei Gao, Zhou Zhao, Jian Wu, Philip S. Yu

Dialogue Act (DA) classification is a challenging problem in dialogue interpretation, which aims to attach semantic labels to utterances and characterize the speaker's intention.

Ranked #5 on Dialogue Act Classification on Switchboard corpus

Classification Dialogue Act Classification +3

Continuous-fidelity Bayesian Optimization with Knowledge Gradient

no code implementations • ICLR 2018 • Jian Wu, Peter I. Frazier

While Bayesian optimization (BO) has achieved great success in optimizing expensive-to-evaluate black-box functions, especially tuning hyperparameters of neural networks, methods such as random search (Li et al., 2016) and multi-fidelity BO (e.g., Klein et al. (2017)) that exploit cheap approximations, e.g., training on a smaller training set or with fewer iterations, can outperform standard BO approaches that use only full-fidelity observations.

Bayesian Optimization

Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation

no code implementations • IJCNLP 2015 • Minghua Nuo, Huidan Liu, Congjun Long, Jian Wu

Language Modelling

Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages

no code implementations • WS 2012 • Huidan Liu, Minghua Nuo, Jian Wu, Yeping He

Zipf's Law and Statistical Data on Modern Tibetan

no code implementations • COLING 2014 • Huidan Liu, Minghua Nuo, Jian Wu

Tibetan Base Noun Phrase Identification Framework Based on Chinese-Tibetan Sentence Aligned Corpus

no code implementations • COLING 2012 • Ming Hua Nuo, Hui Dan Liu, Wei Na Zhao, Long Long Ma, Jian Wu, Zhi Ming Ding

Sentence Word Alignment

Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

no code implementations • 12 Mar 2019 • Jian Wu, Saul Toscano-Palmerin, Peter I. Frazier, Andrew Gordon Wilson

Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck.

Bayesian Optimization

End-to-End Multi-Channel Speech Separation

no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation.

Speech Separation

A comprehensive study of speech separation: spectrogram vs waveform separation

no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency and time-domain separators utilizing spectral, spatial, and speaker location information.

speech-recognition Speech Recognition +1

Dunhuang Grottoes Painting Dataset and Benchmark

no code implementations • 10 Jul 2019 • Tianxiu Yu, Shijie Zhang, Cong Lin, ShaoDi You, Jian Wu, Jiawan Zhang, Xiaohong Ding, Huili An

Following this trend, we release the first public dataset for Dunhuang Grotto Painting restoration.

Privileged Features Distillation at Taobao Recommendations

no code implementations • 11 Jul 2019 • Chen Xu, Quan Li, Junfeng Ge, Jinyang Gao, Xiaoyong Yang, Changhua Pei, Fei Sun, Jian Wu, Hanxiao Sun, Wenwu Ou

To guarantee the consistency of off-line training and on-line serving, we usually utilize the same features that are both available.

Practical Two-Step Lookahead Bayesian Optimization

no code implementations • NeurIPS 2019 • Jian Wu, Peter Frazier

Expected improvement and other acquisition functions widely used in Bayesian optimization use a "one-step" assumption: they value objective function evaluations assuming no future evaluations will be performed.

Bayesian Optimization Vocal Bursts Valence Prediction

Method and Dataset Mining in Scientific Papers

no code implementations • 29 Nov 2019 • Rujing Yao, Linlin Hou, Yingchun Ye, Ou wu, Ji Zhang, Jian Wu

In the field of machine learning, the involved methods (M) and datasets (D) are key information in papers.

Query Auto Completion for Math Formula Search

no code implementations • 9 Dec 2019 • Shaurya Rohatgi, Wei Zhong, Richard Zanibbi, Jian Wu, C. Lee Giles

Query Auto Completion (QAC) is among the most appealing features of a web search engine.

Math

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Ranked #4 on Audio-Visual Speech Recognition on LRS2

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +4

Time Domain Audio Visual Speech Separation

no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing Sound

Paper
Add Code

Speaker diarization with session-level speaker embedding refinement using graph neural networks

no code implementations • 22 May 2020 • Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session.

Clustering speaker-diarization +1

Paper
Add Code

Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks

no code implementations • 27 Jul 2020 • Bharath Kandimalla, Shaurya Rohatgi, Jian Wu, C. Lee Giles

The results showed the importance of retraining word embedding models to maximize the vocabulary overlap and the effectiveness of the attention mechanism.

General Classification Sentence

Paper
Add Code

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations • 7 Sep 2020 • Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.

speech-recognition Speech Recognition +1

Paper
Add Code

Echoes in Unidirectionally Rotating Molecules

no code implementations • 15 Aug 2020 • Long Xu, Ilia Tutunnikov, Lianrong Zhou, Kang Lin, Junjie Qiang, Peifen Lu, Yehiam Prior, Ilya Sh. Averbukh, Jian Wu

We report the experimental observation of molecular unidirectional rotation (UDR) echoes, and analyze their origin and behavior both classically and quantum mechanically.

Optics

Paper
Add Code

Dynamic radiomics: a new methodology to extract quantitative time-related features from tomographic images

no code implementations • 1 Nov 2020 • Fengying Che, Ruichuan Shi, Jian Wu, Haoran Li, Shuqin Li, Weixing Chen, Hao Zhang, Zhi Li, Xiaoyu Cui

The feature extraction methods of radiomics are mainly based on static tomographic images at a certain moment, while the occurrence and development of disease is a dynamic process that cannot be fully reflected by only static characteristics.

Paper
Add Code

Preference Robust Optimization with Quasi-Concave Choice Functions for Multi-Attribute Prospects

no code implementations • 31 Aug 2020 • Jian Wu, William B. Haskell, Wenjie Huang, Huifu Xu

Preference robust choice models concern decision-making problems where the decision maker's (DM) utility/risk preferences are ambiguous and the evaluation is based on the worst-case utility function/risk measure from a set of plausible utility functions/risk measures.

Attribute Decision Making +1

Paper
Add Code

Modeling Updates of Scholarly Webpages Using Archived Data

no code implementations • 7 Dec 2020 • Yasith Jayawardana, Alexander C. Nwala, Gavindya Jayawardena, Jian Wu, Sampath Jayarathna, Michael L. Nelson, C. Lee Giles

The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources.

Paper
Add Code

Speaker attribution with voice profiles by graph-based semi-supervised learning

no code implementations • 6 Feb 2021 • Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Speaker attribution is required in many real-world applications, such as meeting transcription, where speaker identity is assigned to each utterance according to speaker voice profiles.

Speaker Identification

Paper
Add Code

Doctor Imitator: Hand-Radiography-based Bone Age Assessment by Imitating Scoring Methods

no code implementations • 10 Feb 2021 • Jintai Chen, Bohan Yu, Biwen Lei, Ruiwei Feng, Danny Z. Chen, Jian Wu

The architecture of DI is designed to learn the diagnostic logistics of doctors using the scoring methods (e.g., the Tanner-Whitehouse method) for bone age assessment.

Anatomy

Paper
Add Code

Flow-Mixup: Classifying Multi-labeled Medical Images with Corrupted Labels

no code implementations • 9 Feb 2021 • Jintai Chen, Hongyun Yu, Ruiwei Feng, Danny Z. Chen, Jian Wu

In clinical practice, medical image interpretation often involves multi-labeled classification, since the affected parts of a patient tend to present multiple symptoms or comorbidities.

Image Classification Medical Image Classification

Paper
Add Code

A Comprehensive Review of Computer-aided Whole-slide Image Analysis: from Datasets to Feature Extraction, Segmentation, Classification, and Detection Approaches

no code implementations • 21 Feb 2021 • Chen Li, Xintong Li, Md Rahaman, Xiaoyan Li, Hongzan Sun, Hong Zhang, Yong Zhang, Xiaoqi Li, Jian Wu, YuDong Yao, Marcin Grzegorzek

This paper reviews the methods of WSI analysis based on machine learning.

BIG-bench Machine Learning Classification +2

Paper
Add Code

Predicting the Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models

no code implementations • 8 Apr 2021 • Jian Wu, Rajal Nivargi, Sree Sai Teja Lanka, Arjun Manoj Menon, Sai Ajay Modukuri, Nishanth Nakshatri, Xin Wei, Zhuoer Wang, James Caverlee, Sarah M. Rajtmajer, C. Lee Giles

In this paper, we investigate prediction of the reproducibility of SBS papers using machine learning methods based on a set of features.

BIG-bench Machine Learning

Paper
Add Code

Reconstruction Condition of Quantized Signals in Unlimited Sampling Framework

no code implementations • 29 Nov 2020 • Yan He, Jifang Qiu, Chang Liu, Yue Liu, Jian Wu

The latest theoretical advances in the field of unlimited sampling framework (USF) show the potential to avoid clipping problems of analog-to-digital converters (ADC).

Quantization

Paper
Add Code

Document Domain Randomization for Deep Learning Document Layout Extraction

no code implementations • 20 May 2021 • Meng Ling, Jian Chen, Torsten Möller, Petra Isenberg, Tobias Isenberg, Michael Sedlmair, Robert S. Laramee, Han-Wei Shen, Jian Wu, C. Lee Giles

We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation.

Document Layout Analysis

Paper
Add Code

ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations

1 code implementation • 23 Jun 2021 • Sampanna Yashwant Kahu, William A. Ingram, Edward A. Fox, Jian Wu

To the best of our knowledge, ScanBank is the first manually annotated dataset for figure and table extraction for scanned ETDs.

Data Augmentation Table Extraction

Paper
Code

Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models

no code implementations • 30 Jun 2021 • Amber Afshan, Kshitiz Kumar, Jian Wu

We propose a cost-effective method of using CC scores to select an optimal adaptation data set, where we maximize ASR gains from minimal data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations • 5 Jul 2021 • Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

no code implementations • 6 Jul 2021 • Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small real data, the joint system performs 8.9–29.9% better in accuracy compared to the best modular system while the modular system performs better before such fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Add Code

A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations

no code implementations • 4 Sep 2021 • Mingzhi Yu, Diane Litman, Shuang Ma, Jian Wu

Then we use the model to perform similarity measure in a corpus-based entrainment analysis.

Paper
Add Code

Continuous Speech Separation with Recurrent Selective Attention Network

no code implementations • 28 Oct 2021 • Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

speech-recognition Speech Recognition +1

Paper
Add Code

SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning

no code implementations • WOSP 2020 • Chenrui Guo, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu

The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers from the ACL Anthology.

Paper
Add Code

PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving

no code implementations • 23 Dec 2021 • Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, Yunlong Wang, Diange Yang

The accelerating development of autonomous driving technology has placed greater demands on obtaining large amounts of high-quality data.

3D Object Detection Autonomous Driving +5

Paper
Add Code

A Synthetic Prediction Market for Estimating Confidence in Published Work

no code implementations • 23 Dec 2021 • Sarah Rajtmajer, Christopher Griffin, Jian Wu, Robert Fraleigh, Laxmaan Balaji, Anna Squicciarini, Anthony Kwasnica, David Pennock, Michael McLaughlin, Timothy Fritton, Nishanth Nakshatri, Arjun Menon, Sai Ajay Modukuri, Rajal Nivargi, Xin Wei, C. Lee Giles

Explainably estimating confidence in published scholarly work offers opportunity for faster and more robust scientific progress.

Paper
Add Code

What Can Machine Vision Do for Lymphatic Histopathology Image Analysis: A Comprehensive Review

no code implementations • 21 Jan 2022 • Xiaoqi Li, HaoYuan Chen, Chen Li, Md Mamunur Rahaman, Xintong Li, Jian Wu, Xiaoyan Li, Hongzan Sun, Marcin Grzegorzek

In the past ten years, the computing power of machine vision (MV) has been continuously improved, and image analysis algorithms have developed rapidly.

Paper
Add Code

Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study

no code implementations • 7 Feb 2022 • Daniel Tompkins, Kshitiz Kumar, Jian Wu

An Xception model reaches state-of-the-art (SOTA) accuracy on the ESC-50 dataset for audio event detection through knowledge transfer from ImageNet weights, pretraining on AudioSet, and an on-the-fly data augmentation pipeline.

Data Augmentation Event Detection +1

Paper
Add Code

A State-of-the-art Survey of U-Net in Microscopic Image Analysis: from Simple Usage to Structure Mortification

no code implementations • 14 Feb 2022 • Jian Wu, Wanli Liu, Chen Li, Tao Jiang, Islam Mohammad Shariful, Hongzan Sun, Xiaoqi Li, Xintong Li, Xinyu Huang, Marcin Grzegorzek

Image analysis technology is used to solve the inadvertences of artificial traditional methods in disease, wastewater treatment, environmental change monitoring analysis and convolutional neural networks (CNN) play an important role in microscopic image analysis.

Image Segmentation Segmentation +1

Paper
Add Code

Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function

no code implementations • 28 Feb 2022 • Haohao Hu, Hexing Yang, Jian Wu, Xiao Lei, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

Since a 3D surface can be usually observed from multiple camera images with different view poses, an optimal image patch selection for the texturing and an optimal semantic class estimation for the semantic mapping are still challenging.

3D Reconstruction

Paper
Add Code

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Computational Efficiency Speech Separation

Paper
Add Code

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Paper
Add Code

SciEv: Finding Scientific Evidence Papers for Scientific News

no code implementations • 30 Apr 2022 • Md Reshad Ul Hoque, Jiang Li, Jian Wu

To our best knowledge, this is the first dataset of this kind.

Paper
Add Code

Deploying self-supervised learning in the wild for hybrid automatic speech recognition

no code implementations • 17 May 2022 • Mostafa Karimi, Changliang Liu, Kenichi Kumatani, Yao Qian, Tianyu Wu, Jian Wu

Self-supervised learning (SSL) methods have proven to be very successful in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

RL-GA: A Reinforcement Learning-Based Genetic Algorithm for Electromagnetic Detection Satellite Scheduling Problem

no code implementations • 12 Jun 2022 • Yanjie Song, Luona Wei, Qing Yang, Jian Wu, Lining Xing, Yingwu Chen

In this way, the search information can be effectively used by the reinforcement learning method.

Q-Learning reinforcement-learning +2

Paper
Add Code

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

no code implementations • 21 Jul 2022 • Jintai Chen, Kuanlun Liao, Kun Wei, Haochao Ying, Danny Z. Chen, Jian Wu

Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases.

Generative Adversarial Network

Paper
Add Code

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

no code implementations • 27 Aug 2022 • Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu

This paper describes a speaker diarization model based on target speaker voice activity detection (TS-VAD) using transformers.

Action Detection Activity Detection +3

Paper
Add Code

Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages

no code implementations • 8 Sep 2022 • Li Miao, Jian Wu, Piyush Behre, Shuangyu Chang, Sarangarajan Parthasarathy

It is challenging to train and deploy Transformer LMs for hybrid speech recognition 2nd pass re-ranking in low-resource languages due to (1) data scarcity in low-resource languages, (2) expensive computing costs for training and refreshing 100+ monolingual models, and (3) hosting inefficiency considering sparse traffic.

Language Modelling Re-Ranking +2

Paper
Add Code

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

no code implementations • 12 Sep 2022 • Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Simulating realistic speech overlaps improves multi-talker ASR

no code implementations • 27 Oct 2022 • Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Speech separation with large-scale self-supervised learning

no code implementations • 9 Nov 2022 • Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K hours trained WavLM, our proposed model obtains 15.9% and 11.2% of relative word error rate (WER) reductions, respectively, for a simulated far-field speech mixture test set.

Self-Supervised Learning Speech Separation

Paper
Add Code

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

no code implementations • 10 Nov 2022 • Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion.

Representation Learning Self-Supervised Learning +2

Paper
Add Code

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

no code implementations • 11 Nov 2022 • Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction

1 code implementation • 12 Dec 2022 • Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu, Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, Jian Wu

Clinically, to assess the necessity of cataract surgery, accurately predicting postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images is crucially needed.

regression

Paper
Code

ACL-Fig: A Dataset for Scientific Figure Classification

no code implementations • 28 Jan 2023 • Zeba Karishma, Shaurya Rohatgi, Kavya Shrinivas Puranik, Jian Wu, C. Lee Giles

However, there are no large-scale retrieval services for scientific figures and tables.

Classification Question Answering +1

Paper
Add Code

Improving Transformer-based Networks With Locality For Automatic Speaker Verification

no code implementations • 17 Feb 2023 • Mufan Sang, Yong Zhao, Gang Liu, John H. L. Hansen, Jian Wu

The proposed models achieve 0.75% EER on VoxCeleb 1 test set, outperforming the previously proposed Transformer-based models and CNN-based models, such as ResNet34 and ECAPA-TDNN.

Speaker Verification

Paper
Add Code

Speaker Change Detection for Transformer Transducer ASR

no code implementations • 16 Feb 2023 • Jian Wu, Zhuo Chen, Min Hu, Xiong Xiao, Jinyu Li

Speaker change detection (SCD) is an important feature that improves the readability of the recognized words from an automatic speech recognition (ASR) system by breaking the word sequence into paragraphs at speaker change points.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A Study on Reproducibility and Replicability of Table Structure Recognition Methods

no code implementations • 20 Apr 2023 • Kehinde Ajayi, Muntabhir Hasan Choudhury, Sarah Rajtmajer, Jian Wu

We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers.

Paper
Add Code

TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering

no code implementations • 24 May 2023 • Jian Wu, Yicheng Xu, Yan Gao, Jian-Guang Lou, Börje F. Karlsson, Manabu Okumura

A common challenge in HQA and other passage-table QA datasets is that it is generally unrealistic to iterate over all table rows, columns, and linked passages to retrieve evidence.

Question Answering Retrieval

Paper
Add Code

A Novel Black Box Process Quality Optimization Approach based on Hit Rate

no code implementations • 31 May 2023 • Yang Yang, Jian Wu, Xiangman Song, Derun Wu, Lijie Su, Lixin Tang

However, optimizing hit rate is a non-convex and challenging problem.

Paper
Add Code

On decoder-only architecture for speech-to-text and large language model integration

no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Decoder Language Modelling +2

Paper
Add Code

Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

no code implementations • 11 Aug 2023 • Mohammad Soleymanpour, Mahmoud Al Ismail, Fahimeh Bahmaninezhad, Kshitiz Kumar, Jian Wu

Our key developments constitute: (a) pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, (d) parallel encoder with an auxiliary loss for monolingual projections.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels

no code implementations • 16 Sep 2023 • Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu

Since annotating medical images for segmentation tasks commonly incurs expensive costs, it is highly desirable to design an annotation-efficient method to alleviate the annotation burden.

Attribute Contrastive Learning +4

Paper
Add Code

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

no code implementations • 15 Sep 2023 • Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

OneSeg: Self-learning and One-shot Learning based Single-slice Annotation for 3D Medical Image Segmentation

no code implementations • 24 Sep 2023 • Yixuan Wu, Bo Zheng, Jintai Chen, Danny Z. Chen, Jian Wu

As deep learning methods continue to improve medical image segmentation performance, data annotation is still a big bottleneck due to the labor-intensive and time-consuming burden on medical experts, especially for 3D images.

Image Segmentation Medical Image Segmentation +5

Paper
Add Code

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

no code implementations • 8 Oct 2023 • Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, Hui Miao, Li Guo, Shujun Zhang, Cheng Pei, Xiaojuan Fan, Jianqin Lei, Ting Wei, Junguo Duan, Chun Liu, Xiaobo Xia, Siqi Xiong, Junhong Li, Benny Lo, Yih Chung Tham, Tien Yin Wong, Ningli Wang, Wu Yuan

To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data.

Disease Prediction Representation Learning

Paper
Add Code

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

no code implementations • 3 Nov 2023 • Jing Pan, Jian Wu, Yashesh Gaur, Sunit Sivasankaran, Zhuo Chen, Shujie Liu, Jinyu Li

With fewer than 20M trainable parameters and as little as 450 hours of English speech data for SQA generation, COSMIC exhibits emergent instruction-following and in-context learning capabilities in speech-to-text tasks.

Domain Adaptation In-Context Learning +4

Paper
Add Code

All Data on the Table: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction

no code implementations • 14 Nov 2023 • Yuhan Li, Jian Wu, Zhiwei Yu, Börje F. Karlsson, Wei Shen, Manabu Okumura, Chin-Yew Lin

To close this gap in data availability and enable cross-modality IE, while alleviating labeling costs, we propose a semi-supervised pipeline for annotating entities in text, as well as entities and relations in tables, in an iterative procedure.

Paper
Add Code

Jointly Explicit and Implicit Cross-Modal Interaction Network for Anterior Chamber Inflammation Diagnosis

no code implementations • 11 Dec 2023 • Qian Shao, Ye Dai, Haochao Ying, Kan Xu, Jinhong Wang, Wei Chi, Jian Wu

To this end, we propose a jointly Explicit and implicit Cross-Modal Interaction Network (EiCI-Net) for Anterior Chamber Inflammation Diagnosis that uses anterior segment optical coherence tomography (AS-OCT) images, slit-lamp images, and clinical data jointly.

Clinical Knowledge Informativeness

Paper
Add Code

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Floorplanning of VLSI by Mixed-Variable Optimization

no code implementations • 27 Jan 2024 • Jian Sun, Huabin Cheng, Jian Wu, Zhanyang Zhu, Yu Chen

FA-GSS uses the Golden Section strategy to optimize both wirelength and area targets.

Paper
Add Code

AI-Enhanced Virtual Reality in Medicine: A Comprehensive Survey

no code implementations • 5 Feb 2024 • Yixuan Wu, Kaiyuan Hu, Danny Z. Chen, Jian Wu

With the rapid advance of computer graphics and artificial intelligence technologies, the ways we interact with the world have undergone a transformative shift.

Medical Diagnosis

Paper
Add Code

MRKE: The Multi-hop Reasoning Evaluation of LLMs by Knowledge Edition

no code implementations • 19 Feb 2024 • Jian Wu, Linyi Yang, Manabu Okumura, Yue Zhang

Although Large Language Models (LLMs) have shown strong performance in Multi-hop Question Answering (MHQA) tasks, their real reasoning ability remains exploration.

Multi-hop Question Answering Question Answering

Paper
Add Code

GenDec: A robust generative Question-decomposition method for Multi-hop reasoning

no code implementations • 17 Feb 2024 • Jian Wu, Linyi Yang, Yuliang Ji, Wenhao Huang, Börje F. Karlsson, Manabu Okumura

Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions and find multiple relevant supporting facts.

Multi-hop Question Answering Question Answering +1

Paper
Add Code

Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images

no code implementations • 21 Feb 2024 • Xikai Yang, Jian Wu, Xi Wang, Yuchen Yuan, Ning Li Wang, Pheng-Ann Heng

Extensive experiments on the Sequential fundus Images for Glaucoma Forecast (SIGF) dataset demonstrate the superiority of the proposed MST-former method, achieving an AUC of 98.6% for glaucoma forecasting.

Disease Prediction Irregular Time Series +1

Paper
Add Code

Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models

no code implementations • 26 Feb 2024 • Weize Liu, Yinlong Xu, Hongxia Xu, Jintai Chen, Xuming Hu, Jian Wu

Recently, large language models (LLMs) have achieved tremendous breakthroughs in the field of language processing, yet their mechanisms in processing multiple languages remain agnostic.

Paper
Add Code

SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction

no code implementations • 3 Mar 2024 • Jiahuan Yan, Jintai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu

Recent development of large language models (LLMs) has exhibited impressive zero-shot proficiency on generic and common sense questions.

Common Sense Reasoning

Paper
Add Code

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

no code implementations • 7 Mar 2024 • Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu

Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the fast growth of medical applications.

Clinical Knowledge

Paper
Add Code

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

no code implementations • 19 Mar 2024 • Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3

Paper
Add Code

PoCo: A Self-Supervised Approach via Polar Transformation Based Progressive Contrastive Learning for Ophthalmic Disease Diagnosis

no code implementations • 28 Mar 2024 • Jinhong Wang, Tingting Chen, Jintai Chen, Yixuan Wu, Yuyang Xu, Danny Chen, Haochao Ying, Jian Wu

In this paper, we present a self-supervised method via polar transformation based progressive contrastive learning, called PoCo, for ophthalmic disease diagnosis.

Contrastive Learning

Paper
Add Code

TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

no code implementations • 1 Apr 2024 • Yue Wang, Yingzhou Lu, Yinlong Xu, Zihan Ma, Hongxia Xu, Bang Du, Honghao Gao, Jian Wu

Existing research often focuses on leveraging electronic health records (EHRs) to support clinical trial outcome prediction.

Clinical Knowledge Language Modelling +1

Paper
Add Code

Multi-rater Prompting for Ambiguous Medical Image Segmentation

no code implementations • 11 Apr 2024 • Jinhong Wang, Yi Cheng, Jintai Chen, Hongxia Xu, Danny Chen, Jian Wu

In this paper, we tackle two challenges arisen in multi-rater annotations for medical image segmentation (called ambiguous medical image segmentation): (1) How to train a deep learning model when a group of raters produces a set of diverse but plausible annotations, and (2) how to fine-tune the model efficiently when computation resources are not available for re-training the entire model on a different dataset domain.

Image Segmentation Medical Image Segmentation +2

Paper
Add Code

MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

no code implementations • 18 Apr 2024 • Xiaotang Gai, Chenyi Zhou, Jiaxiang Liu, Yang Feng, Jian Wu, Zuozhu Liu

Moreover, we design a novel framework which finetunes lightweight pretrained generative models by incorporating medical decision-making rationales into the training process.

Decision Making Medical Visual Question Answering +2

Paper
Add Code

Group-On: Boosting One-Shot Segmentation with Supportive Query

no code implementations • 18 Apr 2024 • Hanjing Zhou, Mingze Yin, Jintai Chen, Danny Chen, Jian Wu

One-shot semantic segmentation aims to segment query images given only ONE annotated support image of the same class.

One-Shot Segmentation Segmentation

Paper
Add Code

1st Place Solution to the 1st SkatingVerse Challenge

no code implementations • 22 Apr 2024 • Tao Sun, Yuanzi Fu, Kaicheng Yang, Jian Wu, Ziyong Feng

This paper presents the winning solution for the 1st SkatingVerse Challenge.

Paper
Add Code
