Search Results for author: Fan Yu

Found 24 papers, 11 papers with code

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations 31 Mar 2022 Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose the second approach, WD-SOT, which addresses alignment errors by introducing a word-level diarization model and thus removes the dependency on timestamp alignment.

Automatic Speech Recognition Frame

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

1 code implementation ACL 2022 Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen

To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).

Open-Domain Question Answering Passage Retrieval

Solving Partial Differential Equations with Point Source Based on Physics-Informed Neural Networks

no code implementations 2 Nov 2021 Xiang Huang, Hongsheng Liu, Beiji Shi, Zidong Wang, Kang Yang, Yang Li, Bingya Weng, Min Wang, Haotian Chu, Jing Zhou, Fan Yu, Bei Hua, Lei Chen, Bin Dong

In recent years, deep learning has been used to solve partial differential equations (PDEs), among which physics-informed neural networks (PINNs) have emerged as a promising method for solving both forward and inverse PDE problems.

You Ought to Look Around: Precise, Large Span Action Detection

no code implementations 25th International Conference on Pattern Recognition (ICPR) 2021 Ge Pan, Han Zhang, Fan Yu, Yonghong Song, Yuanlin Zhang, Han Yuan

In this paper, we propose a method called YOLA (You Ought to Look Around) which includes three parts: 1) a robust backbone SPN-I3D for extracting spatio-temporal features.

Action Detection Action Localization

Towards More Effective and Economic Sparsely-Activated Model

no code implementations 14 Oct 2021 Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU).
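The hierarchical idea in the snippet, route first to a device, then mix the experts hosted there, can be sketched in a few lines. This is a toy illustration of the "Switch and Mixture" pattern; the gate logits, expert functions, and the `sam_route` name below are all illustrative stand-ins, not the paper's actual parameterization.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sam_route(x, device_logits, experts_per_device, mixture_logits):
    # "Switch" step: top-1 routing picks a single device.
    device = max(range(len(device_logits)), key=lambda d: device_logits[d])
    # "Mixture" step: softmax-weighted combination of that device's experts.
    weights = softmax(mixture_logits[device])
    experts = experts_per_device[device]
    return sum(w * f(x) for w, f in zip(weights, experts)), device

experts_per_device = [
    [lambda x: 2.0 * x, lambda x: x + 1.0],   # experts on device 0
    [lambda x: -x,      lambda x: 0.5 * x],   # experts on device 1
]
y, device = sam_route(3.0, [0.2, 1.3], experts_per_device,
                      [[0.0, 0.0], [1.0, 0.0]])
```

Because the switch is top-1, cross-device communication stays as cheap as in a plain switch router, while several experts on the chosen device contribute to the output.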

Answer Complex Questions: Path Ranker Is All You Need

1 code implementation Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 Xinyu Zhang, Ke Zhan, Enrui Hu, Chengzhen Fu, Lan Luo, Hao Jiang, Yantao Jia, Fan Yu, Zhicheng Dou, Zhao Cao, Lei Chen

Currently, the most popular method for open-domain Question Answering (QA) adopts the "Retriever and Reader" pipeline: the retriever extracts a list of candidate documents from a large collection, a ranker orders the most relevant ones, and the reader extracts the answer from the candidates.

Open-Domain Question Answering
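The retriever, ranker, and reader stages mentioned in the snippet compose as a simple function pipeline. The sketch below is deliberately naive (term-overlap scoring, first-matching-sentence "reading") and purely illustrative; real systems use dense retrievers and neural readers, and none of these function names come from the paper.

```python
def tokens(text):
    # Crude tokenization: lowercase and strip basic punctuation.
    return set(text.lower().replace(".", " ").replace(",", " ").split())

def retrieve(question, corpus, k=3):
    # Retriever: pull top-k candidates from the full collection by term overlap.
    q = tokens(question)
    return sorted(corpus, key=lambda d: -len(q & tokens(d)))[:k]

def rank(question, candidates):
    # Ranker: re-order the candidate list (same naive score here).
    q = tokens(question)
    return sorted(candidates, key=lambda d: -len(q & tokens(d)))

def read(question, document):
    # Reader: return the first sentence sharing a word with the question.
    q = tokens(question)
    for sentence in document.split(". "):
        if q & tokens(sentence):
            return sentence
    return ""

corpus = [
    "Paris is the capital of France. It is known for the Eiffel Tower",
    "Berlin is the capital of Germany",
    "The Nile is a river in Africa",
]
question = "what is the capital of France"
docs = rank(question, retrieve(question, corpus))
answer = read(question, docs[0])
```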

SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature

1 code implementation CVPR 2021 Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang

For the fully connected layers, by exploiting the low-rank property of the Kronecker factors of the Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy.

Dimensionality Reduction
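The low-rank trick behind inverting "only a small matrix" is standard Sherman-Morrison-Woodbury algebra: for a factor of the form U Uᵀ + λI with U of shape (n, m) and m ≪ n, the n×n inverse reduces to an m×m inverse. The pure-Python sketch below illustrates that algebra under these assumptions; it is not SKFAC's exact update rule.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def woodbury_inverse(U, lam):
    # (U U^T + lam I_n)^{-1} = (1/lam) * (I_n - U (lam I_m + U^T U)^{-1} U^T)
    n, m = len(U), len(U[0])
    Ut = transpose(U)
    UtU = matmul(Ut, U)
    small = [[UtU[i][j] + (lam if i == j else 0.0) for j in range(m)]
             for i in range(m)]
    small_inv = inv2(small)          # only an m x m inverse (m = 2 here)
    corr = matmul(matmul(U, small_inv), Ut)
    return [[((1.0 if i == j else 0.0) - corr[i][j]) / lam for j in range(n)]
            for i in range(n)]

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # n=5, m=2
A_inv = woodbury_inverse(U, 0.1)
```

Only a 2×2 system is ever inverted, yet `A_inv` is the exact inverse of the 5×5 matrix U Uᵀ + 0.1 I.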

THOR, Trace-based Hardware-adaptive layer-ORiented Natural Gradient Descent Computation

1 code implementation AAAI Technical Track on Machine Learning 2021 Mengyun Chen, Kaixin Gao, Xiaolei Liu, Zidong Wang, Ningxi Ni, Qian Zhang, Lei Chen, Chao Ding, ZhengHai Huang, Min Wang, Shuangling Wang, Fan Yu, Xinyuan Zhao, Dachuan Xu

It is well known that second-order optimizers can accelerate the training of deep neural networks; however, the huge computational cost of second-order optimization makes it impractical in practice.

DGCL: an efficient communication library for distributed GNN training

1 code implementation Proceedings of the Sixteenth European Conference on Computer Systems 2021 Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu

Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bioinformatics.

Elastic Deep Learning in Multi-Tenant GPU Clusters

no code implementations IEEE Transactions on Parallel and Distributed Systems 2021 Yidi Wu, Kaihao Ma, Xiao Yan, Zhi Liu, Zhenkun Cai, Yuzhen Huang, James Cheng, Han Yuan, Fan Yu

We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster.

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

3 code implementations 2 Feb 2021 Zhuoyuan Yao, Di Wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

In this paper, we propose an open-source, production-first, and production-ready speech recognition toolkit called WeNet, in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy

no code implementations 24 Dec 2020 Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, Fan Yu, Zidong Wang, Min Wang

Optimizers that further adjust the scale of the gradient, such as Adam and Natural Gradient (NG), although widely studied and used by the community, are often found to have poorer generalization performance than Stochastic Gradient Descent (SGD).

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

5 code implementations 10 Dec 2020 BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition
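The two-pass decoding described in the snippet boils down to: a streaming first pass produces n-best hypotheses with CTC scores, and a non-streaming second pass rescores them before the final choice. The sketch below shows only that score-combination step; the hypotheses, scores, and interpolation weight are made up for illustration and are not from the paper.

```python
def rescore(nbest, attention_score, weight=0.5):
    """Second pass: interpolate first-pass CTC scores with rescoring scores.

    nbest: list of (hypothesis, ctc_score) pairs; higher scores are better.
    attention_score: callable mapping a hypothesis to its second-pass score.
    """
    rescored = [
        (hyp, weight * ctc + (1.0 - weight) * attention_score(hyp))
        for hyp, ctc in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])

# Toy first-pass output (log-probability-like scores) and toy second-pass scores.
nbest = [("recognize speech", -3.2), ("wreck a nice beach", -3.0)]
attn = {"recognize speech": -1.0, "wreck a nice beach": -4.0}
best, score = rescore(nbest, attn.get)
```

The first pass alone would prefer the higher-scoring but wrong hypothesis; the second pass flips the decision, which is the point of rescoring with a stronger non-streaming model.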

Eigenvalue-corrected Natural Gradient Based on a New Approximation

no code implementations 27 Nov 2020 Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu

Using second-order optimization methods for training deep neural networks (DNNs) has attracted many researchers.

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

no code implementations 21 Nov 2020 Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

There have been many attempts to use second-order optimization methods for training deep neural networks.

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

no code implementations 13 Nov 2020 Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.

Sound Audio and Speech Processing

The energy technique for the six-step BDF method

no code implementations 17 Jul 2020 Georgios Akrivis, Minghua Chen, Fan Yu, Zhi Zhou

In combination with the Grenander–Szegő theorem, we observe that a relaxed positivity condition on multipliers, milder than the basic requirement of the Nevanlinna–Odeh multipliers that the sum of the absolute values of their components be strictly less than $1$, makes the energy technique applicable to the stability analysis of BDF methods for parabolic equations with self-adjoint elliptic part.

Numerical Analysis
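The classical multiplier requirement referred to in the snippet can be written compactly; a sketch, with $\eta_1,\dots,\eta_{q-1}$ denoting the Nevanlinna–Odeh multipliers of the $q$-step BDF method (notation assumed):

```latex
\sum_{i=1}^{q-1} |\eta_i| < 1
```

The paper's observation is that a milder positivity condition on the multipliers suffices for the energy technique, which is what extends the stability analysis to the six-step BDF method.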

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism

1 code implementation 16 Apr 2020 Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, Fan Yu

A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs).
