Search Results for author: Shen Huang

Found 21 papers, 9 papers with code

CSP: Code-Switching Pre-training for Neural Machine Translation

no code implementations EMNLP 2020 Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju

Unlike traditional pre-training methods, which randomly mask some fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translations in the target language.

Decoder Machine Translation +3

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

1 code implementation 2 Jul 2024 Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning

1 code implementation 17 Apr 2024 Xiao Li, Yong Jiang, Shen Huang, Pengjun Xie, Gong Cheng, Fei Huang

Our objective is to train a generative model that can simultaneously provide a score indicating the presence of a shared key point between a pair of arguments and generate that shared key point.

Argument Mining graph partitioning +2

EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data

no code implementations 25 Dec 2023 Shirong Ma, Shen Huang, Shulin Huang, Xiaobin Wang, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang, Yong Jiang

Experimental results demonstrate the effectiveness of continual pre-training of E-commerce LLMs and the efficacy of our devised data mixing strategy.

In-Context Learning

End-to-End Beam Retrieval for Multi-Hop Question Answering

2 code implementations 17 Aug 2023 Jiahao Zhang, Haiyang Zhang, Dongmei Zhang, Yong Liu, Shen Huang

This approach models the multi-hop retrieval process in an end-to-end manner by jointly optimizing an encoder and two classification heads across all hops.

Language Modelling Large Language Model +3

EcomGPT: Instruction-tuning Large Language Models with Chain-of-Task Tasks for E-commerce

1 code implementation 14 Aug 2023 Yangning Li, Shirong Ma, Xiaobin Wang, Shen Huang, Chengyue Jiang, Hai-Tao Zheng, Pengjun Xie, Fei Huang, Yong Jiang

EcomInstruct scales up the data size and task diversity by constructing atomic tasks from basic E-commerce data types, such as product information and user reviews.

Diversity Instruction Following +3

Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders

no code implementations ACL 2021 Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than its cascaded ST counterpart when large-scale ASR and MT data are available.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Code-switching pre-training for neural machine translation

no code implementations 17 Sep 2020 Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju

Unlike traditional pre-training methods, which randomly mask some fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translations in the target language.

Decoder Machine Translation +3

Cognitive Representation Learning of Self-Media Online Article Quality

no code implementations 13 Aug 2020 Yiru Wang, Shen Huang, Gongfu Li, Qiang Deng, Dongliang Liao, Pengda Si, Yujiu Yang, Jin Xu

The automatic quality assessment of self-media online articles is a new and urgent problem, and it is of great value to online recommendation and search.

Representation Learning

Utterance-level end-to-end language identification using attention-based CNN-BLSTM

no code implementations 20 Feb 2019 Weicheng Cai, Danwei Cai, Shen Huang, Ming Li

In this paper, we present an end-to-end language identification framework, the attention-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM).

Language Identification

TencentFmRD Neural Machine Translation for WMT18

no code implementations WS 2018 Bojie Hu, Ambyer Han, Shen Huang

Our systems are neural machine translation systems trained with our original system TenTrans.

Machine Translation NMT +1

Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure

no code implementations IJCNLP 2017 Shen Huang, Xu Sun, Houfeng Wang

Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods as they can utilize unlabeled data to help improve the Out-of-Vocabulary (OOV) word recognition performance.

Chinese Word Segmentation Domain Adaptation +2

Bi-LSTM Neural Networks for Chinese Grammatical Error Diagnosis

no code implementations WS 2016 Shen Huang, Houfeng Wang

Grammatical Error Diagnosis for Chinese has always been a challenge for both foreign learners and NLP researchers, because of the variety of its grammar and the flexibility of expression.

Grammatical Error Detection Sentence +1
