Search Results for author: Shaohan Huang

Found 67 papers, 43 papers with code

Pseudo-Label Guided Unsupervised Domain Adaptation of Contextual Embeddings

no code implementations EACL (AdaptNLP) 2021 Tianyu Chen, Shaohan Huang, Furu Wei, JianXin Li

In unsupervised domain adaptation, we aim to train a model that works well on a target domain when provided with labeled source samples and unlabeled target samples.

Language Modelling Masked Language Modeling +3

Instruction Pre-Training: Language Models are Supervised Multitask Learners

1 code implementation20 Jun 2024 Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei

Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs).

You Only Cache Once: Decoder-Decoder Architectures for Language Models

1 code implementation8 May 2024 Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.

Decoder Retrieval

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

no code implementations23 Apr 2024 Xun Wu, Shaohan Huang, Furu Wei

Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts.

Language Modelling Large Language Model +2

Multi-Head Mixture-of-Experts

no code implementations23 Apr 2024 Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form.

Language Modelling

Mixture of LoRA Experts

1 code implementation21 Apr 2024 Xun Wu, Shaohan Huang, Furu Wei

LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models to cater to a diverse array of downstream tasks, showcasing notable effectiveness and efficiency, thereby solidifying its position as one of the most prevalent fine-tuning techniques.

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

1 code implementation28 Feb 2024 Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs).

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

2 code implementations27 Feb 2024 Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).

$Se^2$: Sequential Example Selection for In-Context Learning

1 code implementation21 Feb 2024 Haoyu Liu, Jianfeng Liu, Shaohan Huang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Furu Wei, Qi Zhang

The remarkable capability of large language models (LLMs) for in-context learning (ICL) needs to be activated by demonstration examples.

Diversity In-Context Learning

Text Diffusion with Reinforced Conditioning

no code implementations19 Feb 2024 Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio.

Improving Domain Adaptation through Extended-Text Reading Comprehension

1 code implementation14 Jan 2024 Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method.

Clustering Domain Adaptation +1

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

1 code implementation20 Oct 2023 Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

In this paper, we propose a tailored learning approach to distill such reasoning ability to smaller LMs to facilitate the democratization of the exclusive reasoning ability.

Instruction Following Language Modelling +1

BitNet: Scaling 1-bit Transformers for Large Language Models

2 code implementations17 Oct 2023 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Language Modelling Quantization

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation4 Oct 2023 Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."

Decoder Image Generation

Calibrating LLM-Based Evaluator

no code implementations23 Sep 2023 Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

Recent advancements in large language models (LLMs) on language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality, and a competent alternative to human evaluation.

In-Context Learning Language Modelling +1

Adapting Large Language Models via Reading Comprehension

1 code implementation18 Sep 2023 Daixuan Cheng, Shaohan Huang, Furu Wei

Taken inspiration from human learning via reading comprehension--practice after reading improves the ability to answer questions based on the learned knowledge--we propose a simple method for transforming raw corpora into reading comprehension texts.

Language Modelling Question Answering +1

LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection

no code implementations3 Sep 2023 Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Carol Fung, Hailong Yang, Depei Qian

In this work, we proposed LogGPT, a log-based anomaly detection framework based on ChatGPT.

Anomaly Detection

Scaling Sentence Embeddings with Large Language Models

1 code implementation31 Jul 2023 Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, Fuzhen Zhuang

We also fine-tune LLMs with current contrastive learning approach, and the 2. 7B OPT model, incorporating our prompt-based method, surpasses the performance of 4. 8B ST5, achieving the new state-of-the-art results on STS tasks.

Contrastive Learning In-Context Learning +4

Retentive Network: A Successor to Transformer for Large Language Models

8 code implementations17 Jul 2023 Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.

Language Modelling

LongNet: Scaling Transformers to 1,000,000,000 Tokens

3 code implementations5 Jul 2023 Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Scaling sequence length has become a critical demand in the era of large language models.

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations26 Jun 2023 Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e. g., bounding boxes) and grounding text to the visual world.

Image Captioning In-Context Learning +9

Learning Music Sequence Representation from Text Supervision

no code implementations31 May 2023 Tianyu Chen, Yuan Xie, Shuai Zhang, Shaohan Huang, Haoyi Zhou, JianXin Li

Music representation learning is notoriously difficult for its complex human-related concepts contained in the sequence of numerical signals.

Contrastive Learning Representation Learning

Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

1 code implementation16 May 2023 Ziheng Li, Shaohan Huang, Zihan Zhang, Zhi-Hong Deng, Qiang Lou, Haizhen Huang, Jian Jiao, Furu Wei, Weiwei Deng, Qi Zhang

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding.

Language Modelling Sentence +3

Pre-training Language Model as a Multi-perspective Course Learner

no code implementations6 May 2023 Beiduo Chen, Shaohan Huang, Zihan Zhang, Wu Guo, ZhenHua Ling, Haizhen Huang, Furu Wei, Weiwei Deng, Qi Zhang

Besides, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary-supervision.

Language Modelling Masked Language Modeling

UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation

1 code implementation15 Mar 2023 Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Denvy Deng, Qi Zhang

Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization.

Hallucination Prompt Engineering +1

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

1 code implementation20 Dec 2022 Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li

Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.

Decoder Denoising +2

Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

no code implementations26 Oct 2022 Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei, Vishrav Chaudhary, Xia Song

In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications.

Representation Learning

MoEC: Mixture of Expert Clusters

no code implementations19 Jul 2022 Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei

Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.

Machine Translation Natural Language Understanding

Language Models are General-Purpose Interfaces

1 code implementation13 Jun 2022 Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Causal Language Modeling Few-Shot Learning +6

THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

no code implementations Findings (ACL) 2022 Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei

As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly for the exposure of plain-text user data (e. g., search history, medical record, bank account).

Privacy Preserving

Task-Specific Expert Pruning for Sparse Mixture-of-Experts

no code implementations1 Jun 2022 Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei

The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity.

DeepNet: Scaling Transformers to 1,000 Layers

6 code implementations1 Mar 2022 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.

Translation

Kformer: Knowledge Injection in Transformer Feed-Forward Layers

1 code implementation15 Jan 2022 Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang

In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers.

Language Modelling Question Answering

Improving Non-autoregressive Generation with Mixup Training

1 code implementation21 Oct 2021 Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang

While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge.

Natural Language Understanding Paraphrase Generation +2

Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training

2 code implementations EMNLP 2021 Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei

We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.

Language Modelling

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations25 Jun 2021 Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization Decoder +6

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

2 code implementations Findings (ACL) 2021 Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.

Relation XLM-R

Unsupervised Fine-tuning for Text Clustering

no code implementations COLING 2020 Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou

Fine-tuning with pre-trained language models (e. g. BERT) has achieved great success in many language understanding tasks in supervised settings (e. g. text classification).

Clustering text-classification +2

Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths

no code implementations Asian Chapter of the Association for Computational Linguistics 2020 Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang

Commonsense explanation generation aims to empower the machine's sense-making capability by generating plausible explanations to statements against commonsense.

Explanation Generation

Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph

1 code implementation EMNLP 2020 Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang

Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation.

Text Generation

DocBank: A Benchmark Dataset for Document Layout Analysis

2 code implementations COLING 2020 Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou

DocBank is constructed using a simple yet effective way with weak supervision from the \LaTeX{} documents available on the arXiv. com.

Document Layout Analysis

TableBank: Table Benchmark for Image-based Table Detection and Recognition

1 code implementation LREC 2020 Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.

Table Detection

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

16 code implementations31 Dec 2019 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

Document AI Document Image Classification +3

TableBank: A Benchmark Dataset for Table Detection and Recognition

2 code implementations LREC 2020 Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.

Table Detection

Text Morphing

no code implementations30 Sep 2018 Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

In this paper, we introduce a novel natural language generation task, termed as text morphing, which targets at generating the intermediate sentences that are fluency and smooth with the two input sentences.

Sentence Text Generation

Neural Melody Composition from Lyrics

no code implementations12 Sep 2018 Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou

In this paper, we study a novel task that learns to compose music from natural language.

Decoder

Dictionary-Guided Editing Networks for Paraphrase Generation

no code implementations21 Jun 2018 Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct.

Paraphrase Generation Sentence

Response Generation by Context-aware Prototype Editing

3 code implementations19 Jun 2018 Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou

Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses.

Decoder Diversity +3

Learning to Generate Product Reviews from Attributes

no code implementations EACL 2017 Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu

This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.

Attribute Decoder +3

Cannot find the paper you are looking for? You can Submit a new open access paper.