Search Results for author: Yikang Shen

Found 53 papers, 21 papers with code

Unsupervised Dependency Graph Network

1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie zhou, Aaron Courville

We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.

Language Modelling Masked Language Modeling +3

Paper
Code

Phrase-aware Unsupervised Constituency Parsing

no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han

Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.

Constituency Parsing Language Modelling +1

Paper
Add Code

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

1 code implementation • 11 Apr 2024 • Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.

901

Paper
Code

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

no code implementations • 8 Apr 2024 • Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios.

Paper
Add Code

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

1 code implementation • 14 Mar 2024 • Zhiqing Sun, Longhui Yu, Yikang Shen, Weiyang Liu, Yiming Yang, Sean Welleck, Chuang Gan

This paper answers this question in the context of tackling hard reasoning tasks (e. g., level 4-5 MATH problems) via learning from human annotations on easier tasks (e. g., level 1-3 MATH problems), which we term as \textit{easy-to-hard generalization}.

Math Reinforcement Learning (RL) +1

Paper
Code

Scattered Mixture-of-Experts Implementation

1 code implementation • 13 Mar 2024 • Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.

132

Paper
Code

API Pack: A Massive Multilingual Dataset for API Call Generation

1 code implementation • 14 Feb 2024 • Zhen Guo, Adriana Meza Soria, Wei Sun, Yikang Shen, Rameswar Panda

We introduce API Pack, a multilingual dataset featuring over one million instruction-API call pairs aimed at advancing large language models' API call generation capabilities.

Paper
Code

Diversity Measurement and Subset Selection for Instruction Tuning Datasets

no code implementations • 4 Feb 2024 • Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, Rameswar Panda

Our experiments demonstrate that the proposed diversity measure in the normalized weight gradient space is correlated with downstream instruction-following performance.

Instruction Following Point Processes

Paper
Add Code

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

no code implementations • 30 Jan 2024 • Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, Chuang Gan

Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values.

Language Modelling Large Language Model +1

Paper
Add Code

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

no code implementations • 19 Jan 2024 • Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

In this work, we explore data-efficient adaptation of pre-trained code models by further pre-training and fine-tuning them with program structures.

Paper
Add Code

Gated Linear Attention Transformers with Hardware-Efficient Training

2 code implementations • 11 Dec 2023 • Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim

When used as a replacement for the standard attention layer in Transformers, the resulting gated linear attention (GLA) Transformer is found to perform competitively against the LLaMA-architecture Transformer (Touvron et al., 2023) as well recent linear-time-inference baselines such as RetNet(Sun et al., 2023a) and Mamba (Gu & Dao, 2023) on moderate-scale language modeling experiments.

2k Language Modelling

434

Paper
Code

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

no code implementations • 6 Nov 2023 • Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan

A communication token is generated by the LLM following a visual entity or a relation, to inform the detection network to propose regions that are relevant to the sentence generated so far.

CoLA Question Answering +5

Paper
Add Code

Autonomous Tree-search Ability of Large Language Models

no code implementations • 14 Oct 2023 • Zheyu Zhang, Zhuorui Ye, Yikang Shen, Chuang Gan

This approach yield a greater improvement compared to the ones fine-tuned on CoT data.

Decision Making

Paper
Add Code

The Consensus Game: Language Model Generation via Equilibrium Search

no code implementations • 13 Oct 2023 • Athul Paul Jacob, Yikang Shen, Gabriele Farina, Jacob Andreas

When applied to question answering and other text generation tasks, language models (LMs) may be queried generatively (by sampling answers from their output distribution) or discriminatively (by using them to score or rank a set of candidate outputs).

Language Modelling Question Answering +2

Paper
Add Code

Sparse Universal Transformer

no code implementations • 11 Oct 2023 • Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.

Paper
Add Code

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions

no code implementations • ICCV 2023 • Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan

To tackle this problem, we propose a new framework TextPSG consisting of four modules, i. e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques.

Graph Generation Panoptic Scene Graph Generation +1

Paper
Add Code

SALMON: Self-Alignment with Instructable Reward Models

1 code implementation • 9 Oct 2023 • Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents.

In-Context Learning Language Modelling

124

Paper
Code

GraphText: Graph Reasoning in Text Space

no code implementations • 2 Oct 2023 • Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, Michael Bronstein, Zhaocheng Zhu, Jian Tang

Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language.

In-Context Learning Text Generation

Paper
Add Code

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

Paper
Add Code

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

no code implementations • 29 Jun 2023 • Zitian Chen, Mingyu Ding, Yikang Shen, Wei Zhan, Masayoshi Tomizuka, Erik Learned-Miller, Chuang Gan

We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.

Continual Learning Multi-Task Learning

Paper
Add Code

ModuleFormer: Modularity Emerges from Mixture-of-Experts

1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

In our experiment, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency, since ModuleFormer only activates a subset of its modules for each input token, thus it could achieve the same performance as dense LLMs with more than two times throughput; 2) Extendability, ModuleFormer is more immune to catastrophic forgetting than dense LLMs and can be easily extended with new modules to learn new knowledge that is not included in the training data; 3) Specialisation, finetuning ModuleFormer could specialize a subset of modules to the finetuning task and the task-unrelated modules could be easily pruned for a lightweight deployment.

Language Modelling

216

Paper
Code

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

1 code implementation • NeurIPS 2023 • Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable.

In-Context Learning Language Modelling

1,088

Paper
Code

Hyper-Decision Transformer for Efficient Online Policy Adaptation

no code implementations • 17 Apr 2023 • Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner.

Paper
Add Code

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.

Paper
Code

Planning with Large Language Models for Code Generation

no code implementations • 9 Mar 2023 • Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process.

Code Generation Language Modelling +1

Paper
Add Code

Transformer-Patcher: One Mistake worth One Neuron

1 code implementation • 24 Jan 2023 • Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie zhou, Wenge Rong, Zhang Xiong

Our method outperforms previous fine-tuning and HyperNetwork-based methods and achieves state-of-the-art performance for Sequential Model Editing (SME).

Model Editing

Paper
Code

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning

no code implementations • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan

The see stage scans the image and grounds the visual concept candidates with a visual perception model.

Few-Shot Learning Image Captioning +4

Paper
Add Code

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

Paper
Add Code

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

no code implementations • 15 Dec 2022 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

Paper
Add Code

Mixture of Attention Heads: Selecting Attention Heads Per Token

1 code implementation • 11 Oct 2022 • Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie zhou, Wenge Rong, Zhang Xiong

This paper proposes the Mixture of Attention Heads (MoA), a new architecture that combines multi-head attention with the MoE mechanism.

Computational Efficiency Language Modelling +2

Paper
Code

Prompting Decision Transformer for Few-Shot Policy Generalization

no code implementations • 27 Jun 2022 • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations.

Few-Shot Learning Inductive Bias +2

Paper
Add Code

Syntactic Inductive Biases for Deep Learning Methods

no code implementations • 8 Jun 2022 • Yikang Shen

We propose two families of inductive biases, one for constituency structure and another one for dependency structure.

Inductive Bias

Paper
Add Code

Self-Instantiated Recurrent Units with Dynamic Soft Recursion

no code implementations • NeurIPS 2021 • Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan Guo Wei, Shuai Zhang

On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step.

Inductive Bias

Paper
Add Code

Inducing Reusable Skills From Demonstrations with Option-Controller Network

no code implementations • 29 Sep 2021 • Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

With the isolation of information and the synchronous calling mechanism, we can impose a division of works between the controller and options in an end-to-end training regime.

Paper
Add Code

Learning Task Decomposition with Ordered Memory Policy Network

no code implementations • 19 Mar 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstruc-tured demonstration.

Inductive Bias

Paper
Add Code

Long Range Arena : A Benchmark for Efficient Transformers

no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.

16k Benchmarking

Paper
Add Code

Recurrently Controlling a Recurrent Network with Recurrent Networks Controlled by More Recurrent Networks

no code implementations • 1 Jan 2021 • Yi Tay, Yikang Shen, Alvin Chan, Aston Zhang, Shuai Zhang

This paper explores an intriguing idea of recursively parameterizing recurrent nets.

Code Generation Inductive Bias +4

Paper
Add Code

Learning Task Decomposition with Order-Memory Policy Network

no code implementations • ICLR 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

Many complex real-world tasks are composed of several levels of sub-tasks.

Imitation Learning Inductive Bias

Paper
Add Code

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2 code implementations • ACL 2021 • Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponded words.

Constituency Parsing Language Modelling +2

32,816

Paper
Code

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In the recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)

16k Benchmarking +1

682

Paper
Code

Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville

In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).

Language Modelling

Paper
Add Code

Recursive Top-Down Production for Sentence Generation with Latent Trees

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville

We model the recursive production property of context-free grammars for natural and synthetic languages.

Sentence Translation

Paper
Code

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

1 code implementation • ACL 2020 • Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang

It is commonly believed that knowledge of syntactic structure should improve language modeling.

Language Modelling

Paper
Code

Metagross: Meta Gated Recursive Controller Units for Sequence Modeling

no code implementations • ICLR 2020 • Yi Tay, Yikang Shen, Alvin Chan, Yew Soon Ong

This paper proposes Metagross (Meta Gated Recursive Controller), a new neural sequence modeling unit.

Code Generation Inductive Bias +4

Paper
Add Code

Ordered Memory

1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory.

ListOps

Paper
Code

Investigating Biases in Textual Entailment Datasets

no code implementations • 23 Jun 2019 • Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

The ability to understand logical relationships between sentences is an important task in language understanding.

BIG-bench Machine Learning Natural Language Inference +2

Paper
Add Code

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.

Ranked #13 on Constituency Grammar Induction on PTB Diagnostic ECG Database

Constituency Grammar Induction Inductive Bias +1

577

Paper
Code

BanditSum: Extractive Summarization as a Contextual Bandit

1 code implementation • EMNLP 2018 • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung

In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels.

Ranked #10 on Extractive Text Summarization on CNN / Daily Mail

Extractive Summarization Extractive Text Summarization

Paper
Code

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

In this work, we propose a novel constituency parsing scheme.

Constituency Parsing Position +1

Paper
Code

Generating Contradictory, Neutral, and Entailing Sentences

no code implementations • 7 Mar 2018 • Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).

Natural Language Inference RTE +1

Paper
Add Code

Neural Language Modeling by Jointly Learning Syntax and Lexicon

1 code implementation • ICLR 2018 • Yikang Shen, Zhouhan Lin, Chin-wei Huang, Aaron Courville

In this paper, We propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model.

Ranked #13 on Constituency Grammar Induction on PTB Diagnostic ECG Database (Max F1 (WSJ) metric)

Constituency Grammar Induction Language Modelling

Paper
Code

Self-organized Hierarchical Softmax

no code implementations • 26 Jul 2017 • Yikang Shen, Shawn Tan, Chrisopher Pal, Aaron Courville

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.

Language Modelling Sentence +1

Paper
Add Code

Word Embedding based Correlation Model for Question/Answer Matching

no code implementations • 15 Nov 2015 • Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong

With the development of community based question answering (Q&A) services, a large scale of Q&A archives have been accumulated and are an important information and knowledge resource on the web.

Question Answering Translation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.