1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville
We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.
no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.
no code implementations • 6 Nov 2023 • Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan
The LLM generates a communication token after each visual entity or relation, informing the detection network to propose regions that are relevant to the sentence generated so far.
no code implementations • 14 Oct 2023 • Zheyu Zhang, Zhuorui Ye, Yikang Shen, Chuang Gan
This approach yields a greater improvement than models fine-tuned on CoT data.
no code implementations • 13 Oct 2023 • Athul Paul Jacob, Yikang Shen, Gabriele Farina, Jacob Andreas
When applied to question answering and other text generation tasks, language models (LMs) may be queried generatively (by sampling answers from their output distribution) or discriminatively (by using them to score or rank a set of candidate outputs).
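As a rough illustration of the two query modes (not the paper's method; `next_token_logits` is a random stand-in for a real model), the sketch below contrasts sampling an answer with scoring a fixed candidate set:

```python
# Sketch only: `next_token_logits` is a random stand-in for a real LM's
# next-token distribution; both query modes below reuse it.
import torch
import torch.nn.functional as F

VOCAB = 100
torch.manual_seed(0)

def next_token_logits(prefix: list[int]) -> torch.Tensor:
    """Stand-in for a language model: next-token logits given a prefix."""
    return torch.randn(VOCAB)

def query_generatively(prompt: list[int], max_new_tokens: int = 5) -> list[int]:
    """Generative mode: sample an answer token by token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = F.softmax(next_token_logits(tokens), dim=-1)
        tokens.append(int(torch.multinomial(probs, 1)))
    return tokens[len(prompt):]

def query_discriminatively(prompt: list[int], candidates: list[list[int]]) -> int:
    """Discriminative mode: rank candidate answers by total log-probability."""
    scores = []
    for cand in candidates:
        logp, tokens = 0.0, list(prompt)
        for tok in cand:
            logp += F.log_softmax(next_token_logits(tokens), dim=-1)[tok].item()
            tokens.append(tok)
        scores.append(logp)
    return int(torch.tensor(scores).argmax())  # index of the best-scoring candidate

print(query_generatively([1, 2, 3]))
print(query_discriminatively([1, 2, 3], candidates=[[4, 5], [6, 7]]))
```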
no code implementations • 11 Oct 2023 • Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan
The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.
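A minimal sketch of the layer-sharing idea this builds on, using a single `nn.TransformerEncoderLayer` reused for a fixed number of steps (layer sizes and depth are illustrative, not the paper's configuration):

```python
# Sketch: one shared encoder layer applied repeatedly ("depth" via iteration),
# in contrast to a stack of independently parameterized layers.
import torch
import torch.nn as nn

class TiedDepthEncoder(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, n_steps: int = 6):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.n_steps = n_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_steps):
            x = self.shared_layer(x)  # same parameters at every step
        return x

x = torch.randn(2, 10, 64)          # (batch, sequence, d_model)
print(TiedDepthEncoder()(x).shape)  # torch.Size([2, 10, 64])
```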
no code implementations • ICCV 2023 • Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan
To tackle this problem, we propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques.
1 code implementation • 9 Oct 2023 • Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
Central to our approach is a principle-following reward model.
no code implementations • 2 Oct 2023 • Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, Michael Bronstein, Zhaocheng Zhu, Jian Tang
Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language.
no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
Large Multimodal Models (LMMs) are built across modalities, and misalignment between the two modalities can result in "hallucination": textual outputs that are not grounded in the multimodal information in context.
no code implementations • 29 Jun 2023 • Zitian Chen, Mingyu Ding, Yikang Shen, Wei Zhan, Masayoshi Tomizuka, Erik Learned-Miller, Chuang Gan
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan
In our experiments, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency: since ModuleFormer only activates a subset of its modules for each input token, it can match the performance of dense LLMs with more than twice the throughput; 2) Extendability: ModuleFormer is more immune to catastrophic forgetting than dense LLMs and can easily be extended with new modules to learn knowledge that is not included in the training data; 3) Specialisation: finetuning ModuleFormer can specialize a subset of modules to the finetuning task, and the task-unrelated modules can easily be pruned for lightweight deployment.
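The sketch below shows the general sparse mixture-of-modules pattern (a toy routing layer, not the released ModuleFormer implementation): a router picks the top-k modules for each token, and only those modules are evaluated.

```python
# Toy sparse mixture-of-modules layer (hypothetical names and sizes):
# each token is routed to its top-k feed-forward modules; the rest stay idle.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseModuleLayer(nn.Module):
    def __init__(self, d_model: int = 64, n_modules: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_modules)
        )
        self.router = nn.Linear(d_model, n_modules)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        top_scores, top_idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)       # mix only the selected modules
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for m in top_idx[:, slot].unique().tolist():
                sel = top_idx[:, slot] == m           # tokens whose slot picked module m
                out[sel] += weights[sel][:, slot:slot + 1] * self.experts[m](x[sel])
        return out

print(SparseModuleLayer()(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Only the k selected modules run for each token, which is where the throughput advantage over a dense feed-forward block comes from.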
no code implementations • 17 Apr 2023 • Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan
To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner.
1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.
no code implementations • 9 Mar 2023 • Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan
Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process.
1 code implementation • 24 Jan 2023 • Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong
Our method outperforms previous fine-tuning and HyperNetwork-based methods and achieves state-of-the-art performance for Sequential Model Editing (SME).
no code implementations • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan
The see stage scans the image and grounds the visual concept candidates with a visual perception model.
no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan
To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
1 code implementation • 11 Oct 2022 • Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
This paper proposes the Mixture of Attention Heads (MoA), a new architecture that combines multi-head attention with the MoE mechanism.
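A toy version of the idea, hypothetical and much simplified relative to the paper: keys and values are shared, queries and output projections are head-specific, and a router mixes only the top-k heads per query token.

```python
# Toy mixture-of-attention-heads layer (simplified, hypothetical sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfAttentionHeads(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 8, k: int = 2):
        super().__init__()
        self.q_proj = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_heads))
        self.o_proj = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_heads))
        self.kv_proj = nn.Linear(d_model, 2 * d_model)
        self.router = nn.Linear(d_model, n_heads)
        self.k, self.scale = k, d_model ** 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq, d_model); keys/values shared across the head pool
        keys, values = self.kv_proj(x).chunk(2, dim=-1)
        top_scores, top_idx = self.router(x).topk(self.k, dim=-1)
        gates = F.softmax(top_scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for h in top_idx[:, slot].unique().tolist():
                sel = top_idx[:, slot] == h          # queries routed to head h
                attn = F.softmax(self.q_proj[h](x[sel]) @ keys.T / self.scale, dim=-1)
                out[sel] += gates[sel][:, slot:slot + 1] * self.o_proj[h](attn @ values)
        return out

print(MixtureOfAttentionHeads()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```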
no code implementations • 27 Jun 2022 • Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
Humans can leverage prior experience and learn novel tasks from a handful of demonstrations.
no code implementations • 8 Jun 2022 • Yikang Shen
We propose two families of inductive biases, one for constituency structure and another one for dependency structure.
no code implementations • NeurIPS 2021 • Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan Guo Wei, Shuai Zhang
On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step.
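A toy illustration of soft recursion depth (not the Self-IRU itself): at each time step a sigmoid gate in (0, 1) blends one application of a recurrent cell with a second, recursive application of the same cell.

```python
# Toy soft-recursion step (illustrative only): a learned gate decides, per time
# step, how much of a second recursive application of the cell to use.
import torch
import torch.nn as nn

class SoftRecursionCell(nn.Module):
    def __init__(self, d: int = 32):
        super().__init__()
        self.cell = nn.GRUCell(d, d)
        self.depth_gate = nn.Linear(2 * d, 1)

    def forward(self, x_t: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        shallow = self.cell(x_t, h)          # one application of the cell
        deep = self.cell(x_t, shallow)       # recursive second application
        g = torch.sigmoid(self.depth_gate(torch.cat([x_t, h], dim=-1)))  # in (0, 1)
        return g * deep + (1 - g) * shallow  # soft recursion depth at this step

cell, h = SoftRecursionCell(), torch.zeros(4, 32)
for x_t in torch.randn(10, 4, 32):           # (time, batch, features)
    h = cell(x_t, h)                         # the gate can differ at every step
print(h.shape)                               # torch.Size([4, 32])
```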
no code implementations • 29 Sep 2021 • Siyuan Zhou, Yikang Shen, Yuchen Lu, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
With the isolation of information and the synchronous calling mechanism, we can impose a division of work between the controller and options in an end-to-end training regime.
no code implementations • 19 Mar 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration.
no code implementations • ICLR 2021 • Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan
Many complex real-world tasks are composed of several levels of sub-tasks.
no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.
no code implementations • 1 Jan 2021 • Yi Tay, Yikang Shen, Alvin Chan, Aston Zhang, Shuai Zhang
This paper explores an intriguing idea of recursively parameterizing recurrent nets.
2 code implementations • ACL 2021 • Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville
There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponding words.
5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
In recent months, a wide spectrum of efficient, fast Transformer variants has been proposed to tackle this problem, more often than not claiming model quality superior or comparable to vanilla Transformer models.
Ranked #19 on Long-range modeling on LRA
no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville
In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville
We model the recursive production property of context-free grammars for natural and synthetic languages.
1 code implementation • ACL 2020 • Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang
It is commonly believed that knowledge of syntactic structure should improve language modeling.
no code implementations • ICLR 2020 • Yi Tay, Yikang Shen, Alvin Chan, Yew Soon Ong
This paper proposes Metagross (Meta Gated Recursive Controller), a new neural sequence modeling unit.
1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville
Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory.
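A minimal sketch of the cumulative-probability masking idea only (not the full Ordered Memory model): a softmax over stack-like memory slots is turned into monotone keep/erase masks via a cumulative sum.

```python
# Sketch of cumulative-probability gating over ordered memory slots
# (the masking idea only, with made-up sizes).
import torch
import torch.nn.functional as F

n_slots, d = 5, 8
memory = torch.randn(n_slots, d)       # ordered slots, index 0 = deepest
new_content = torch.randn(d)           # candidate content for the current step

p = F.softmax(torch.randn(n_slots), dim=-1)  # where the current input "attaches"
erase = torch.cumsum(p, dim=0)         # monotone mask: slots above the attachment
keep = 1.0 - erase                     #   point are overwritten, slots below are kept

memory = keep.unsqueeze(-1) * memory + erase.unsqueeze(-1) * new_content
print(memory.shape)                    # torch.Size([5, 8])
```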
no code implementations • 23 Jun 2019 • Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville
The ability to understand logical relationships between sentences is an important task in language understanding.
7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.
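A minimal sketch of the cumax ("cumulative softmax") gate behind this behaviour, with illustrative sizes: because the gate is monotone, erasing a neuron associated with a large constituent necessarily erases every lower-ranked neuron, i.e. the smaller constituents nested inside it.

```python
# Sketch of the cumax gate: cumsum of a softmax rises monotonically from ~0 to ~1,
# so the "erased" (near-zero) region is always a contiguous block of lower-ranked
# neurons.
import torch
import torch.nn.functional as F

def cumax(logits: torch.Tensor) -> torch.Tensor:
    """Monotonically non-decreasing gate in [0, 1]."""
    return torch.cumsum(F.softmax(logits, dim=-1), dim=-1)

hidden = torch.randn(16)                 # neurons ordered: nested/small -> large spans
master_forget = cumax(torch.randn(16))   # low at the start, high at the end
hidden = master_forget * hidden          # forgetting neuron i forgets all j < i too
print(master_forget[:4], master_forget[-4:])
```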
1 code implementation • EMNLP 2018 • Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels.
Ranked #10 on Extractive Text Summarization on CNN / Daily Mail
2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
In this work, we propose a novel constituency parsing scheme.
no code implementations • 7 Mar 2018 • Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).
1 code implementation • ICLR 2018 • Yikang Shen, Zhouhan Lin, Chin-wei Huang, Aaron Courville
In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model.
Ranked #13 on Constituency Grammar Induction on Penn Treebank (Max F1 (WSJ) metric)
no code implementations • 26 Jul 2017 • Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.
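A minimal two-level (class-based) hierarchical softmax sketch for intuition; the cluster assignment here is a fixed even split, whereas the paper's contribution is the self-organizing hierarchy.

```python
# Two-level softmax sketch: P(word | h) = P(cluster | h) * P(word | cluster, h),
# with a fixed even split of the vocabulary into clusters (hypothetical sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSoftmax(nn.Module):
    def __init__(self, d: int = 64, vocab: int = 10000, n_clusters: int = 100):
        super().__init__()
        assert vocab % n_clusters == 0
        self.per_cluster = vocab // n_clusters
        self.cluster_head = nn.Linear(d, n_clusters)
        self.word_emb = nn.Parameter(torch.randn(vocab, d) * 0.02)  # output embeddings

    def log_prob(self, h: torch.Tensor, word: int) -> torch.Tensor:
        c, i = divmod(word, self.per_cluster)
        log_p_cluster = F.log_softmax(self.cluster_head(h), dim=-1)[c]
        rows = self.word_emb[c * self.per_cluster:(c + 1) * self.per_cluster]
        log_p_word = F.log_softmax(rows @ h, dim=-1)[i]  # softmax only inside cluster c
        return log_p_cluster + log_p_word

print(TwoLevelSoftmax().log_prob(torch.randn(64), word=4237))
```

Only the selected cluster's output rows enter the inner softmax, which is where the speedup over a flat softmax on a large vocabulary comes from.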
no code implementations • 15 Nov 2015 • Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong
With the development of community-based question answering (Q&A) services, large-scale Q&A archives have been accumulated and have become an important information and knowledge resource on the web.