1 code implementation • 9 Oct 2024 • Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
In the realm of graph learning, there is a category of methods that conceptualize graphs as hierarchical structures, utilizing node clustering to capture broader structural information.
1 code implementation • 8 Oct 2024 • Chang Su, Jiexing Qi, He Yan, Kai Zou, Zhouhan Lin
Semantic parsing that translates natural language queries to SPARQL is of great importance for Knowledge Graph Question Answering (KGQA) systems.
no code implementations • 8 Oct 2024 • Yihuai Xu, Yongwei Wang, Yifei Bi, Huangsen Cao, Zhouhan Lin, Yu Zhao, Fei Wu
Despite this, existing training-free detection methods typically rely on global text sequence statistics, neglecting the modeling of local discriminative features, thereby limiting their detection efficacy.
1 code implementation • 7 Oct 2024 • Jushi Kai, Shengyuan Hou, Yusheng Huang, Zhouhan Lin
Grammar induction has made significant progress in recent years.
no code implementations • 7 Oct 2024 • Siyuan Huang, Zhiyuan Ma, Jintao Du, Changhua Meng, Weiqiang Wang, Zhouhan Lin
Self-Consistency, a widely-used decoding strategy, significantly boosts the reasoning capabilities of Large Language Models (LLMs).
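At its core, Self-Consistency samples several reasoning paths and keeps the most frequent final answer. A minimal sketch in Python, assuming a hypothetical `sample_fn` that wraps any stochastic LLM call:

```python
from collections import Counter

def self_consistency(sample_fn, prompt: str, n_samples: int = 8) -> str:
    """Sample several reasoning paths and return the majority-vote answer.

    `sample_fn` is a hypothetical stand-in for any stochastic LLM call
    (e.g. temperature sampling) that returns a final answer string.
    """
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    # Marginalize over reasoning paths: the most frequent answer wins.
    return Counter(answers).most_common(1)[0][0]
```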
no code implementations • 11 Aug 2024 • Dingyi Rong, Wenzhuo Zheng, Bozitao Zhong, Zhouhan Lin, Liang Hong, Ning Liu
Accurate prediction of enzyme function is crucial for elucidating biological mechanisms and driving innovation across various sectors.
1 code implementation • 22 Feb 2024 • Yunchong Song, Siyuan Huang, Xinbing Wang, Chenghu Zhou, Zhouhan Lin
GPN benefits from the discrete assignments generated by the graph parsing algorithm, allowing good memory efficiency while preserving node information intact.
no code implementations • 19 Jan 2024 • Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin
We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in the training dynamics of language models.
2 code implementations • 11 Jan 2024 • Jushi Kai, Tianhang Zhang, Hai Hu, Zhouhan Lin
Therefore, we propose to "highlight" the factual information by selecting the tokens with the lowest probabilities and concatenating them to the original context, thus forcing the model to repeatedly read and hesitate on these tokens before generation.
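A minimal sketch of the highlighting step, assuming token probabilities have already been obtained from a language model (the helper below is illustrative, not the paper's implementation):

```python
def highlight_lowest_prob_tokens(tokens, probs, k=3):
    """Pick the k context tokens the model was least confident about
    and append them to the context so the model re-reads them.

    `tokens` and `probs` are assumed to be parallel lists: each prob is
    the model's probability of that token given its prefix (how these
    are obtained from a concrete LM is omitted here).
    """
    ranked = sorted(zip(probs, tokens))          # lowest probability first
    highlighted = [tok for _, tok in ranked[:k]]
    # Concatenate the highlighted tokens back onto the original context.
    return tokens + highlighted

context = ["The", "Eiffel", "Tower", "is", "in", "Paris"]
token_probs = [0.9, 0.2, 0.6, 0.95, 0.9, 0.4]
print(highlight_lowest_prob_tokens(context, token_probs))
```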
1 code implementation • 31 Dec 2023 • Zhouhan Lin, Cheng Deng, Le Zhou, Tianhang Zhang, Yi Xu, Yutong Xu, Zhongmou He, Yuanyuan Shi, Beiya Dai, Yunchong Song, Boyi Zeng, Qiyuan Chen, Yuxun Miao, Bo Xue, Shu Wang, Luoyi Fu, Weinan Zhang, Junxian He, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou
To the best of our knowledge, it is the largest language model for the geoscience domain.
no code implementations • 8 Dec 2023 • Zhixin Guo, Jianping Zhou, Jiexing Qi, Mingxuan Yan, Ziwei He, Guanjie Zheng, Zhouhan Lin, Xinbing Wang, Chenghu Zhou
The sheer volume of scientific experimental results and complex technical statements, often presented in tabular form, poses a formidable barrier to individuals seeking the information they need.
1 code implementation • 8 Dec 2023 • Boyi Zeng, Lizheng Wang, Yuncong Hu, Yi Xu, Chenghu Zhou, Xinbing Wang, Yu Yu, Zhouhan Lin
In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without interfering with training or exposing model parameters to the public.
1 code implementation • 10 Oct 2023 • Yusheng Huang, Zhouhan Lin
Multimodal information extraction, which requires aggregating representations from different modalities, is attracting growing research attention.
1 code implementation • NeurIPS 2023 • Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information.
1 code implementation • 6 Oct 2023 • Yusheng Huang, Zhouhan Lin
Document-level relation extraction (DocRE) is a task that focuses on identifying relations between entities within a document.
1 code implementation • 8 Jun 2023 • Cheng Deng, Tianhang Zhang, Zhongmou He, Yi Xu, Qiyuan Chen, Yuanyuan Shi, Luoyi Fu, Weinan Zhang, Xinbing Wang, Chenghu Zhou, Zhouhan Lin, Junxian He
Large language models (LLMs) have achieved great success in general domains of natural language processing.
1 code implementation • 24 May 2023 • Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin
Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models.
Ranked #1 on Open-Domain Question Answering on ELI5
1 code implementation • 23 May 2023 • Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, Bowen Zhou
While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment.
1 code implementation • 10 Apr 2023 • Yusheng Huang, Jiexing Qi, Xinbing Wang, Zhouhan Lin
We further employ the asymmetric focusing mechanism to decouple the gradient contribution from the negative and positive samples.
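A minimal PyTorch sketch of an asymmetric focal-style loss with separate focusing parameters for positives and negatives; the paper's exact formulation may differ:

```python
import torch

def asymmetric_focal_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0):
    """Binary-classification loss with separate focusing parameters for
    positive and negative samples, so their gradient contributions are
    decoupled (a sketch in the spirit of asymmetric focal losses; the
    exact formulation in the paper may differ).

    `logits` and `targets` are float tensors of the same shape, with
    targets in {0, 1}.
    """
    p = torch.sigmoid(logits)
    pos_loss = targets * (1 - p).pow(gamma_pos) * torch.log(p.clamp(min=1e-8))
    neg_loss = (1 - targets) * p.pow(gamma_neg) * torch.log((1 - p).clamp(min=1e-8))
    return -(pos_loss + neg_loss).mean()
```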
1 code implementation • 19 Feb 2023 • Jiexing Qi, Shuhao Li, Zhixin Guo, Yusheng Huang, Chenghu Zhou, Weinan Zhang, Xinbing Wang, Zhouhan Lin
In this work, we first collect LoT-insts, a large-scale institution name normalization dataset, which contains over 25k classes that exhibit a naturally long-tailed distribution.
Ranked #1 on Long-tail Learning on LoT-insts
no code implementations • 9 Feb 2023 • Zhixin Guo, Mingxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang
The design of our framework consists of two aspects: a prompt planner and a knowledge adapter.
1 code implementation • 3 Feb 2023 • Yunchong Song, Chenghu Zhou, Xinbing Wang, Zhouhan Lin
This is achieved by aligning the hierarchy of the rooted-tree of a central node with the ordered neurons in its node representation.
Ranked #5 on Node Classification on Cornell
1 code implementation • 21 Oct 2022 • Shengyuan Hou, Jushi Kai, Haotian Xue, Bingyu Zhu, Bo Yuan, Longtao Huang, Xinbing Wang, Zhouhan Lin
Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this ability is highly dependent on the quality and scale of the training data.
1 code implementation • 21 Oct 2022 • Ning Shi, Bin Tang, Bo Yuan, Longtao Huang, Yewen Pu, Jie Fu, Zhouhan Lin
Text editing, such as grammatical error correction, arises naturally from imperfect textual data.
no code implementations • 22 Sep 2022 • Yi Xu, Luoyi Fu, Zhouhan Lin, Jiexing Qi, Xinbing Wang
As a fully unsupervised framework, INFINITY is empirically verified to outperform state-of-the-art baselines for G2T and T2G tasks.
1 code implementation • ACL 2022 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.
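A simplified sketch of such a per-layer skim predictor, assuming a Gumbel-softmax gate over {skip, keep} decisions (module name and hyperparameters are illustrative; Transkimmer's actual predictor and training objective are more involved):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkimGate(nn.Module):
    """Per-layer predictor deciding which token hidden states a layer
    still needs (a simplified sketch)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 2)  # logits for {skip, keep}

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        logits = self.scorer(hidden_states)
        # Differentiable binary decision via straight-through Gumbel-softmax;
        # column 1 is the "keep" decision.
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 1]
        return mask  # (batch, seq_len); 1.0 = process token, 0.0 = skip it

# Usage: kept tokens pass through the layer, skipped ones bypass it:
# hidden = mask.unsqueeze(-1) * layer(hidden) + (1 - mask).unsqueeze(-1) * hidden
```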
1 code implementation • 14 May 2022 • Jiexing Qi, Jingyao Tang, Ziwei He, Xiangpeng Wan, Yu Cheng, Chenghu Zhou, Xinbing Wang, Quanshi Zhang, Zhouhan Lin
Our model can incorporate almost all types of existing relations in the literature, and in addition, we propose introducing co-reference relations for the multi-turn scenario.
Ranked #1 on Dialogue State Tracking on CoSQL
1 code implementation • ACL 2022 • Xichen Pan, Peiyu Chen, Yichen Gong, Helong Zhou, Xinbing Wang, Zhouhan Lin
In particular, the audio and visual front-ends are trained on large-scale unimodal datasets; we then integrate components of both front-ends into a larger multimodal framework that learns to transcribe parallel audio-visual data into characters through a combination of CTC and seq2seq decoding.
Ranked #3 on Automatic Speech Recognition (ASR) on LRS2
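A minimal sketch of the combined CTC and seq2seq objective described above, assuming suitably shaped tensors and an illustrative weighting (not the paper's exact training recipe):

```python
import torch
import torch.nn.functional as F

def hybrid_ctc_seq2seq_loss(ctc_log_probs, input_lengths, dec_logits,
                            targets, target_lengths, blank=0, alpha=0.3):
    """Weighted combination of a CTC loss on the encoder output and a
    cross-entropy loss on the attention decoder output, as in hybrid
    CTC/attention recognizers (alpha is illustrative).

    ctc_log_probs: (T, N, C) log-softmaxed encoder outputs.
    dec_logits:    (N, S, C) per-step decoder logits aligned with
                   `targets` (N, S); real systems shift targets with
                   sos/eos tokens, which is omitted here for brevity.
    """
    ctc = F.ctc_loss(ctc_log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    ce = F.cross_entropy(dec_logits.reshape(-1, dec_logits.size(-1)),
                         targets.reshape(-1))
    return alpha * ctc + (1 - alpha) * ce
```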
1 code implementation • 16 Dec 2021 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu
We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.
1 code implementation • 12 Jun 2021 • Ning Shi, Wei Wang, Boxin Wang, Jinfeng Li, Xiangyu Liu, Zhouhan Lin
Punctuation restoration is an important post-processing step in automatic speech recognition.
no code implementations • SIGDIAL (ACL) 2021 • Kun Qian, Ahmad Beirami, Zhouhan Lin, Ankita De, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar
In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs, leading to confusion in DST modeling.
1 code implementation • 5 Dec 2020 • Minkai Xu, Mingxuan Wang, Zhouhan Lin, Hao Zhou, Weinan Zhang, Lei Li
Despite the recent success on image classification, self-training has only achieved limited gains on structured prediction tasks such as neural machine translation (NMT).
1 code implementation • ACL 2020 • Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang
It is commonly believed that knowledge of syntactic structure should improve language modeling.
2 code implementations • 14 Mar 2020 • Ning Shi, Boxin Wang, Wei Wang, Xiangyu Liu, Zhouhan Lin
Humans can systematically generalize to novel compositions of existing concepts.
1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville
Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.
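A toy sketch of how a cumulative probability over memory slots can act as monotonic write/erase gates; the actual mechanism in the paper is embedded in a full stack-like memory update:

```python
import torch

def write_erase_gates(scores: torch.Tensor):
    """Turn attention scores over memory slots into monotonic erase/keep
    gates via cumulative probabilities (a simplified sketch of the idea).
    """
    p = torch.softmax(scores, dim=-1)    # attention over memory slots
    erase = torch.cumsum(p, dim=-1)      # slots up to the attended one get erased
    keep = 1.0 - erase                   # higher slots are preserved
    return erase, keep

scores = torch.tensor([0.1, 2.0, 0.3, 0.05])
erase, keep = write_erase_gates(scores)
print(erase, keep)
```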
1 code implementation • IJCNLP 2019 • Xingdi Yuan, Marc-Alexandre Cote, Jie Fu, Zhouhan Lin, Christopher Pal, Yoshua Bengio, Adam Trischler
In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions.
no code implementations • WS 2018 • Athul Paul Jacob, Zhouhan Lin, Alessandro Sordoni, Yoshua Bengio
We propose a hierarchical model for sequential data that learns a tree on the fly, i.e., while reading the sequence.
no code implementations • ICML 2018 • Nan Rosemary Ke, Konrad Zolna, Alessandro Sordoni, Zhouhan Lin, Adam Trischler, Yoshua Bengio, Joelle Pineau, Laurent Charlin, Chris Pal
We evaluate this method on several types of tasks with different attributes.
Ranked #3 on Open-Domain Question Answering on SearchQA (Unigram Acc metric)
2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
In this work, we propose a novel constituency parsing scheme.
no code implementations • 20 Jan 2018 • Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition.
1 code implementation • ICLR 2018 • Yikang Shen, Zhouhan Lin, Chin-wei Huang, Aaron Courville
In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model.
Ranked #16 on Constituency Grammar Induction on PTB (Max F1 (WSJ) metric)
no code implementations • 7 Sep 2017 • Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio
By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble.
52 code implementations • 9 Mar 2017 • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.
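The core computation is A = softmax(W_s2 tanh(W_s1 H^T)) followed by M = AH, yielding r embedding rows per sentence. A compact PyTorch rendering (hyperparameter defaults follow the paper; the module wrapper itself is illustrative):

```python
import torch
import torch.nn as nn

class SelfAttentiveEmbedding(nn.Module):
    """Structured self-attention: A = softmax(W_s2 tanh(W_s1 H^T)), M = AH."""
    def __init__(self, hidden_dim: int, d_a: int = 350, r: int = 30):
        super().__init__()
        self.W_s1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.W_s2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H: torch.Tensor):
        # H: (batch, seq_len, hidden_dim) hidden states from an RNN encoder.
        A = torch.softmax(self.W_s2(torch.tanh(self.W_s1(H))), dim=1)  # (batch, seq_len, r)
        M = A.transpose(1, 2) @ H   # (batch, r, hidden_dim) sentence embedding
        return M, A
```

The paper additionally regularizes A with a penalization term ||AA^T − I||_F^2 so that the r attention rows focus on different parts of the sentence.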
1 code implementation • 21 Nov 2016 • Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
Recurrent Neural Networks (RNNs) produce state-of-the-art performance on many machine learning tasks, but their demands on memory and computational resources are often high.
1 code implementation • 24 Aug 2016 • Joachim Ott, Zhouhan Lin, Ying Zhang, Shih-Chii Liu, Yoshua Bengio
We present results from the use of different stochastic and deterministic reduced precision training methods applied to three major RNN types which are then tested on several datasets.
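One of the classic stochastic reduced-precision tricks is stochastic rounding, which rounds up with probability equal to the fractional distance so the quantization is unbiased in expectation; a generic sketch (not the paper's exact quantizer):

```python
import torch

def stochastic_round(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Round to a fixed-point grid with stochastic rounding.

    A generic sketch: values are scaled to a 2^(bits-1) grid and rounded
    up with probability equal to the fractional part, so rounding is
    unbiased in expectation (range clipping omitted for brevity).
    """
    scale = 2.0 ** (bits - 1)
    scaled = x * scale
    floor = torch.floor(scaled)
    rounded = floor + (torch.rand_like(scaled) < (scaled - floor)).float()
    return rounded / scale
```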
1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang
Since its introduction, Theano has been one of the most widely used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements.
no code implementations • NeurIPS 2016 • Saizheng Zhang, Yuhuai Wu, Tong Che, Zhouhan Lin, Roland Memisevic, Ruslan Salakhutdinov, Yoshua Bengio
In this paper, we systematically analyze the connecting architectures of recurrent neural networks (RNNs).
Ranked #23 on Language Modelling on Text8
1 code implementation • 9 Nov 2015 • Zhouhan Lin, Yushi Chen, Xing Zhao, Gang Wang
Hyperspectral image (HSI) classification is a hot topic in the remote sensing community.
5 code implementations • 9 Nov 2015 • Zhouhan Lin, Roland Memisevic, Kishore Konda
We propose ways to improve the performance of fully connected networks.
2 code implementations • 11 Oct 2015 • Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
For most deep learning algorithms, training is notoriously time-consuming.
no code implementations • 14 Feb 2015 • Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin
Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology.