Sparse Attention with Learning to Hash

no code implementations ICLR 2022 Zhiqing Sun, Yiming Yang, Shinjae Yoo

To overcome these issues, this paper proposes a new strategy for sparse attention, namely LHA (Learning-to-Hash Attention), which directly learns separate parameterized hash functions for queries and keys.

Image Classification Language Modelling

Rethinking Transformer-based Set Prediction for Object Detection

1 code implementation ICCV 2021 Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani

DETR is a recently proposed Transformer-based method which views object detection as a set prediction problem and achieves state-of-the-art performance but demands extra-long training time to converge.

Object Detection

An EM Approach to Non-autoregressive Conditional Sequence Generation

1 code implementation ICML 2020 Zhiqing Sun, Yiming Yang

Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but they suffer from high inference latency.

Machine Translation Translation

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

no code implementations ICLR Workshop DeepDiffEq 2019 Yiping Lu*, Zhuohan Li*, Di He, Zhiqing Sun, Bin Dong, Tao Qin, LiWei Wang, Tie-Yan Liu

In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating the movement of multiple particles in space using the Lie-Trotter splitting scheme and Euler's method.

Natural Language Processing

Fast Structured Decoding for Sequence Models

1 code implementation NeurIPS 2019 Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng

However, these models assume that the decoding process of each token is conditionally independent of others.

Machine Translation Translation

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

2 code implementations ICLR 2020 Yiping Lu, Zhuohan Li, Di He, Zhiqing Sun, Bin Dong, Tao Qin, Li-Wei Wang, Tie-Yan Liu

In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system.

Natural Language Processing

Neural Consciousness Flow

1 code implementation 30 May 2019 Xiaoran Xu, Wei Feng, Zhiqing Sun, Zhi-Hong Deng

Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective, and formulate it in the form of attentive message passing on graphs, called neural consciousness flow (NeuCFlow).

Decision Making Knowledge Base Completion

DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

no code implementations 19 May 2019 Zhiqing Sun, Jian Tang, Pan Du, Zhi-Hong Deng, Jian-Yun Nie

Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process.

Document Summarization Information Retrieval

Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling

1 code implementation EMNLP 2018 Zhiqing Sun, Zhi-Hong Deng

As far as we know, we are the first to propose a neural model for unsupervised CWS and to achieve performance competitive with the state-of-the-art statistical models on four different datasets from the SIGHAN 2005 bakeoff.

Chinese Word Segmentation Language Modelling

A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks

no code implementations 27 Dec 2017 Zhiqing Sun, Gehui Shen, Zhi-Hong Deng

However, if we consider segmenting a given sentence, the most intuitive idea is to predict, for each gap between two consecutive characters, whether to segment there, which makes previous approaches seem too complex by comparison.

Chinese Word Segmentation

