Search Results for author: Tian Jin

Found 9 papers, 2 papers with code

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation15 Nov 2023 William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.
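
The speedup comes from how the sequence is split across devices: under a causal mask, contiguous blocks (as in Ring Attention) give later devices far more query-key pairs to compute than earlier ones, while round-robin "stripes" balance the load. The toy sketch below (not the authors' code; a hypothetical workload model that counts causal pairs per device) illustrates the imbalance:

```python
# Toy workload model: under causal masking, query token q attends to
# keys 0..q, so its cost is proportional to q + 1.
def causal_workload(token_ids):
    return sum(q + 1 for q in token_ids)

def partition_workloads(seq_len, n_devices, striped):
    """Per-device causal-attention cost under contiguous vs striped splits."""
    loads = []
    for d in range(n_devices):
        if striped:
            tokens = range(d, seq_len, n_devices)       # round-robin stripes
        else:
            block = seq_len // n_devices                # contiguous blocks
            tokens = range(d * block, (d + 1) * block)
        loads.append(causal_workload(tokens))
    return loads

print(partition_workloads(1024, 4, striped=False))  # heavily imbalanced
print(partition_workloads(1024, 4, striped=True))   # near-equal loads
```

With contiguous blocks the last device does roughly 7x the work of the first; with striping every device's load is within a fraction of a percent of the others, so no device idles waiting for the slowest one.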

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

no code implementations7 Oct 2023 Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference.
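
For concreteness, one common form of the first technique is global magnitude pruning: rank all weights by absolute value and zero out the smallest fraction. A minimal stand-alone sketch (not the paper's code; plain lists instead of real model tensors):

```python
# Hypothetical illustration of global magnitude pruning: zero out the
# `sparsity` fraction of weights with the smallest absolute value.
def magnitude_prune(weights, sparsity):
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
print(magnitude_prune(w, 0.5))  # → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Dense scaling, by contrast, changes the architecture itself (fewer layers or narrower ones) rather than sparsifying a trained model, which is why the two routes to a smaller model can affect fact recall and in-context processing differently.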

In-Context Learning

The Effect of Data Dimensionality on Neural Network Prunability

no code implementations1 Dec 2022 Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy.
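
That definition of prunability can be operationalized as a sweep over sparsity levels, keeping the largest one whose accuracy stays within tolerance of the dense baseline. A hypothetical sketch (the `evaluate` callback stands in for real train-prune-evaluate runs, which the paper performs; it is not part of the paper's code):

```python
# Prunability as defined above: the maximum sparsity at which test accuracy
# remains within `tol` of the dense baseline. `evaluate(s)` is a stand-in
# that should return test accuracy after pruning to sparsity s.
def prunability(evaluate, baseline_acc, tol=0.01, n_steps=20):
    best = 0.0
    for k in range(1, n_steps):
        s = k / n_steps
        if evaluate(s) >= baseline_acc - tol:
            best = s
    return best

# Toy stand-in: accuracy holds at 0.90 up to 80% sparsity, then degrades.
toy = lambda s: 0.90 if s <= 0.80 else 0.90 - 2 * (s - 0.80)
print(prunability(toy, baseline_acc=0.90))  # → 0.8
```

The paper's question is then what properties of the data, such as its dimensionality, move this number up or down.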

Pruning's Effect on Generalization Through the Lens of Training and Regularization

no code implementations25 Oct 2022 Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite

Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it.

Compiling ONNX Neural Network Models Using MLIR

1 code implementation19 Aug 2020 Tian Jin, Gheorghe-Teodor Bercea, Tung D. Le, Tong Chen, Gong Su, Haruki Imai, Yasushi Negishi, Anh Leu, Kevin O'Brien, Kiyokuni Kawachiya, Alexandre E. Eichenberger

Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing.

Speech Recognition

Language to Network: Conditional Parameter Adaptation with Natural Language Descriptions

no code implementations ACL 2020 Tian Jin, Zhun Liu, Shengjia Yan, Alexandre Eichenberger, Louis-Philippe Morency

In this paper, we propose N3 (Neural Networks from Natural Language) - a new paradigm of synthesizing task-specific neural networks from language descriptions and a generic pre-trained model.

General Classification Image Classification +3

Novel Co-variant Feature Point Matching Based on Gaussian Mixture Model

no code implementations26 Oct 2019 Liang Shen, Jiahua zhu, Chongyi Fan, Xiaotao Huang, Tian Jin

In this paper, we develop a novel method considering all the feature center position coordinates, the local feature shape and orientation information based on Gaussian Mixture Model for co-variant feature matching.
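
The core idea, treating each reference feature as a Gaussian component and scoring candidate matches by likelihood, can be sketched in a toy form. This is a deliberately simplified, hypothetical illustration (isotropic Gaussians on positions only; the paper additionally models local feature shape and orientation):

```python
import math

# Toy sketch: each reference feature is a Gaussian component (center, sigma);
# each query point is matched to the component that assigns it the highest
# likelihood. Anisotropic shape/orientation terms are omitted for brevity.
def gaussian_pdf(x, y, mu, sigma):
    dx, dy = x - mu[0], y - mu[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (
        2 * math.pi * sigma * sigma)

def match_points(query_pts, components):
    """components: list of ((cx, cy), sigma); returns best component index
    for each query point."""
    matches = []
    for (x, y) in query_pts:
        scores = [gaussian_pdf(x, y, mu, s) for (mu, s) in components]
        matches.append(scores.index(max(scores)))
    return matches

ref = [((0.0, 0.0), 1.0), ((5.0, 5.0), 0.5)]         # two reference features
print(match_points([(0.2, -0.1), (4.8, 5.1)], ref))  # → [0, 1]
```

Replacing the isotropic sigma with a full covariance derived from the local feature shape is what makes the matching co-variant with the feature geometry.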

Segmented convolutional gated recurrent neural networks for human activity recognition in ultra-wideband radar

no code implementations Neurocomputing 2019 Hao Du, Tian Jin, Yuan He, Yongping Song, Yongpeng Dai

In this work, we propose a neural network architecture, namely segmented convolutional gated recurrent neural network (SCGRNN), to recognize human activities based on micro-Doppler spectrograms measured by the ultra-wideband radar.

Human Activity Recognition RF-based Action Recognition +1

Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

no code implementations9 Feb 2019 Chu Qin, Ying Tan, Shang Ying Chen, Xian Zeng, Xingxing Qi, Tian Jin, Huan Shi, Yiwei Wan, Yu Chen, Jingfeng Li, Weidong He, Yali Wang, Peng Zhang, Feng Zhu, Hongping Zhao, Yuyang Jiang, Yuzong Chen

We explored the superior learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space.

Clustering Drug Discovery
