Search Results for author: Hanxiao Liu

Found 35 papers, 21 papers with code

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations NeurIPS 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.

Language Modelling
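
DoReMi's central object is a set of domain mixture weights used to sample pretraining data. A minimal, hedged sketch of how such weights translate into batch sampling; the domain names, weights, and documents below are illustrative placeholders, not the paper's learned proportions:

```python
import numpy as np

# Hypothetical domain weights; DoReMi learns these with a small proxy model,
# whereas the values here are purely illustrative.
domain_weights = {"wikipedia": 0.25, "books": 0.15, "web_text": 0.60}
datasets = {d: [f"{d}_doc_{i}" for i in range(1000)] for d in domain_weights}

rng = np.random.default_rng(0)

def sample_batch(batch_size=8):
    """Draw a pretraining batch by first sampling a domain according to the
    mixture proportions, then sampling a document uniformly within it."""
    names = list(domain_weights)
    probs = np.array([domain_weights[d] for d in names])
    probs = probs / probs.sum()
    picked = rng.choice(names, size=batch_size, p=probs)
    return [datasets[d][rng.integers(len(datasets[d]))] for d in picked]

print(sample_batch())
```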

IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events

1 code implementation 17 May 2023 Chenyang Shi, Hanxiao Liu, Jing Jin, Wenzhuo Li, Yuzhen Li, Boyi Wei, Yibo Zhang

The proposed method first estimates optical flow from frames and events, and then uses a Gumbel gating module to decide, based on the optical flow amplitude, whether to further compute residual optical flow in each sub-region.

Event-based Optical Flow, Optical Flow Estimation, +1
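
The sentence above describes a two-stage estimate: a coarse flow from frames and events, then an optional residual refinement gated per sub-region. A minimal PyTorch sketch of that gating step, assuming a hypothetical residual network and a simple amplitude-based gate (shapes, region size, and the gating head are illustrative, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def gate_residual_flow(flow, residual_net, tau=1.0):
    """flow: (B, 2, H, W) coarse optical flow estimated from frames and events.
    A Gumbel-softmax gate decides per 8x8 sub-region, based on flow amplitude,
    whether to add a residual flow refinement."""
    b, _, h, w = flow.shape
    amplitude = flow.norm(dim=1, keepdim=True)            # (B, 1, H, W)
    region_amp = F.avg_pool2d(amplitude, kernel_size=8)   # (B, 1, H/8, W/8)
    # Two logits per region: [skip, refine]; here simply a function of amplitude.
    logits = torch.cat([-region_amp, region_amp], dim=1)
    gate = F.gumbel_softmax(logits, tau=tau, hard=True, dim=1)[:, 1:2]
    gate = F.interpolate(gate, size=(h, w), mode="nearest")
    return flow + gate * residual_net(flow)  # residual flow only where the gate fires

# Toy usage with a stand-in residual network.
residual_net = torch.nn.Conv2d(2, 2, kernel_size=3, padding=1)
print(gate_residual_flow(torch.randn(1, 2, 64, 64), residual_net).shape)
```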

Larger language models do in-context learning differently

no code implementations 7 Mar 2023 Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning
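
The SUL-ICL setup is straightforward to reproduce at the prompt level: natural label names are replaced with semantically unrelated tokens, so the model must infer the mapping from the exemplars alone. A minimal sketch with made-up exemplars (not the paper's data):

```python
# Map natural sentiment labels to semantically unrelated tokens (SUL-ICL),
# so the label strings themselves carry no prior information.
label_map = {"negative": "foo", "positive": "bar"}

exemplars = [
    ("The movie was a complete waste of time.", "negative"),
    ("An absolute delight from start to finish.", "positive"),
    ("I would not recommend this to anyone.", "negative"),
]

def build_sul_icl_prompt(exemplars, query):
    lines = [f"Input: {text}\nLabel: {label_map[label]}" for text, label in exemplars]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

print(build_sul_icl_prompt(exemplars, "One of the best films of the year."))
```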

TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets

1 code implementation 15 Apr 2022 Chengrun Yang, Gabriel Bender, Hanxiao Liu, Pieter-Jan Kindermans, Madeleine Udell, Yifeng Lu, Quoc Le, Da Huang

The best neural architecture for a given machine learning problem depends on many factors: not only on the complexity and structure of the dataset, but also on resource constraints such as latency, compute, and energy consumption.

Image Retrieval, Neural Architecture Search, +1
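
The title names the mechanism: candidate architectures are sampled and rejected when they violate a resource constraint, so the search never spends trials on infeasible models. A minimal sketch of rejection sampling under a parameter budget, with a made-up MLP search space (the space, budget, and constraint are illustrative, not the paper's):

```python
import random

random.seed(0)
widths = [32, 64, 128, 256]   # candidate hidden sizes per layer (illustrative)
n_layers, n_features = 3, 100
param_budget = 60_000         # hypothetical resource constraint

def n_params(hidden):
    """Parameter count of a plain MLP with the given hidden sizes."""
    sizes = [n_features] + list(hidden) + [1]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

def sample_feasible_architecture(max_tries=1000):
    """Rejection sampling: resample until the candidate fits the budget."""
    for _ in range(max_tries):
        hidden = tuple(random.choice(widths) for _ in range(n_layers))
        if n_params(hidden) <= param_budget:
            return hidden
    raise RuntimeError("no feasible architecture found")

print(sample_feasible_architecture())
```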

Transformer Quality in Linear Time

1 code implementation 21 Feb 2022 Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le

We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences.

Language Modelling, Masked Language Modeling

Mixture-of-Experts with Expert Choice Routing

no code implementations 18 Feb 2022 Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

Prior work allocates a fixed number of experts to each token using a top-k function regardless of the relative importance of different tokens.
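
The contrast is between token-choice routing, where each token picks its top-k experts and therefore receives a fixed amount of compute, and the paper's expert-choice routing, where each expert picks its top tokens. A minimal sketch of the two routing decisions side by side, assuming a plain linear router (shapes and the capacity formula are illustrative):

```python
import torch

torch.manual_seed(0)
n_tokens, d_model, n_experts = 16, 8, 4
x = torch.randn(n_tokens, d_model)
router = torch.nn.Linear(d_model, n_experts, bias=False)
scores = router(x).softmax(dim=-1)   # (tokens, experts) affinity matrix

# Token choice (prior work): every token selects its top-k experts,
# regardless of the token's relative importance.
topk_experts = scores.topk(k=2, dim=-1).indices            # (tokens, 2)

# Expert choice (this paper): every expert selects its top-c tokens, so an
# important token may be picked by several experts and a trivial one by none.
capacity = n_tokens * 2 // n_experts                        # tokens per expert
topc_tokens = scores.t().topk(k=capacity, dim=-1).indices  # (experts, capacity)

print(topk_experts.shape, topc_tokens.shape)
```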

Searching for Efficient Transformers for Language Modeling

no code implementations NeurIPS 2021 David So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc Le

For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.

Language Modelling

Combined Scaling for Zero-shot Transfer Learning

no code implementations 19 Nov 2021 Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

Second, while increasing the dataset and model sizes has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.

Classification, Contrastive Learning, +3
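
The batch-size question arises because, in an image-text contrastive loss, every other caption in the batch serves as a negative for a given image, so the effective number of negatives grows with batch size. A minimal CLIP-style sketch of that loss, not the BASIC implementation (embedding sizes and temperature are illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over a batch: each image's positive
    is its own caption, and all other captions in the batch act as negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(contrastive_loss(torch.randn(32, 64), torch.randn(32, 64)).item())
```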

Primer: Searching for Efficient Transformers for Language Modeling

4 code implementations 17 Sep 2021 David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.

Language Modelling
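
The excerpt reports the headline speedup rather than the architectural change itself. One modification Primer is commonly credited with is replacing the feed-forward ReLU with a squared ReLU; a minimal sketch of that block, treated here as an assumption since the excerpt does not spell it out (dimensions are illustrative):

```python
import torch

class SquaredReLUFFN(torch.nn.Module):
    """Transformer feed-forward block with ReLU replaced by squared ReLU,
    one of the modifications attributed to Primer (sizes are illustrative)."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.up = torch.nn.Linear(d_model, d_ff)
        self.down = torch.nn.Linear(d_ff, d_model)

    def forward(self, x):
        h = torch.relu(self.up(x))
        return self.down(h * h)   # squared ReLU: relu(x) ** 2

print(SquaredReLUFFN()(torch.randn(2, 10, 512)).shape)
```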

CoAtNet: Marrying Convolution and Attention for All Data Sizes

14 code implementations NeurIPS 2021 Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Image Classification, Inductive Bias

Pay Attention to MLPs

20 code implementations NeurIPS 2021 Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years.

Image Classification, Natural Language Inference, +2

Transferable Graph Optimizers for ML Compilers

no code implementations NeurIPS 2020 Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon

Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code.

Discovering Multi-Hardware Mobile Models via Architecture Search

no code implementations 18 Aug 2020 Grace Chu, Okan Arikan, Gabriel Bender, Weijun Wang, Achille Brighton, Pieter-Jan Kindermans, Hanxiao Liu, Berkin Akin, Suyog Gupta, Andrew Howard

Hardware-aware neural architecture designs have predominantly focused on optimizing model performance on a single hardware platform and on model development complexity, while another important factor, model deployment complexity, has been largely ignored.

Neural Architecture Search

Can weight sharing outperform random architecture search? An investigation with TuNAS

1 code implementation CVPR 2020 Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, Quoc Le

Efficient Neural Architecture Search methods based on weight sharing have shown good promise in democratizing Neural Architecture Search for computer vision models.

Image Classification, Neural Architecture Search

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Data Augmentation, Object, +4

MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

4 code implementations CVPR 2021 Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen

By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.

Neural Architecture Search, Object, +2

Evolving Normalization-Activation Layers

8 code implementations NeurIPS 2020 Hanxiao Liu, Andrew Brock, Karen Simonyan, Quoc V. Le

Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other.

Image Classification, Image Generation, +2
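
The excerpt motivates searching over normalization and activation as a single unit rather than composing them by hand. As an example of the kind of fused layer this search is known to produce, here is a sketch in the style of EvoNorm-S0, x * sigmoid(v * x) / group_std(x) followed by an affine transform; treat the exact expression as an assumption to check against the paper (group count and epsilon are illustrative):

```python
import torch

class EvoNormS0(torch.nn.Module):
    """Sketch of a fused normalization-activation layer in the EvoNorm-S0 style."""
    def __init__(self, channels, groups=8, eps=1e-5):
        super().__init__()
        self.groups, self.eps = groups, eps
        self.v = torch.nn.Parameter(torch.ones(1, channels, 1, 1))
        self.gamma = torch.nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = torch.nn.Parameter(torch.zeros(1, channels, 1, 1))

    def group_std(self, x):
        # Standard deviation computed per group of channels.
        b, c, h, w = x.shape
        g = x.view(b, self.groups, c // self.groups, h, w)
        std = (g.var(dim=(2, 3, 4), keepdim=True) + self.eps).sqrt()
        return std.expand_as(g).reshape(b, c, h, w)

    def forward(self, x):
        return x * torch.sigmoid(self.v * x) / self.group_std(x) * self.gamma + self.beta

print(EvoNormS0(16)(torch.randn(2, 16, 8, 8)).shape)
```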

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

1 code implementation ECCV 2020 Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.

Neural Architecture Search
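
The key claim above is that child models are obtained directly from one set of shared weights, with no retraining or post-processing. A minimal sketch of that idea for a single linear layer, where a narrower child simply reuses the leading slice of the big layer's weights (purely illustrative, not the BigNAS codebase):

```python
import torch
import torch.nn.functional as F

big = torch.nn.Linear(256, 256)   # "single-stage" layer holding the shared weights

def child_forward(x, width):
    """Run a child of reduced output width by slicing the shared weights."""
    w = big.weight[:width, :]      # leading rows of the big weight matrix
    b = big.bias[:width]
    return F.linear(x, w, b)

x = torch.randn(4, 256)
for width in (64, 128, 256):       # child models of different sizes
    print(width, child_forward(x, width).shape)
```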

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

2 code implementations CVPR 2020 Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le

We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.

Object Detection

Neural Predictor for Neural Architecture Search

2 code implementations ECCV 2020 Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, Pieter-Jan Kindermans

First we train N random architectures to generate N (architecture, validation accuracy) pairs and use them to train a regression model that predicts accuracy based on the architecture.

Neural Architecture Search, Regression
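
The excerpt states the procedure directly: train N random architectures, fit a regressor on the resulting (architecture, validation accuracy) pairs, then use the cheap predictor to rank a much larger pool of candidates. A minimal sketch with a toy architecture encoding and a synthetic accuracy function standing in for real training (every name and number here is illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def sample_architecture():
    """Toy encoding: (depth, width, kernel size) drawn from a small grid."""
    return np.array([rng.integers(2, 9), rng.choice([32, 64, 128]), rng.choice([3, 5])])

def train_and_evaluate(arch):
    """Stand-in for actually training the architecture: returns a synthetic
    validation accuracy so the sketch stays runnable."""
    depth, width, kernel = arch
    return 0.6 + 0.02 * depth + 0.0005 * width - 0.01 * kernel + rng.normal(0, 0.01)

# Step 1: N (architecture, validation accuracy) pairs from random architectures.
N = 50
archs = np.stack([sample_architecture() for _ in range(N)])
accs = np.array([train_and_evaluate(a) for a in archs])

# Step 2: a regression model that predicts accuracy from the encoding.
predictor = Ridge().fit(archs, accs)

# Step 3: rank a large candidate pool with the cheap predictor.
pool = np.stack([sample_architecture() for _ in range(5000)])
print("predicted-best architecture:", pool[np.argmax(predictor.predict(pool))])
```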

GDP: Generalized Device Placement for Dataflow Graphs

no code implementations 28 Sep 2019 Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices.

Scaling Up Neural Architecture Search with Big Single-Stage Models

no code implementations 25 Sep 2019 Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Quoc Le

In this work, we propose BigNAS, an approach that simplifies this workflow and scales up neural architecture search to target a wide range of model sizes simultaneously.

Neural Architecture Search

Learning Graph Convolution Filters from Data Manifold

no code implementations ICLR 2018 Guokun Lai, Hanxiao Liu, Yiming Yang

Convolutional Neural Networks (CNNs) have achieved tremendous success in computer vision tasks thanks to their outstanding ability to capture local latent features.

Hierarchical Representations for Efficient Architecture Search

1 code implementation ICLR 2018 Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, Koray Kavukcuoglu

We explore efficient neural architecture search methods and show that a simple yet powerful evolutionary algorithm can discover new architectures with excellent performance.

General Classification, Image Classification, +1
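
The claim above is that a simple evolutionary algorithm is enough. A minimal sketch of the mutate-and-select loop over a toy flat encoding, with a synthetic fitness function in place of training; the paper evolves hierarchical motifs, which this sketch does not attempt to reproduce:

```python
import random

random.seed(0)
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_architecture(n_nodes=6):
    return [random.choice(OPS) for _ in range(n_nodes)]

def mutate(arch):
    """Replace one randomly chosen operation with another from the op set."""
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def fitness(arch):
    """Stand-in for training and validating; rewards conv-heavy architectures."""
    return sum(op.startswith("conv") for op in arch) + 0.1 * random.random()

# Tournament evolution: sample a few individuals, mutate the best one,
# and retire the oldest member of the population.
population = [random_architecture() for _ in range(20)]
for step in range(200):
    parent = max(random.sample(population, 5), key=fitness)
    population.append(mutate(parent))
    population.pop(0)

print(max(population, key=fitness))
```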

Learning Depthwise Separable Graph Convolution from Data Manifold

no code implementations 31 Oct 2017 Guokun Lai, Hanxiao Liu, Yiming Yang

Convolutional Neural Networks (CNNs) have achieved tremendous success in computer vision tasks thanks to their outstanding ability to capture local latent features.

Computational Efficiency

Analogical Inference for Multi-Relational Embeddings

1 code implementation ICML 2017 Hanxiao Liu, Yuexin Wu, Yiming Yang

Large-scale multi-relational embedding refers to the task of learning the latent representations for entities and relations in large knowledge graphs.

Knowledge Graphs, Link Prediction
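
Multi-relational embedding assigns a vector to every entity and relation and scores candidate (head, relation, tail) triples with those vectors; link prediction then ranks tails by that score. A minimal sketch of a generic bilinear score with diagonal relation matrices (the paper's ANALOGY model places additional analogical constraints on the relation matrices, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 100, 10, 16

entity_emb = rng.normal(size=(n_entities, dim))
relation_emb = rng.normal(size=(n_relations, dim))   # diagonal relation matrices

def score(head, relation, tail):
    """Bilinear score with a diagonal relation matrix: higher means the
    triple (head, relation, tail) is judged more plausible."""
    return float(np.sum(entity_emb[head] * relation_emb[relation] * entity_emb[tail]))

# Link prediction: rank every candidate tail for a (head, relation) query.
head, relation = 3, 7
ranking = np.argsort([-score(head, relation, t) for t in range(n_entities)])
print("top-5 predicted tails:", ranking[:5])
```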

Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

19 code implementations 21 Mar 2017 Guokun Lai, Wei-Cheng Chang, Yiming Yang, Hanxiao Liu

Multivariate time series forecasting is an important machine learning problem across many domains, including the prediction of solar plant energy output, electricity consumption, and traffic congestion.

Multivariate Time Series Forecasting, Time Series, +1

A Comparative Study of Word Embeddings for Reading Comprehension

no code implementations 2 Mar 2017 Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, William W. Cohen

The focus of past machine learning research for Reading Comprehension tasks has been primarily on the design of novel deep learning architectures.

BIG-bench Machine Learning, Reading Comprehension, +1

Adaptive Smoothed Online Multi-Task Learning

no code implementations NeurIPS 2016 Keerthiram Murugesan, Hanxiao Liu, Jaime Carbonell, Yiming Yang

This paper addresses the challenge of jointly learning both the per-task model parameters and the inter-task relationships in a multi-task online learning setting.

Multi-Task Learning

Cross-Graph Learning of Multi-Relational Associations

no code implementations 6 May 2016 Hanxiao Liu, Yiming Yang

Cross-graph Relational Learning (CGRL) refers to the problem of predicting the strengths or labels of multi-relational tuples of heterogeneous object types, through joint inference over multiple graphs that specify the internal connections within each type of object.

Graph Learning, Relational Reasoning, +1
