Search Results for author: Andrew Tulloch

Found 8 papers, 1 paper with code

MTrainS: Improving DLRM training efficiency using heterogeneous memories

no code implementations • 19 Apr 2023 • Hiwot Tadese Kassa, Paul Johnson, Jason Akers, Mrinmoy Ghosh, Andrew Tulloch, Dheevatsa Mudigere, Jongsoo Park, Xing Liu, Ronald Dreslinski, Ehsan K. Ardestani

In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth.

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

no code implementations • 12 Apr 2021 • Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, KR Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud Khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao

Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers.

Mixed-Precision Embedding Using a Cache

no code implementations • 21 Oct 2020 • Jie Amy Yang, Jianyu Huang, Jongsoo Park, Ping Tak Peter Tang, Andrew Tulloch

We propose a novel change to embedding tables using a cache memory architecture, where the majority of rows in an embedding are trained in low precision, and the most frequently or recently accessed rows are cached and trained in full precision.

Quantization Recommendation Systems

Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition

no code implementations • 19 Nov 2019 • Bing Xu, Andrew Tulloch, Yunpeng Chen, Xiaomeng Yang, Lin Qiao

We propose a new building block, IdleBlock, which naturally prunes connections within the block.

Neural Architecture Search

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

no code implementations • 24 Nov 2018 • Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, Juan Pino, Martin Schatz, Alexander Sidorov, Viswanath Sivakumar, Andrew Tulloch, Xiaodong Wang, Yiming Wu, Hector Yuen, Utku Diril, Dmytro Dzhulgakov, Kim Hazelwood, Bill Jia, Yangqing Jia, Lin Qiao, Vijay Rao, Nadav Rotem, Sungjoo Yoo, Mikhail Smelyanskiy

The application of deep learning techniques resulted in remarkable improvement of machine learning models.

BIG-bench Machine Learning

On Periodic Functions as Regularizers for Quantization of Neural Networks

no code implementations • 24 Nov 2018 • Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

We apply these functions component-wise and add the sum over the model parameters as a regularizer to the model loss during training.

Quantization
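
The regularizer described in this abstract can be sketched as follows. The listing does not say which periodic function the paper uses, so the `sin^2` form below, along with the names `periodic_penalty`, `regularized_loss`, `step`, and `strength`, is an assumption chosen only for illustration: it is zero whenever a weight sits on a multiple of the quantization step and positive in between.

```python
import math
from typing import Iterable


def periodic_penalty(w: float, step: float) -> float:
    """Assumed periodic function: sin^2(pi * w / step).

    It vanishes when w is an integer multiple of `step` (a quantization
    grid point) and peaks halfway between grid points.
    """
    return math.sin(math.pi * w / step) ** 2


def regularized_loss(task_loss: float, params: Iterable[float],
                     step: float, strength: float) -> float:
    """Add the component-wise penalty, summed over all parameters,
    to the task loss. `strength` is a hypothetical weighting factor."""
    penalty = sum(periodic_penalty(w, step) for w in params)
    return task_loss + strength * penalty


# One weight on the grid (0.0) and one halfway between grid points (0.125).
loss = regularized_loss(task_loss=1.0, params=[0.0, 0.125],
                        step=0.25, strength=0.1)
```

In an actual training loop the same sum would be computed on tensors so that gradients flow through the penalty; plain floats are used here only to keep the sketch dependency-free.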

High performance ultra-low-precision convolutions on mobile devices

no code implementations • 6 Dec 2017 • Andrew Tulloch, Yangqing Jia

Many applications of mobile deep learning, especially real-time computer vision workloads, are constrained by computation power.

Vocal Bursts Intensity Prediction

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

70 code implementations • 8 Jun 2017 • Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

Stochastic Optimization

4,319
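
The linear scaling rule and warmup mentioned in this abstract can be sketched roughly as below. The function names and the numeric values (a reference learning rate of 0.1 at minibatch size 256, a target minibatch of 8192, 100 warmup steps) are illustrative assumptions, not taken from this listing, and the linear ramp is only one possible warmup shape.

```python
def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion
    to the minibatch size relative to a reference minibatch."""
    return base_lr * batch / base_batch


def warmup_lr(step: int, warmup_steps: int,
              start_lr: float, target_lr: float) -> float:
    """Ramp linearly from start_lr to target_lr over warmup_steps,
    then hold target_lr for the rest of training."""
    if step >= warmup_steps:
        return target_lr
    frac = step / warmup_steps
    return start_lr + frac * (target_lr - start_lr)


# Assumed reference setting: lr 0.1 at minibatch 256, scaled to minibatch 8192.
target = scaled_lr(base_lr=0.1, base_batch=256, batch=8192)

# Learning rate for the first few steps of a 100-step warmup.
schedule = [warmup_lr(s, 100, 0.1, target) for s in range(5)]
```

Starting the ramp at the unscaled rate keeps early updates small while the weights are still changing quickly, which is the optimization difficulty the abstract attributes to the start of training.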
