Search Results for author: Shital Shah

Found 15 papers, 8 papers with code

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

1 code implementation 22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

no code implementations 6 Oct 2022 Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models.

Inductive Bias

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

1 code implementation 4 Mar 2022 Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey

Results show that the perplexity of 16-layer GPT-2 and Transformer-XL can be achieved with up to 1.5x, 2.5x faster runtime and 1.2x, 2.0x lower peak memory utilization.

Decoder Language Modelling +1

Ranking Convolutional Architectures by their Feature Extraction Capabilities

no code implementations 29 Sep 2021 Debadeepta Dey, Shital Shah, Sebastien Bubeck

We propose a simple but powerful method which we call FEAR, for ranking architectures in any search space.

Neural Architecture Search

FEAR: A Simple Lightweight Method to Rank Architectures

1 code implementation 7 Jun 2021 Debadeepta Dey, Shital Shah, Sebastien Bubeck

We propose a simple but powerful method which we call FEAR, for ranking architectures in any search space.

Neural Architecture Search

Ranking Architectures by Feature Extraction Capabilities

no code implementations ICML Workshop AutoML 2021 Debadeepta Dey, Shital Shah, Sebastien Bubeck

By training different architectures in the search space to the same training or validation error, and subsequently comparing the usefulness of the features extracted on the task-dataset of interest by freezing most of the architecture, we obtain quick estimates of the relative performance.

Neural Architecture Search

Understanding Failures of Deep Networks via Robust Feature Extraction

1 code implementation CVPR 2021 Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz

Traditional evaluation metrics for learned models that report aggregate scores over a test set are insufficient for surfacing important and informative patterns of failure over features and instances.

An Empirical Analysis of Backward Compatibility in Machine Learning Systems

no code implementations 11 Aug 2020 Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz

In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance.

BIG-bench Machine Learning

A System for Real-Time Interactive Analysis of Deep Learning Training

1 code implementation 5 Jan 2020 Shital Shah, Roland Fernandez, Steven Drucker

To achieve this, we model various exploratory inspection and diagnostic tasks for deep learning training processes as specifications for streams using a map-reduce paradigm with which many data scientists are already familiar.

3D Action Recognition

A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities

1 code implementation 19 Sep 2019 Deepali Aneja, Daniel McDuff, Shital Shah

Embodied avatars as virtual agents have many applications and provide benefits over disembodied agents, allowing non-verbal social and interactional cues to be leveraged, in a similar manner to how humans interact with each other.

AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles

25 code implementations 15 May 2017 Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor

Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process.

Autonomous Vehicles

Submodular Trajectory Optimization for Aerial 3D Scanning

no code implementations ICCV 2017 Mike Roberts, Debadeepta Dey, Anh Truong, Sudipta Sinha, Shital Shah, Ashish Kapoor, Pat Hanrahan, Neel Joshi

Drones equipped with cameras are emerging as a powerful tool for large-scale aerial 3D scanning, but existing automatic flight planners do not exploit all available information about the scene, and can therefore produce inaccurate and incomplete 3D models.

Trajectory Planning
