Search Results for author: Frederick Liu

Found 20 papers, 9 papers with code

PaLM 2 Technical Report

no code implementations 17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Language Modelling

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

no code implementations 13 Feb 2023 Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum Thain, Lucas Dixon

In this paper, we explore the use of TracIn to improve model performance in the parameter-efficient tuning (PET) setting.

Decision Making, Transfer Learning

FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

no code implementations 1 Feb 2023 Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result.

Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation

1 code implementation 21 Oct 2022 Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji

However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models).

Data Augmentation, Knowledge Distillation

DETR++: Taming Your Multi-Scale Detection Transformer

no code implementations 7 Jun 2022 Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen

Convolutional Neural Networks (CNNs) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12].

Object Detection, Small Object Detection

Chefs' Random Tables: Non-Trigonometric Random Features

1 code implementation 30 May 2022 Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.

Towards Tracing Factual Knowledge in Language Models Back to the Training Data

1 code implementation 23 May 2022 Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu

In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion.

Information Retrieval, Retrieval

First is Better Than Last for Language Data Influence

1 code implementation 24 Feb 2022 Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

However, we observe that since the activation connected to the last layer of weights contains "shared logic", the data influence calculated via the last-layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other.

Threading the Needle of On and Off-Manifold Value Functions for Shapley Explanations

no code implementations 24 Feb 2022 Chih-Kuan Yeh, Kuan-Yun Lee, Frederick Liu, Pradeep Ravikumar

We formalize, in a set of axioms, the desiderata of value functions that respect both the model and the data manifold and are robust to perturbation on off-manifold regions, and show that there exists a unique value function that satisfies these axioms. We term it the Joint Baseline value function and the resulting Shapley value the Joint Baseline Shapley (JBshap), and we validate the effectiveness of JBshap in experiments.

Feature Importance

EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models

1 code implementation 16 Oct 2021 Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li

Pre-trained encoder-decoder transformer architectures have become increasingly popular recently with the advent of T5 models.

Language Modelling, Multi-Label Classification +2

Leveraging redundancy in attention with Reuse Transformers

1 code implementation 13 Oct 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

1 code implementation NeurIPS 2021 Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

This observation leads to two challenging tasks: (1) unsupervised accuracy estimation, which aims to estimate the accuracy of a pre-trained classifier on a set of unlabeled test inputs; (2) error detection, which aims to identify mis-classified test inputs.

The Penalty Imposed by Ablated Data Augmentation

no code implementations 8 Jun 2020 Frederick Liu, Amir Najmi, Mukund Sundararajan

There is a set of data augmentation techniques that ablate parts of the input at random.

Data Augmentation

Estimating Training Data Influence by Tracing Gradient Descent

3 code implementations NeurIPS 2020 Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model.

Handling Homographs in Neural Machine Translation

no code implementations NAACL 2018 Frederick Liu, Han Lu, Graham Neubig

Homographs, words with different meanings but the same surface form, have long caused difficulty for machine translation systems, as it is difficult to select the correct translation based on the context.

Machine Translation, NMT +3

Learning Character-level Compositionality with Visual Features

2 code implementations ACL 2017 Frederick Liu, Han Lu, Chieh Lo, Graham Neubig

Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words.

Text Classification
