29 May 2023 • Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean
Using this insight, we develop a complex block, named Brainformer, that consists of a diverse set of layers such as sparsely gated feed-forward layers, dense feed-forward layers, attention layers, and various forms of layer normalization and activation functions.
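A minimal sketch of such a heterogeneous block, assuming top-1 expert routing, pre-norm residuals, and an arbitrary layer order (none of which are claimed to match the searched Brainformer configuration):

```python
import torch
import torch.nn as nn

class Top1MoEFeedForward(nn.Module):
    """Sparsely gated feed-forward layer: each token is routed to one expert.
    A hypothetical simplification of the sparse layers mentioned above."""
    def __init__(self, d_model, d_ff, num_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                         # x: (batch, seq, d_model)
        expert_idx = self.gate(x).argmax(dim=-1)  # top-1 routing per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                # tokens assigned to expert i
            out[mask] = expert(x[mask])
        return out

class HeterogeneousBlock(nn.Module):
    """Interleaves attention, a dense FFN, and a sparse MoE FFN, each with a
    pre-norm residual connection. The ordering here is arbitrary, not the
    searched Brainformer layout."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dense_ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                       nn.Linear(d_ff, d_model))
        self.moe_ffn = Top1MoEFeedForward(d_model, d_ff)
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x):
        h = self.norms[0](x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.dense_ffn(self.norms[1](x))
        x = x + self.moe_ffn(self.norms[2](x))
        return x
```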
We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").
We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.
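Both symbol tuning and SUL-ICL rest on the same prompt transformation: natural-language labels in the in-context exemplars are swapped for arbitrary symbols. A minimal sketch with a hypothetical label map and prompt template (not the papers' exact setup):

```python
def build_prompt(examples, query, label_map):
    """examples: list of (text, natural_language_label) pairs; label_map sends
    each natural-language label to an arbitrary symbol (hypothetical mapping
    and template)."""
    lines = [f"Input: {text}\nLabel: {label_map[label]}" for text, label in examples]
    lines.append(f"Input: {query}\nLabel:")   # model must infer the mapping
    return "\n\n".join(lines)

prompt = build_prompt(
    examples=[("Great movie, loved it.", "positive"),
              ("Terrible plot, fell asleep.", "negative")],
    query="The acting was superb.",
    label_map={"positive": "foo", "negative": "bar"},
)
```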
On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x.
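The Lion update rule itself is compact; a sketch of a single step, with illustrative (not tuned) hyperparameter defaults:

```python
import torch

@torch.no_grad()
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: the parameter moves by the *sign* of an interpolation
    of momentum and gradient; momentum is then updated with a second
    coefficient. Hyperparameter values here are illustrative defaults."""
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    param.add_(update + wd * param, alpha=-lr)        # decoupled weight decay
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)  # in-place momentum update
```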
For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem.
The best neural architecture for a given machine learning problem depends on many factors: not only on the complexity and structure of the dataset, but also on resource constraints such as latency, compute, and energy consumption.
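One common way to fold such constraints into architecture search is a soft-constrained reward; the MnasNet-style form below is an assumption for illustration, not necessarily the objective used in this work:

```python
def constrained_reward(accuracy, latency_ms, target_ms=10.0, w=-0.07):
    """MnasNet-style soft-constrained reward (illustrative formulation):
    accuracy is discounted as measured latency exceeds the target, and
    mildly rewarded below it."""
    return accuracy * (latency_ms / target_ms) ** w
```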
In deep learning, large-scale image datasets have driven breakthroughs in object recognition and retrieval.
In our work, we target the optimization of hardware and software configurations on an industry-standard edge accelerator.
We also design an efficient dynamic programming algorithm for decoding segments, which allows the model to be trained faster than the existing neural phrase-based machine translation method of Huang et al. (2018).
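To illustrate the flavor of such a decoder, here is a generic segment-level Viterbi recurrence under an assumed scoring interface; it is a sketch, not Huang et al.'s exact algorithm:

```python
def best_segmentation(scores, n, max_len=6):
    """scores[(i, j)] is the model's log-score for emitting positions i..j-1
    as one segment (assumed interface); dp[j] is the best score over all
    segmentations of the first j positions. Runs in O(n * max_len)."""
    NEG_INF = float("-inf")
    dp = [0.0] + [NEG_INF] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = dp[i] + scores[(i, j)]
            if cand > dp[j]:
                dp[j], back[j] = cand, i
    segments, j = [], n           # recover boundaries via back-pointers
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return dp[n], segments[::-1]
```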