no code implementations • 28 Oct 2024 • Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola
Supervised fine-tuning (SFT) and alignment of large language models (LLMs) are key steps in providing a good user experience.
1 code implementation • 10 Apr 2023 • Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang
Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks, such as image generation.
1 code implementation • NeurIPS 2023 • Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
This work proposes POMP, a prompt pre-training method for vision-language models.
1 code implementation • 6 Feb 2023 • Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions remains precariously underexplored.
3 code implementations • 2 Feb 2023 • Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
Ranked #4 on Science Question Answering on ScienceQA
no code implementations • 4 Jan 2023 • Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang
We discover the following design patterns: (i) group layers in a spindle pattern; (ii) allocate the number of trainable parameters to layers uniformly; (iii) tune all the groups; (iv) assign proper tuning strategies to different groups.
5 code implementations • 7 Oct 2022 • Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola
Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting.
Ranked #4 on Question Answering on TruthfulQA (EM metric)
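The snippet above describes chain-of-thought prompting, in which demonstrations include intermediate reasoning steps before the final answer. A minimal sketch of how such a prompt might be assembled (the helper name and demonstration text are illustrative, not from the paper):

```python
# Illustrative sketch of chain-of-thought (CoT) prompting: each demonstration
# pairs a question with a reasoning chain followed by the final answer.

def build_cot_prompt(demonstrations, question):
    """Assemble a few-shot prompt whose demonstrations show reasoning steps."""
    parts = []
    for demo in demonstrations:
        parts.append(f"Q: {demo['question']}\n"
                     f"A: {demo['rationale']} The answer is {demo['answer']}.")
    # Elicit step-by-step reasoning for the new question as well.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

demos = [{
    "question": "Roger has 5 balls and buys 2 more. How many balls does he have?",
    "rationale": "Roger starts with 5 balls. 5 plus 2 is 7.",
    "answer": "7",
}]
prompt = build_cot_prompt(demos, "A pack has 3 pens. How many pens are in 4 packs?")
```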
1 code implementation • 4 Jul 2022 • Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex Smola, Zhangyang Wang
However, in real-world applications, it is common for the training sets to have long-tailed distributions.
2 code implementations • NeurIPS 2021 • Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier.
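As a rough illustration of the MPE subtask (not the estimator proposed in the paper): given scores from any classifier that tends to fire on positives, the rate at which it fires on the unlabeled set, normalized by its rate on labeled positives, yields a crude estimate of the positive fraction.

```python
import numpy as np

def estimate_mixture_proportion(scores_pos, scores_unl, threshold=0.5):
    """Crude mixture-proportion sketch: ratio of above-threshold firing
    rates on unlabeled vs. known-positive data. Illustrative only; this
    is not the estimator analyzed in the paper."""
    rate_unl = np.mean(np.asarray(scores_unl) >= threshold)
    rate_pos = np.mean(np.asarray(scores_pos) >= threshold)
    return float(rate_unl / rate_pos)

# If half the unlabeled points score like positives, the estimate is ~0.5.
alpha_hat = estimate_mixture_proportion([0.9, 0.8, 0.95], [0.9, 0.1, 0.85, 0.2])
```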
1 code implementation • ICML Workshop AutoML 2021 • Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alex Smola
We design automated supervised learning systems for data tables that contain not only numeric/categorical columns but also text fields.
no code implementations • 1 Jan 2021 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola
In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.
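Such a map of states and transitions can be built from historical trajectories by simple counting; a sketch (the function name is illustrative, not from the paper):

```python
from collections import defaultdict

def build_transition_map(trajectories):
    """Count observed (state, next_state) transitions across trajectories."""
    counts = defaultdict(int)
    for traj in trajectories:
        # Pair each state with its successor within the trajectory.
        for s, s_next in zip(traj, traj[1:]):
            counts[(s, s_next)] += 1
    return dict(counts)

transitions = build_transition_map([["a", "b", "c"], ["a", "b"]])
```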
no code implementations • 1 Jan 2021 • Jiarui Jin, Cong Chen, Ming Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Yong Yu, Jun Wang, Alex Smola
Goal-oriented reinforcement learning algorithms are often good at exploration, not exploitation, while episodic algorithms excel at exploitation, not exploration.
no code implementations • 1 Jan 2021 • Rasool Fakoor, Pratik Anil Chaudhari, Jonas Mueller, Alex Smola
We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous- and discrete-valued data.
no code implementations • 16 May 2020 • Hyokun Yun, Michael Froh, Roshan Makhijani, Brian Luc, Alex Smola, Trishul Chilimbi
Tiering is an essential technique for building large-scale information retrieval systems.
no code implementations • 11 Sep 2019 • Jonas Mueller, Alex Smola
A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable.
no code implementations • 28 May 2019 • Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski
We provide both theoretical and empirical evidence for the soundness of our approach through a necessary and sufficient decomposition of exchangeable time series into a global and a local part.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 30 Nov 2018 • Danielle C. Maddix, Yuyang Wang, Alex Smola
A large collection of time series poses significant challenges for classical and neural forecasting approaches.
no code implementations • ICML 2018 • Hanjun Dai, Zornitsa Kozareva, Bo Dai, Alex Smola, Le Song
Many graph analytics problems can be solved via iterative algorithms where the solutions are often characterized by a set of steady-state conditions.
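Such steady-state formulations reduce to fixed-point iteration; a generic sketch of the pattern (the paper's learned graph operators are of course richer than this scalar example):

```python
import numpy as np

def iterate_to_steady_state(step, x0, tol=1e-8, max_iter=10_000):
    """Apply `step` repeatedly until the iterate stops changing,
    i.e. until the steady-state condition x = step(x) holds."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = step(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

# Example: x = 0.5 * x + 1 has the unique fixed point x = 2.
fixed_point = iterate_to_steady_state(lambda x: 0.5 * x + 1.0, 0.0)
```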
no code implementations • 4 Jun 2018 • Emmanouil Antonios Platanios, Alex Smola
We propose an algorithm for deep learning on networks and graphs.
2 code implementations • ICML 2018 • Zachary C. Lipton, Yu-Xiang Wang, Alex Smola
Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels.
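The correction can be sketched with the standard black-box shift estimation recipe: invert the classifier's source confusion matrix to recover target class proportions. A minimal illustrative version (the paper adds diagnostics and finite-sample guarantees):

```python
import numpy as np

def estimate_label_shift_weights(y_src, yhat_src, yhat_tgt, n_classes):
    """Estimate importance weights w[y] = q(y)/p(y) under label shift by
    inverting a black-box classifier's source confusion matrix. Sketch only."""
    # C[i, j] = P_source(predict i, true label j)
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_src, yhat_src):
        C[p, t] += 1.0
    C /= len(y_src)
    # mu[i] = P_target(predict i) on unlabeled target data
    mu = np.bincount(yhat_tgt, minlength=n_classes) / len(yhat_tgt)
    # Label shift implies mu = C w; solve for w and clip to non-negative.
    return np.clip(np.linalg.solve(C, mu), 0.0, None)

# Perfect classifier, balanced source, target skewed 3:1 toward class 0.
weights = estimate_label_shift_weights([0, 0, 1, 1], [0, 0, 1, 1],
                                       [0, 0, 0, 1], 2)
```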
8 code implementations • ICLR 2018 • Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum
Knowledge bases (KBs), both automatically and manually constructed, are often incomplete: many valid facts can be inferred from the KB by synthesizing existing information.
no code implementations • ICML 2017 • Manzil Zaheer, Satwik Kottur, Amr Ahmed, José Moura, Alex Smola
In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space.
no code implementations • 14 Feb 2017 • Han Zhao, Otilia Stretcu, Alex Smola, Geoff Gordon
In this paper, we consider a formulation of multitask learning that learns the relationships both between tasks and between features, represented through a task covariance and a feature covariance matrix, respectively.
1 code implementation • 14 Nov 2016 • Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples.
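As a reminder of the quantity under study, a biased estimate of squared MMD with a Gaussian kernel can be computed directly from two samples. A bare-bones sketch (the paper's contribution is choosing the kernel to maximize test power, which this omits):

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy (MMD)."""
    return (rbf_kernel(X, X, bandwidth).mean()
            + rbf_kernel(Y, Y, bandwidth).mean()
            - 2.0 * rbf_kernel(X, Y, bandwidth).mean())
```

Identical samples give an estimate of zero, while well-separated samples give a value near two (the kernel's maximum discrepancy).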
no code implementations • 24 Aug 2016 • Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczós, Alex Smola
It is well known that the DANE algorithm does not match the communication complexity lower bounds.
no code implementations • 27 Jul 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola
Finally, we show that the faster convergence rates of our variance reduced methods also translate into improved convergence rates for the stochastic setting.
no code implementations • EACL 2017 • Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola
Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future.
no code implementations • 23 May 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola
This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.
no code implementations • 19 Mar 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola
We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$.
no code implementations • 19 Mar 2016 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them.
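The SVRG update the snippet refers to is standard: periodically compute a full gradient at a snapshot, then correct single-sample stochastic gradients with it. A minimal sketch on a toy convex quadratic (the paper's analysis concerns the nonconvex case):

```python
import numpy as np

def svrg(grad_i, x0, n, lr=0.1, epochs=10, inner=None, rng=None):
    """SVRG sketch: a full gradient at a periodic snapshot reduces the
    variance of single-sample stochastic gradient steps."""
    rng = rng or np.random.default_rng(0)
    inner = inner or n
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = int(rng.integers(n))
            # Variance-reduced gradient: unbiased, with shrinking variance.
            g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            x = x - lr * g
    return x

# Toy problem: f_i(x) = 0.5 * (x - a[i])^2, jointly minimized at mean(a) = 2.5.
a = [1.0, 2.0, 3.0, 4.0]
x_star = svrg(lambda x, i: x - a[i], x0=0.0, n=4, lr=0.1, epochs=20)
```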
no code implementations • 15 Dec 2015 • Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Maria Florina Balcan, Alex Smola
In distributed machine learning, data is dispatched to multiple machines for processing.
16 code implementations • CVPR 2016 • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola
Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively.
Ranked #5 on Visual Question Answering (VQA) on VQA v1 test-std
no code implementations • NeurIPS 2015 • Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola
We demonstrate the empirical performance of our method through a concrete realization of asynchronous SVRG.
no code implementations • 26 Feb 2015 • Yu-Xiang Wang, Stephen E. Fienberg, Alex Smola
We consider the problem of Bayesian learning on sensitive datasets and present two simple but somewhat surprising results that connect Bayesian learning to "differential privacy", a cryptographic approach to protect individual-level privacy while permitting database-level utility.
1 code implementation • ICCV 2015 • Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters.
Ranked #44 on Image Classification on MNIST
no code implementations • 28 Oct 2014 • Yu-Xiang Wang, James Sharpnack, Alex Smola, Ryan J. Tibshirani
We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences.
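The penalty in question is the ℓ1 norm of first differences of a signal across graph edges; an illustrative evaluation of the order-1 case (solving the resulting estimator requires a convex solver, omitted here):

```python
def graph_difference_penalty(x, edges, lam=1.0):
    """lam * sum over edges (i, j) of |x[i] - x[j]|: the ell_1 norm of
    discrete first-order graph differences."""
    return lam * sum(abs(x[i] - x[j]) for i, j in edges)

# A path graph 0-1-2 with a jump of 5 between nodes 1 and 2.
penalty = graph_difference_penalty([0.0, 0.0, 5.0], [(0, 1), (1, 2)])
```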
no code implementations • 3 May 2014 • Yu-Xiang Wang, Alex Smola, Ryan J. Tibshirani
We study a novel spline-like basis, which we name the "falling factorial basis", bearing many similarities to the classic truncated power basis.
no code implementations • 1 Feb 2014 • David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf
Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive at large scale.
1 code implementation • 15 Mar 2012 • Yutian Chen, Max Welling, Alex Smola
We extend the herding algorithm to continuous spaces by using the kernel trick.
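Kernel herding greedily selects points whose empirical mean embedding matches that of a target distribution; a sketch over a finite candidate set (the paper's continuous-space extension replaces this discrete argmax):

```python
import numpy as np

def kernel_herding(candidates, mean_embedding, kernel, n_select):
    """Greedy herding over a finite candidate set: at each step pick the
    candidate scoring highest under mu(x) - (1/(t+1)) * sum_i k(x, x_i)."""
    selected = []
    for t in range(n_select):
        scores = [mean_embedding(x)
                  - sum(kernel(x, s) for s in selected) / (t + 1.0)
                  for x in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Target: uniform over the candidates themselves; Gaussian kernel.
k = lambda a, b: float(np.exp(-(a - b) ** 2))
cands = [-1.0, 0.0, 1.0]
mu = lambda x: sum(k(x, c) for c in cands) / len(cands)
picks = kernel_herding(cands, mu, k, 2)
```

The first pick is the center point (highest mean embedding); the repulsion term then pushes the second pick to an extreme.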
no code implementations • 12 Feb 2009 • Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola
Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation.
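The hashing trick maps arbitrary feature names into a fixed-length vector; a self-contained sketch (production implementations use a fast non-cryptographic hash and an independent sign hash, rather than md5 as here):

```python
import hashlib

def hash_features(tokens, dim=16):
    """Feature hashing: index each token by a hash modulo dim, with a
    pseudo-random sign so collisions cancel in expectation."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        idx = h % dim
        # Derive a sign from unused hash bits (an approximation of an
        # independent sign hash).
        sign = 1.0 if (h >> 64) % 2 == 0 else -1.0
        vec[idx] += sign
    return vec

v = hash_features(["user:alice", "item:42", "user:alice"])
```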