no code implementations • 4 Oct 2022 • Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola
Real-world deployment of machine learning models is challenging because data evolves over time.
1 code implementation • 10 Dec 2021 • Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola
In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network.
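The idea of keeping the online network near the target network can be sketched as a proximal penalty added to the usual TD objective. A minimal illustration, assuming a penalty of the form (c/2)·||θ − θ_target||² (the coefficient `c` and learning rate here are illustrative, not the paper's settings):

```python
import numpy as np

def proximal_td_step(theta, theta_target, td_grad, lr=0.1, c=1.0):
    """One gradient step on the TD loss augmented with a proximal
    penalty (c/2) * ||theta - theta_target||^2, whose gradient
    c * (theta - theta_target) pulls the online parameters toward
    the target network's parameters."""
    grad = td_grad + c * (theta - theta_target)
    return theta - lr * grad
```

With a zero TD gradient, each step shrinks the gap to the target by a factor of (1 − lr·c), which is the "stay in the proximity of the target" incentive in its simplest form.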
2 code implementations • 4 Nov 2021 • Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola
We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well.
Ranked #2 on Binary Classification on kickstarter
1 code implementation • NeurIPS 2021 • Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J. Smola, Yuyang Wang, Tim Januschowski
We propose the Recurrent Explicit Duration Switching Dynamical System (RED-SDS), a flexible model that is capable of identifying both state- and time-dependent switching dynamics.
1 code implementation • 21 Jun 2021 • Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code.
1 code implementation • 26 Feb 2021 • Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive.
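For reference, the standard quantile (pinball) loss underlying quantile regression fits in a few lines — this is the generic textbook loss, not the paper's specific estimator:

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Quantile (pinball) loss: under-predictions are weighted by tau
    and over-predictions by (1 - tau), so the constant minimizer is
    the tau-quantile of y."""
    diff = y - y_hat
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```

Minimizing this loss for `tau = 0.9`, say, yields a predictor of the conditional 90th percentile rather than the conditional mean.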
1 code implementation • NeurIPS 2021 • Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola
Because they rely on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, where exploration can be too expensive.
1 code implementation • 25 Nov 2020 • Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola
Heterogeneous information networks (HINs) have been widely used to characterize entities of various types and their complex relations.
1 code implementation • 1 Jul 2020 • Jiarui Jin, Jiarui Qin, Yuchen Fang, Kounianhua Du, Wei-Nan Zhang, Yong Yu, Zheng Zhang, Alexander J. Smola
To the best of our knowledge, this is the first work providing an efficient neighborhood-based interaction model for HIN-based recommendation.
no code implementations • 26 Jun 2020 • Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola
This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity.
1 code implementation • NeurIPS 2020 • Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola
Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators.
no code implementations • 6 Apr 2020 • Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola
We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data.
1 code implementation • 14 Feb 2020 • Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola
The Transformer has been widely used thanks to its ability to capture sequence information efficiently.
2 code implementations • ICLR 2020 • Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL).
1 code implementation • 5 May 2019 • Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola
Extensive experiments on the Atari-2600 and MuJoCo benchmark suites show that this simple technique is effective in reducing the sample complexity of state-of-the-art algorithms.
1 code implementation • arXiv 2019 • Chenguang Wang, Mu Li, Alexander J. Smola
In this paper, we explore effective Transformer architectures for language modeling, including adding additional LSTM layers to better capture the sequential context while keeping the computation efficient.
Ranked #2 on Language Modelling on Penn Treebank (Word Level) (using extra training data)
1 code implementation • CVPR 2018 • Chao-yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
We propose to train a deep network directly on the compressed video.
Ranked #46 on Action Classification on Charades (using extra training data)
no code implementations • ICLR 2018 • Xun Zheng, Manzil Zaheer, Amr Ahmed, Yu-An Wang, Eric P. Xing, Alexander J. Smola
Long Short-Term Memory (LSTM) is one of the most powerful sequence models.
1 code implementation • 12 Sep 2017 • Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J. Smola, Le Song
Knowledge graphs (KGs) are known to be helpful for the task of question answering (QA), since they provide well-structured relational information between entities and allow one to further infer indirect facts.
no code implementations • 5 Sep 2017 • Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.
no code implementations • ICML 2017 • Manzil Zaheer, Amr Ahmed, Alexander J. Smola
Recurrent neural networks, such as long short-term memory (LSTM) networks, are powerful tools for modeling sequential data like user browsing history (Tan et al., 2016; Korpusik et al., 2016) or natural language text (Mikolov et al., 2010).
6 code implementations • ICCV 2017 • Chao-yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions.
Ranked #5 on Image Retrieval on CARS196
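A simple margin-based pair loss of the kind referred to above can be sketched as follows; the threshold `beta` and margin `alpha` values here are illustrative defaults, and the actual method may differ in details (e.g., learning the threshold):

```python
def margin_loss(dist, is_positive, beta=1.2, alpha=0.2):
    """Margin-based loss on an embedding distance for one pair:
    positive pairs are pushed to distances below beta - alpha,
    negative pairs to distances beyond beta + alpha; pairs already
    satisfying their margin contribute zero loss."""
    y = 1.0 if is_positive else -1.0
    return max(0.0, alpha + y * (dist - beta))
```

Despite its simplicity, this hinge-style formulation penalizes only violating pairs, which is what makes it competitive with more elaborate embedding losses.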
no code implementations • 31 Mar 2017 • Hsiao-Yu Fish Tung, Chao-yuan Wu, Manzil Zaheer, Alexander J. Smola
Nonparametric models are a versatile, albeit computationally expensive, tool for modeling mixture models.
1 code implementation • 27 Feb 2017 • Joachim D. Curtó, Irene C. Zarza, Feng Yang, Alexander J. Smola, Fernando de la Torre, Chong-Wah Ngo, Luc van Gool
The algorithm requires computing the product of Walsh–Hadamard Transform (WHT) matrices.
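Such WHT products can be applied in O(n log n) time with the fast Walsh–Hadamard transform instead of materializing the matrices. A minimal sketch of the unnormalized transform:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized) of a vector whose
    length is a power of two, computed in place via butterfly passes
    in O(n log n) time."""
    x = np.asarray(x, dtype=float).copy()
    n, h = len(x), 1
    assert n & (n - 1) == 0, "length must be a power of two"
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly: sum and difference
        h *= 2
    return x
```

Applying `fwht` twice recovers the input scaled by n, reflecting that the Hadamard matrix is (up to normalization) its own inverse.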
no code implementations • WSDM 2017 • Chao-yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, How Jing
Recommender systems traditionally assume that user profiles and movie attributes are static.
no code implementations • NeurIPS 2016 • Kumar Avinava Dubey, Sashank J. Reddi, Sinead A. Williamson, Barnabas Poczos, Alexander J. Smola, Eric P. Xing
In this paper, we present techniques for reducing variance in stochastic gradient Langevin dynamics, yielding novel stochastic Monte Carlo methods that improve performance by reducing the variance in the stochastic gradient.
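One standard control-variate construction for reducing that variance is an SVRG-style correction; the sketch below uses our own naming and a generic estimator, and the paper's exact methods differ in details:

```python
import numpy as np

def vr_sgld_step(theta, data, grad_f, anchor, anchor_full_grad,
                 batch_idx, step, rng):
    """One stochastic gradient Langevin step with a variance-reduced
    gradient: the minibatch gradient at theta is corrected by the same
    minibatch's gradient at a fixed anchor point, plus the anchor's
    full-data gradient (an SVRG-style control variate). The estimate
    stays unbiased but has lower variance when theta is near anchor."""
    n, b = len(data), len(batch_idx)
    g_theta = sum(grad_f(theta, data[i]) for i in batch_idx) / b
    g_anchor = sum(grad_f(anchor, data[i]) for i in batch_idx) / b
    g = n * (g_theta - g_anchor) + anchor_full_grad
    noise = rng.normal(size=theta.shape) * np.sqrt(step)
    return theta - 0.5 * step * g + noise
```

The anchor and its full gradient are refreshed only occasionally, so most steps pay only a minibatch cost while enjoying a lower-variance gradient estimate.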
no code implementations • NeurIPS 2016 • Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alexander J. Smola
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex.
1 code implementation • 7 Nov 2016 • Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng, Jun Zhou
That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as to estimate the evolution of these vulnerabilities over time.
no code implementations • 6 Dec 2015 • Chao-yuan Wu, Alex Beutel, Amr Ahmed, Alexander J. Smola
With this novel technique we propose a new Bayesian model for joint collaborative filtering of ratings and text reviews through a sum of simple co-clusterings.
no code implementations • 20 Aug 2015 • Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola
We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.
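The delayed-gradient model can be illustrated with a toy simulation (a sketch with made-up hyperparameters, not the paper's algorithm): the server's update at step t applies a gradient that a worker computed on parameters from step t − τ.

```python
import numpy as np

def delayed_sgd(grad_fn, theta0, lr=0.1, delay=2, steps=30):
    """Simulate gradient descent under the delayed-gradient model:
    the gradient applied at step t was evaluated on parameters that
    are `delay` steps stale by the time the server applies it."""
    history = [np.asarray(theta0, dtype=float)]
    for t in range(steps):
        stale = history[max(0, t - delay)]  # worker saw these parameters
        history.append(history[-1] - lr * grad_fn(stale))
    return history[-1]
```

On a well-conditioned problem, small step sizes keep the iteration stable despite the staleness; for example, minimizing f(x) = x² still converges, just with some oscillation induced by the delay.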
no code implementations • 18 May 2015 • Mu Li, Dave G. Andersen, Alexander J. Smola
Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance.
no code implementations • 6 May 2015 • Ziqi Liu, Yu-Xiang Wang, Alexander J. Smola
Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed.
no code implementations • 31 Dec 2014 • Alex Beutel, Amr Ahmed, Alexander J. Smola
Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data.
no code implementations • 19 Dec 2014 • Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson
Kernel methods have great promise for learning rich statistical representations of large modern datasets.
no code implementations • NeurIPS 2014 • Mu Li, David G. Andersen, Alexander J. Smola, Kai Yu
This paper describes a third-generation parameter server framework for distributed machine learning.
no code implementations • NeurIPS 2014 • Hsiao-Yu Tung, Alexander J. Smola
The Indian Buffet Process is a versatile statistical tool for modeling distributions over binary matrices.
no code implementations • NeurIPS 2013 • Chong Wang, Xi Chen, Alexander J. Smola, Eric P. Xing
We demonstrate how to construct the control variate for two practical problems using stochastic gradient optimization.