1 code implementation • Findings (EMNLP) 2021 • Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff
Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text.
no code implementations • 13 Jul 2023 • Linara Adilova, Asja Fischer, Martin Jaggi
In the federated setup one performs an aggregation of separate local models multiple times during training in order to obtain a stronger global model; most often aggregation is a simple averaging of the parameters.
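A minimal sketch of this plain parameter averaging, assuming each local model is a dict of NumPy arrays (the names and weighting scheme are illustrative, not tied to this paper's exact setup):

```python
import numpy as np

def federated_average(local_models, weights=None):
    """Average per-layer parameters of several local models into a global model.

    local_models: list of dicts mapping layer name -> np.ndarray of parameters.
    weights: optional per-client weights, e.g. proportional to local dataset size.
    """
    n = len(local_models)
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float) / np.sum(weights)
    return {name: sum(w * m[name] for w, m in zip(weights, local_models))
            for name in local_models[0]}

# Toy usage: three clients, one linear layer each.
clients = [{"linear.weight": np.random.randn(4, 2)} for _ in range(3)]
global_model = federated_average(clients)
```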
no code implementations • 14 Jun 2023 • Mariel Werner, Lie He, Sai Praneeth Karimireddy, Michael Jordan, Martin Jaggi
Clustering clients with similar objectives and learning a model per cluster is an intuitive and interpretable approach to personalization in federated learning.
1 code implementation • 1 Jun 2023 • Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret
While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead of self-attention, those are often limited by implementation concerns and end up imposing a simple and static structure over the attention matrix.
no code implementations • 30 May 2023 • Anastasia Koloskova, Nikita Doikov, Sebastian U. Stich, Martin Jaggi
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data.
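A minimal sketch of the two data-ordering schemes contrasted here (purely illustrative): Random Reshuffling (RR) draws a fresh permutation at every epoch, while Single Shuffle (SS) fixes one permutation and reuses it.

```python
import numpy as np

def epoch_orders(n_samples, n_epochs, scheme="RR", seed=0):
    """Yield the sample order for each epoch under RR (re-shuffle every epoch)
    or SS (shuffle once, then reuse the same permutation)."""
    rng = np.random.default_rng(seed)
    fixed = rng.permutation(n_samples)  # only used by SS
    for _ in range(n_epochs):
        yield rng.permutation(n_samples) if scheme == "RR" else fixed

for order in epoch_orders(n_samples=5, n_epochs=3, scheme="RR"):
    print(order)  # differs each epoch; with scheme="SS" it would repeat
```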
1 code implementation • 29 May 2023 • Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi
To facilitate the exchange of expertise among agents, we propose a distillation-based method leveraging unlabeled auxiliary data, which is pseudo-labeled by the collective.
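A minimal sketch of the pseudo-labeling step described here, assuming each agent exposes a function returning class probabilities (the names and the averaging rule are illustrative assumptions):

```python
import numpy as np

def collective_pseudo_labels(agent_predict_fns, unlabeled_x):
    """Pseudo-label unlabeled auxiliary data with the collective: average the
    agents' predicted class probabilities and take the argmax."""
    probs = np.mean([f(unlabeled_x) for f in agent_predict_fns], axis=0)
    return probs.argmax(axis=1)

# Toy usage with two dummy "agents"; each agent could then be distilled
# locally on (unlabeled_x, pseudo_labels).
rng = np.random.default_rng(0)
dummy_agent = lambda x: rng.dirichlet(np.ones(3), size=len(x))
pseudo_labels = collective_pseudo_labels([dummy_agent, dummy_agent], np.zeros((8, 5)))
```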
1 code implementation • 26 May 2023 • Atli Kosson, Bettina Messmer, Martin Jaggi
These interactions can give rise to Spherical Motion Dynamics in scale-invariant layers (e.g., normalized layers), which converge to an equilibrium state, where the weight norm and the expected rotational update size are fixed.
1 code implementation • 26 May 2023 • Atli Kosson, Martin Jaggi
Finally, we show that we can eliminate all multiplications in the entire training process, including operations in the forward pass, backward pass and optimizer update, demonstrating the first successful training of modern neural network architectures in a fully multiplication-free fashion.
1 code implementation • 26 May 2023 • Atli Kosson, Dongyang Fan, Martin Jaggi
Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.
2 code implementations • 25 May 2023 • Amirkeivan Mohtashami, Martin Jaggi
While transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts.
no code implementations • 24 Feb 2023 • Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion
This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets.
no code implementations • 23 Feb 2023 • El Mahdi Chayti, Nikita Doikov, Martin Jaggi
Our helper framework offers the algorithm designer high flexibility for constructing and analyzing stochastic Cubic Newton methods, allowing arbitrary batch sizes and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both variance reduction and lazy Hessian updates.
1 code implementation • 5 Jan 2023 • Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
no code implementations • 1 Dec 2022 • Nikita Doikov, El Mahdi Chayti, Martin Jaggi
This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.
no code implementations • 20 Nov 2022 • Frédéric Berdoz, Abhishek Singh, Martin Jaggi, Ramesh Raskar
To do so, each client releases averaged last hidden layer activations of similar labels to a central server that only acts as a relay (i.e., is not involved in the training or aggregation of the models).
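A minimal sketch of the client-side release step described here, assuming `activations` holds the last hidden layer outputs for the client's samples (shapes and names are illustrative):

```python
import numpy as np

def label_averaged_activations(activations, labels):
    """Return one averaged last-hidden-layer activation per label; a client can
    send this summary to the relay server instead of any raw data."""
    return {int(y): activations[labels == y].mean(axis=0) for y in np.unique(labels)}

activations = np.random.randn(100, 64)        # 100 samples, 64-dim hidden layer
labels = np.random.randint(0, 10, size=100)   # 10 classes
summary = label_averaged_activations(activations, labels)
```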
no code implementations • 19 Nov 2022 • Simla Burcu Harma, Ayan Chakraborty, Babak Falsafi, Martin Jaggi, Yunho Oh
The unprecedented growth in DNN model complexity, size, and amount of training data has led to a commensurate increase in demand for computing and a search for minimal encoding.
1 code implementation • 12 Nov 2022 • Cécile Trottet, Thijs Vogels, Martin Jaggi, Mary-Anne Hartley
Data-driven Clinical Decision Support Systems (CDSS) have the potential to improve and standardise care with personalised probabilistic guidance.
1 code implementation • 10 Oct 2022 • Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux
In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL.
no code implementations • 16 Jun 2022 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
In this work (i) we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2}+ \sqrt{\tau_{\max}\tau_{avg}}\epsilon^{-1}\right)$ without any change in the algorithm where $\tau_{avg}$ is the average delay, which can be significantly smaller than $\tau_{\max}$.
1 code implementation • 7 Jun 2022 • Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster.
no code implementations • 30 May 2022 • Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich
However, we show through a novel set of experiments that the stochastic noise is not sufficient to explain good non-convex training, and that instead the effect of a large learning rate itself is essential for obtaining the best performance. We demonstrate the same effects also in the noise-less case, i.e., for full-batch GD.
no code implementations • NAACL 2022 • Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi
The models pre-trained on factual triples compare competitively with the ones pre-trained on natural language sentences that contain the same knowledge.
no code implementations • 13 Apr 2022 • Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.
no code implementations • 11 Feb 2022 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova
We show that UDP is guaranteed to achieve the maximum margin decision boundary on linear models and that it notably increases it on challenging simulated datasets.
1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) only leveraging a small subset of predictive features.
1 code implementation • 3 Feb 2022 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
In this paper, we study the challenging task of Byzantine-robust decentralized training on arbitrary communication graphs.
no code implementations • NeurIPS 2021 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
1 code implementation • 16 Nov 2021 • Vinitra Swamy, Angelika Romanou, Martin Jaggi
In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
1 code implementation • 10 Nov 2021 • El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi
Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).
no code implementations • 25 Oct 2021 • Felix Grimberg, Mary-Anne Hartley, Sai P. Karimireddy, Martin Jaggi
In federated learning, differences in the data or objectives between the participating nodes motivate approaches to train a personalized machine learning model for each node.
no code implementations • 13 Oct 2021 • Martin Beaussart, Felix Grimberg, Mary-Anne Hartley, Martin Jaggi
Through a series of experiments, we compare our new approach to two recent personalized federated learning methods (Weight Erosion and APFL) as well as two general FL methods (Federated Averaging and SCAFFOLD).
1 code implementation • NeurIPS 2021 • Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
A key challenge, primarily in decentralized deep learning, remains the handling of differences between the workers' local data distributions.
no code implementations • 29 Sep 2021 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova
The sensitivity of deep learning models to small input perturbations raises security concerns and limits their use in applications where reliability is critical.
no code implementations • 6 Sep 2021 • Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich
We consider federated learning (FL), where the training data is distributed across a large number of clients.
1 code implementation • ICCV 2021 • Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova
We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective, yielding the first test accuracy improvement results on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations.
2 code implementations • 14 Jul 2021 • Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecny, Sanmi Koyejo, Tian Li, Luyang Liu, Mehryar Mohri, Hang Qi, Sashank J. Reddi, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Ananda Theertha Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection.
1 code implementation • 14 Jul 2021 • David Roschewitz, Mary-Anne Hartley, Luca Corinzia, Martin Jaggi
This enables the detection of outlier datasets in the federation, as well as learning to compensate for local data distribution shifts, without sharing any original data.
no code implementations • 25 Jun 2021 • Yatin Dandi, Luis Barba, Martin Jaggi
A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients or mini-batches, due to the heterogeneity and stochasticity of the distributed data.
no code implementations • 16 Jun 2021 • Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich
State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD).
1 code implementation • ACL 2021 • Prakhar Gupta, Martin Jaggi
The advent of contextual word embeddings -- representations of words which incorporate semantic and syntactic information from their context -- has led to tremendous improvements on a wide variety of NLP tasks.
1 code implementation • ACL 2021 • Zhuoyuan Mao, Prakhar Gupta, Pei Wang, Chenhui Chu, Martin Jaggi, Sadao Kurohashi
Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks.
1 code implementation • 15 Apr 2021 • Valerian Rey, Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Martin Jaggi
In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented.
no code implementations • 3 Mar 2021 • Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi
It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.
1 code implementation • 9 Feb 2021 • Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.
no code implementations • 9 Feb 2021 • Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
1 code implementation • 5 Feb 2021 • Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi
We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.
no code implementations • 1 Jan 2021 • Tao Lin, Lingjing Kong, Anastasia Koloskova, Martin Jaggi, Sebastian U Stich
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
no code implementations • 1 Jan 2021 • Eliza Wszola, Martin Jaggi, Markus Püschel
Word embeddings have gained increasing popularity in recent years due to the Word2vec library and its extension fastText that uses subword information.
1 code implementation • 18 Dec 2020 • Sai Praneeth Karimireddy, Lie He, Martin Jaggi
Secondly, we prove that even if the aggregation rules may succeed in limiting the influence of the attackers in a single round, the attackers can couple their attacks across time eventually leading to divergence.
1 code implementation • NeurIPS 2020 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.
no code implementations • 3 Nov 2020 • Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian U. Stich
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
no code implementations • 28 Sep 2020 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
In Byzantine-robust distributed optimization, a central server wants to train a machine learning model over data distributed across multiple workers.
no code implementations • 19 Sep 2020 • Negar Foroutan Eghlidi, Martin Jaggi
Although distributed training reduces the computation time, the communication overhead associated with the gradient exchange forms a scalability bottleneck for the algorithm.
1 code implementation • 8 Aug 2020 • Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.
2 code implementations • 4 Aug 2020 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.
2 code implementations • 29 Jun 2020 • Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
We also show that it is possible to re-parametrize a pre-trained multi-head attention layer into our collaborative attention layer.
1 code implementation • ICLR 2021 • Tatjana Chavdarova, Matteo Pagliardini, Sebastian U. Stich, Francois Fleuret, Martin Jaggi
Generative Adversarial Networks are notoriously challenging to train.
1 code implementation • ICLR 2022 • Sai Praneeth Karimireddy, Lie He, Martin Jaggi
In Byzantine robust distributed or federated learning, a central server wants to train a machine learning model over data distributed across multiple workers.
1 code implementation • NeurIPS 2020 • Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
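A minimal sketch of the standard server-side averaging scheme this sentence refers to; the interpolation weight `mix` is an illustrative assumption, and the entry itself argues for refining this fusion step.

```python
import numpy as np

def refine_server_model(server_params, client_params_list, mix=1.0):
    """Average the clients' updated parameters and interpolate the result with
    the current server model (mix=1.0 reduces to plain client averaging)."""
    client_avg = {k: np.mean([c[k] for c in client_params_list], axis=0)
                  for k in server_params}
    return {k: (1.0 - mix) * server_params[k] + mix * client_avg[k]
            for k in server_params}

server = {"layer.weight": np.zeros((4, 2))}
clients = [{"layer.weight": np.random.randn(4, 2)} for _ in range(5)]
server = refine_server_model(server, clients, mix=0.5)
```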
no code implementations • ICLR 2020 • Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi
Deep neural networks often have millions of parameters.
no code implementations • ICML 2020 • Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data.
no code implementations • 8 Jun 2020 • Lie He, Sai Praneeth Karimireddy, Martin Jaggi
Increasingly, machine learning systems are being deployed to edge servers and devices (e.g., mobile phones) and trained in a collaborative manner.
no code implementations • EMNLP 2020 • Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze
We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
no code implementations • ICLR 2021 • Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr, Martin Jaggi
As a result, across various workloads of dataset, network model, and optimization algorithm, we find a general scaling trend between batch size and the number of training steps to convergence, characterizing the effect of data parallelism, and further characterize the difficulty of training under sparsity.
no code implementations • ICML 2020 • Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency.
2 code implementations • 28 Dec 2019 • Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi
Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
Cross-Lingual Document Classification • Cross-Lingual Word Embeddings
8 code implementations • 10 Dec 2019 • Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao
FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches.
1 code implementation • ICLR 2020 • Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice.
Ranked #154 on Image Classification on CIFAR-10
no code implementations • ICML 2020 • Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, François Fleuret
The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration.
2 code implementations • NeurIPS 2020 • Sidak Pal Singh, Martin Jaggi
Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression.
no code implementations • 25 Sep 2019 • Prabhu Teja S*, Florian Mai*, Thijs Vogels, Martin Jaggi, Francois Fleuret
There is no consensus yet on the question of whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD.
1 code implementation • ICLR 2020 • Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.
1 code implementation • WS 2019 • Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi
We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer.
1 code implementation • NeurIPS 2019 • Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization.
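A minimal sketch in the spirit of this line of work: a rank-1 low-rank compression of a gradient matrix via a single power-iteration step, warm-started from the previous right factor. This only illustrates the idea, not the full algorithm studied there (which additionally involves error feedback and higher ranks).

```python
import numpy as np

def rank1_compress(grad_matrix, q_prev=None, rng=None):
    """Compress an m x n gradient matrix into a rank-1 pair (p, q) with one
    power-iteration step; sending p and q costs m + n numbers instead of m * n."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = grad_matrix.shape
    q = rng.standard_normal(n) if q_prev is None else q_prev
    p = grad_matrix @ q
    p /= np.linalg.norm(p) + 1e-12     # normalize the left factor
    q = grad_matrix.T @ p              # best right factor for this p
    return p, q

def decompress(p, q):
    return np.outer(p, q)

G = np.random.randn(256, 128)
p, q = rank1_compress(G)
G_approx = decompress(p, q)
```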
1 code implementation • 2 May 2019 • Eliza Wszola, Celestine Mendler-Dünner, Martin Jaggi, Markus Püschel
A new generation of manycore processors is on the rise that offers dozens of cores or more on a chip and, in a sense, fuses host processor and accelerator.
1 code implementation • NAACL 2019 • Prakhar Gupta, Matteo Pagliardini, Martin Jaggi
Pre-trained word vectors are ubiquitous in Natural Language Processing applications.
1 code implementation • 8 Apr 2019 • Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West
Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 26 Feb 2019 • Khalil Mrini, Claudiu Musat, Michael Baeriswyl, Martin Jaggi
We show our model's interpretability by visualizing how our model distributes attention inside a document.
no code implementations • 25 Feb 2019 • Matthias Hüser, Adrian Kündig, Walter Karlen, Valeria De Luca, Martin Jaggi
Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours.
1 code implementation • ICLR 2020 • Kaicheng Yu, Christian Sciuto, Martin Jaggi, Claudiu Musat, Mathieu Salzmann
Neural Architecture Search (NAS) aims to facilitate the design of deep networks for new tasks.
no code implementations • ICLR 2019 • Yassine Benyahia, Kaicheng Yu, Kamil Bennani-Smires, Martin Jaggi, Anthony Davison, Mathieu Salzmann, Claudiu Musat
We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters.
3 code implementations • 1 Feb 2019 • Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.
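A minimal sketch of the gossip-with-compressed-communication structure behind CHOCO-SGD, using top-k sparsification as the compressor; the step sizes, mixing matrix, and quadratic local objectives are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v (a simple compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_sgd_round(x, x_hat, grads, W, lr=0.05, gamma=0.5, k=2):
    """One round: local SGD step, compressed update of each node's public copy
    x_hat, then a gossip (consensus) step towards the neighbors' public copies."""
    n = len(x)
    x = [x[i] - lr * grads[i] for i in range(n)]
    q = [top_k(x[i] - x_hat[i], k) for i in range(n)]   # only q[i] is communicated
    x_hat = [x_hat[i] + q[i] for i in range(n)]
    x = [x[i] + gamma * sum(W[i, j] * (x_hat[j] - x_hat[i]) for j in range(n))
         for i in range(n)]
    return x, x_hat

# Toy run: 3 fully connected nodes, each minimizing 0.5 * ||x - t_i||^2.
n, d = 3, 5
W = np.full((n, n), 1.0 / n)
targets = [np.random.randn(d) for _ in range(n)]
x = [np.zeros(d) for _ in range(n)]
x_hat = [np.zeros(d) for _ in range(n)]
for _ in range(200):
    grads = [x[i] - targets[i] for i in range(n)]
    x, x_hat = choco_sgd_round(x, x_hat, grads, W)
```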
2 code implementations • NeurIPS 2019 • Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice.
1 code implementation • 28 Jan 2019 • Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
These issues arise because of the biased nature of the sign compression operator.
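A minimal sketch of the error-feedback remedy studied in this line of work, paired with a scaled sign compressor; the step size and toy quadratic are illustrative assumptions.

```python
import numpy as np

def scaled_sign(v):
    """Scaled sign compression: transmit only the signs plus one scalar
    (the mean absolute value); on its own this compressor is biased."""
    return np.abs(v).mean() * np.sign(v)

def ef_sgd_step(x, grad, memory, lr=0.1):
    """Error feedback: add the residual stored in memory before compressing,
    apply the compressed update, and remember what the compressor dropped."""
    p = lr * grad + memory
    update = scaled_sign(p)
    return x - update, p - update

# Toy run on 0.5 * ||x||^2, whose gradient is x itself.
x, memory = np.random.randn(10), np.zeros(10)
for _ in range(200):
    x, memory = ef_sgd_step(x, grad=x, memory=memory)
```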
no code implementations • 16 Oct 2018 • Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
For these problems we provide (i) the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.
1 code implementation • NeurIPS 2018 • Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.
2 code implementations • 29 Aug 2018 • Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi
We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.
2 code implementations • ICLR 2020 • Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks.
1 code implementation • NeurIPS 2018 • Lie He, An Bian, Martin Jaggi
Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy.
no code implementations • ICML 2018 • Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi
Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years.
no code implementations • 5 Jun 2018 • Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi
We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions), by associating with each object both a distributional as well as a point estimate (vector embedding).
no code implementations • 1 Jun 2018 • Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi
We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.
no code implementations • NeurIPS 2018 • Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi
We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic.
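A minimal sketch of the block floating point idea referenced here: each block of values shares a single exponent, while individual values keep only fixed-point mantissas (the bit width and rounding rule are illustrative).

```python
import numpy as np

def to_bfp(block, mantissa_bits=8):
    """Quantize a block of values to block floating point: one shared exponent
    for the whole block plus an integer mantissa per value."""
    shared_exp = int(np.ceil(np.log2(np.max(np.abs(block)) + 1e-30)))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi).astype(np.int32)
    return mantissas, shared_exp

def from_bfp(mantissas, shared_exp, mantissa_bits=8):
    return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

block = np.random.randn(16).astype(np.float32)
mantissas, exp = to_bfp(block)
recovered = from_bfp(mantissas, exp)
```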
no code implementations • ICML 2018 • Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi
Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.
3 code implementations • CONLL 2018 • Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi
EmbedRank achieves higher F-scores than graph-based state of the art systems on standard datasets and is suitable for real-time processing of large amounts of Web data.
1 code implementation • 14 Nov 2017 • Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč
In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality.
no code implementations • NeurIPS 2017 • Sebastian U. Stich, Anant Raj, Martin Jaggi
Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications.
1 code implementation • NeurIPS 2017 • Celestine Dünner, Thomas Parnell, Martin Jaggi
We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems.
2 code implementations • 21 Jul 2017 • Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler
We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations.
no code implementations • 11 Jul 2017 • Mikhail A. Langovoy, Akhilesh Gotmare, Martin Jaggi
We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., according to their opinions or preferences on a topic.
no code implementations • ICML 2017 • Sebastian U. Stich, Anant Raj, Martin Jaggi
We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization.
no code implementations • NeurIPS 2017 • Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees.
1 code implementation • ACL 2017 • Tina Fang, Martin Jaggi, Katerina Argyraki
Motivated by concerns for user privacy, we design a steganographic system ("stegosystem") that enables two users to exchange encrypted messages without an adversary detecting that such an exchange is taking place.
5 code implementations • NAACL 2018 • Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well.
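A minimal sketch of the compositional idea: a sentence embedding formed by averaging word (and optionally n-gram) vectors. In the actual method such vectors are trained jointly with an unsupervised objective; the dictionaries and dimensions below are illustrative.

```python
import numpy as np

def sentence_embedding(tokens, word_vecs, bigram_vecs=None):
    """Compose a sentence embedding as the average of its word vectors and,
    optionally, its bigram vectors."""
    dim = len(next(iter(word_vecs.values())))
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if bigram_vecs is not None:
        vecs += [bigram_vecs[b] for b in zip(tokens, tokens[1:]) if b in bigram_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

word_vecs = {w: np.random.randn(50) for w in ["the", "cat", "sat", "down"]}
embedding = sentence_embedding(["the", "cat", "sat", "down"], word_vecs)
```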
no code implementations • 7 Mar 2017 • Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi
Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems.
1 code implementation • 7 Mar 2017 • Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi
This paper presents a novel approach for multi-lingual sentiment classification in short texts.
no code implementations • 21 Feb 2017 • Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi
Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms.
2 code implementations • 7 Nov 2016 • Virginia Smith, Simone Forte, Chenxin Ma, Martin Takac, Michael I. Jordan, Martin Jaggi
The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning.
no code implementations • 23 Sep 2016 • Anant Raj, Jakob Olbrich, Bernd Gärtner, Bernhard Schölkopf, Martin Jaggi
We propose a new framework for deriving screening rules for convex optimization problems.
no code implementations • 16 Feb 2016 • Celestine Dünner, Simone Forte, Martin Takáč, Martin Jaggi
We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates.
no code implementations • 12 Feb 2016 • Rajiv Khanna, Michael Tschannen, Martin Jaggi
Efficiently representing real world data in a succinct and parsimonious manner is of central importance in many fields.
1 code implementation • 13 Dec 2015 • Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, Martin Takáč
To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.
2 code implementations • 13 Dec 2015 • Virginia Smith, Simone Forte, Michael I. Jordan, Martin Jaggi
Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.
1 code implementation • NeurIPS 2015 • Simon Lacoste-Julien, Martin Jaggi
In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective.
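For context, a minimal sketch of the vanilla Frank-Wolfe baseline that these variants build on, run over the probability simplex; the quadratic objective and step-size rule are illustrative, and the away-step and pairwise variants additionally track an active set of atoms.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, dim, steps=200):
    """Vanilla Frank-Wolfe on the probability simplex: the linear minimization
    oracle picks the vertex with the smallest partial derivative."""
    x = np.full(dim, 1.0 / dim)
    for t in range(steps):
        g = grad_fn(x)
        s = np.zeros(dim)
        s[np.argmin(g)] = 1.0          # LMO over the simplex returns a vertex
        gamma = 2.0 / (t + 2.0)        # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Minimize 0.5 * ||x - b||^2 over the simplex.
b = np.array([0.1, 0.7, 0.2, 0.0])
x_opt = frank_wolfe_simplex(lambda x: x - b, dim=4)
```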
1 code implementation • 12 Feb 2015 • Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.
no code implementations • NeurIPS 2014 • Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan
Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning.
no code implementations • 5 Mar 2013 • Martin Jaggi
As a consequence, many existing optimization algorithms for both SVMs and Lasso can also be applied to the respective other problem instances.