Search Results for author: Martin Jaggi

Found 101 papers, 52 papers with code

Self-Supervised Neural Topic Modeling

1 code implementation Findings (EMNLP) 2021 Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff

Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text.

Topic Models

SKILL: Structured Knowledge Infusion for Large Language Models

no code implementations 17 May 2022 Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi

Models pre-trained on factual triples compare competitively with models pre-trained on natural language sentences that contain the same knowledge.

Knowledge Graphs

Data-heterogeneity-aware Mixing for Decentralized Learning

no code implementations 13 Apr 2022 Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs.

Improving Generalization via Uncertainty Driven Perturbations

no code implementations 11 Feb 2022 Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova

We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets.

Agree to Disagree: Diversity through Disagreement for Better Transferability

1 code implementation 9 Feb 2022 Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy

This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) leveraging only a small subset of predictive features.

OOD Detection

Byzantine-Robust Decentralized Learning via Self-Centered Clipping

1 code implementation 3 Feb 2022 Lie He, Sai Praneeth Karimireddy, Martin Jaggi

In this paper, we study the challenging task of Byzantine-robust decentralized training on arbitrary communication graphs.

Federated Learning
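
The clipping rule named in the title lends itself to a compact illustration. Below is a minimal sketch of a self-centered clipping gossip round, written from the abstract's description; the mixing weights, threshold, and toy setup are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def clip(v, tau):
    """Shrink v so its norm is at most tau (identity if already smaller)."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)

def self_centered_gossip(x, W, tau):
    """One gossip round: each node mixes in its neighbors' models only after
    clipping every neighbor's deviation from its OWN model, so a Byzantine
    neighbor can move node i by at most W[i, j] * tau per round.
    x: (n_nodes, dim) local models; W: (n_nodes, n_nodes) mixing weights."""
    x_new = np.empty_like(x)
    for i in range(len(x)):
        x_new[i] = x[i] + sum(W[i, j] * clip(x[j] - x[i], tau)
                              for j in range(len(x)))
    return x_new

# Toy run: three honest nodes near zero, one attacker broadcasting a huge model.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(scale=0.1, size=(3, 4)), np.full((1, 4), 1e6)])
W = np.full((4, 4), 0.25)                        # fully connected, uniform weights
print(self_centered_gossip(x, W, tau=1.0)[0])    # honest node 0 barely moves
```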

Breaking the centralized barrier for cross-device federated learning

no code implementations NeurIPS 2021 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

Interpreting Language Models Through Knowledge Graph Extraction

1 code implementation 16 Nov 2021 Vinitra Swamy, Angelika Romanou, Martin Jaggi

In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.

Language Modelling

Linear Speedup in Personalized Collaborative Learning

1 code implementation 10 Nov 2021 El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi

Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).

Federated Learning, Stochastic Optimization

Optimal Model Averaging: Towards Personalized Collaborative Learning

no code implementations 25 Oct 2021 Felix Grimberg, Mary-Anne Hartley, Sai P. Karimireddy, Martin Jaggi

In federated learning, differences in the data or objectives between the participating nodes motivate approaches to train a personalized machine learning model for each node.

Federated Learning

WAFFLE: Weighted Averaging for Personalized Federated Learning

no code implementations 13 Oct 2021 Martin Beaussart, Felix Grimberg, Mary-Anne Hartley, Martin Jaggi

Through a series of experiments, we compare our new approach to two recent personalized federated learning methods--Weight Erosion and APFL--as well as two general FL methods--Federated Averaging and SCAFFOLD.

Personalized Federated Learning

RelaySum for Decentralized Deep Learning on Heterogeneous Data

1 code implementation NeurIPS 2021 Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

A key challenge, particularly in decentralized deep learning, remains the handling of differences between the workers' local data distributions.

Improved Generalization-Robustness Trade-off via Uncertainty Targeted Attacks

no code implementations 29 Sep 2021 Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova

The deep learning models' sensitivity to small input perturbations raises security concerns and limits their use for applications where reliability is critical.

On Second-order Optimization Methods for Federated Learning

no code implementations 6 Sep 2021 Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich

We consider federated learning (FL), where the training data is distributed across a large number of clients.

Federated Learning

Semantic Perturbations with Normalizing Flows for Improved Generalization

1 code implementation ICCV 2021 Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova

We find that our latent adversarial perturbations, adaptive to the classifier throughout its training, are most effective, yielding the first test-accuracy improvements on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations.

Data Augmentation

IFedAvg: Interpretable Data-Interoperability for Federated Learning

1 code implementation 14 Jul 2021 David Roschewitz, Mary-Anne Hartley, Luca Corinzia, Martin Jaggi

This enables the detection of outlier datasets in the federation, as well as learning to compensate for local data distribution shifts, without sharing any original data.

Federated Learning

Implicit Gradient Alignment in Distributed and Federated Learning

no code implementations 25 Jun 2021 Yatin Dandi, Luis Barba, Martin Jaggi

A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients or mini-batches, caused by the heterogeneity and stochasticity of the distributed data.

Federated Learning

Masked Training of Neural Networks with Partial Gradients

no code implementations 16 Jun 2021 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD).

Model Compression

Obtaining Better Static Word Embeddings Using Contextual Embedding Models

1 code implementation ACL 2021 Prakhar Gupta, Martin Jaggi

The advent of contextual word embeddings -- representations of words which incorporate semantic and syntactic information from their context -- has led to tremendous improvements on a wide variety of NLP tasks.

Word Embeddings

Lightweight Cross-Lingual Sentence Representation Learning

1 code implementation ACL 2021 Zhuoyuan Mao, Prakhar Gupta, Pei Wang, Chenhui Chu, Martin Jaggi, Sadao Kurohashi

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks.

Contrastive Learning, Document Classification +2

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

no code implementations 3 Mar 2021 Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

1 code implementation 9 Feb 2021 Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

In this paper, we investigate and identify the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity.

Consensus Control for Decentralized Deep Learning

no code implementations 9 Feb 2021 Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

Faster Training of Word Embeddings

no code implementations 1 Jan 2021 Eliza Wszola, Martin Jaggi, Markus Püschel

Word embeddings have gained increasing popularity in recent years due to the Word2vec library and its extension fastText, which uses subword information.

Word Embeddings

On the Effect of Consensus in Decentralized Deep Learning

no code implementations 1 Jan 2021 Tao Lin, Lingjing Kong, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.

Learning from History for Byzantine Robust Optimization

1 code implementation 18 Dec 2020 Sai Praneeth Karimireddy, Lie He, Martin Jaggi

Secondly, we prove that even if the aggregation rules may succeed in limiting the influence of the attackers in a single round, the attackers can couple their attacks across time, eventually leading to divergence.

Federated Learning, Stochastic Optimization

Practical Low-Rank Communication Compression in Decentralized Deep Learning

1 code implementation NeurIPS 2020 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.

Byzantine-Robust Learning on Heterogeneous Datasets via Resampling

no code implementations 28 Sep 2020 Lie He, Sai Praneeth Karimireddy, Martin Jaggi

In Byzantine-robust distributed optimization, a central server wants to train a machine learning model over data distributed across multiple workers.

Distributed Optimization

Sparse Communication for Training Deep Networks

no code implementations 19 Sep 2020 Negar Foroutan Eghlidi, Martin Jaggi

Although distributed training reduces the computation time, the communication overhead associated with the gradient exchange forms a scalability bottleneck for the algorithm.

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

1 code implementation 8 Aug 2020 Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon.

Federated Learning

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

2 code implementations 4 Aug 2020 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models.

Multi-Head Attention: Collaborate Instead of Concatenate

2 code implementations 29 Jun 2020 Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

We also show that it is possible to re-parametrize a pre-trained multi-head attention layer into our collaborative attention layer.

Machine Translation, Translation

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

1 code implementation ICLR 2022 Sai Praneeth Karimireddy, Lie He, Martin Jaggi

In Byzantine robust distributed or federated learning, a central server wants to train a machine learning model over data distributed across multiple workers.

Distributed Optimization, Federated Learning
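
The bucketing idea is simple enough to sketch directly. Below is a minimal illustration of s-bucketing as a pre-processing step in front of an off-the-shelf robust aggregator (coordinate-wise median as a stand-in); parameter names and the toy data are illustrative, not the paper's code.

```python
import numpy as np

def bucketing_aggregate(updates, s, rng, robust_agg=np.median):
    """s-bucketing: randomly shuffle the worker updates, average disjoint
    buckets of size s, then hand the bucket means to an existing robust
    aggregator. Averaging inside buckets reduces the heterogeneity that the
    robust aggregator sees, which is what makes it work on non-iid data."""
    perm = rng.permutation(len(updates))
    shuffled = updates[perm]
    means = [shuffled[i:i + s].mean(axis=0) for i in range(0, len(updates), s)]
    return robust_agg(np.stack(means), axis=0)

rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, size=(18, 5))   # heterogeneous honest updates near 1
byzantine = np.full((2, 5), 50.0)            # two colluding attackers
updates = np.vstack([honest, byzantine])
print(bucketing_aggregate(updates, s=2, rng=rng))   # stays close to 1
```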

Ensemble Distillation for Robust Model Fusion in Federated Learning

1 code implementation NeurIPS 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.

Federated Learning, Knowledge Distillation
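
A minimal, self-contained sketch of the ensemble-distillation idea behind this kind of model fusion: initialize the server model at the parameter average, then distill the averaged client logits on unlabeled proxy data. The toy linear models and hyperparameters below are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_classes = 10, 3
clients = [torch.nn.Linear(dim, n_classes) for _ in range(4)]  # toy client models
server = torch.nn.Linear(dim, n_classes)

# Initialize the server model at the parameter average (the FedAvg baseline).
with torch.no_grad():
    for p_s, *p_c in zip(server.parameters(),
                         *(c.parameters() for c in clients)):
        p_s.copy_(torch.stack(p_c).mean(dim=0))

unlabeled = torch.randn(256, dim)            # unlabeled proxy data for distillation
opt = torch.optim.Adam(server.parameters(), lr=1e-2)
for _ in range(100):
    with torch.no_grad():                    # teacher = average of client logits
        teacher = torch.stack([c(unlabeled) for c in clients]).mean(dim=0)
    loss = F.kl_div(F.log_softmax(server(unlabeled), dim=1),
                    F.softmax(teacher, dim=1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```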

Extrapolation for Large-batch Training in Deep Learning

no code implementations ICML 2020 Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data.

Secure Byzantine-Robust Machine Learning

no code implementations 8 Jun 2020 Lie He, Sai Praneeth Karimireddy, Martin Jaggi

Increasingly, machine learning systems are being deployed to edge servers and devices (e.g., mobile phones) and trained in a collaborative manner.

Masking as an Efficient Alternative to Finetuning for Pretrained Language Models

no code implementations EMNLP 2020 Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze

We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.

Pretrained Language Models

Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training

no code implementations ICLR 2021 Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr, Martin Jaggi

As a result, across various workloads of data set, network model, and optimization algorithm, we find a general scaling trend between batch size and the number of training steps to convergence (the effect of data parallelism) and, further, an increased difficulty of training under sparsity.

Network Pruning

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

no code implementations ICML 2020 Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency.

Stochastic Optimization

Robust Cross-lingual Embeddings from Parallel Sentences

2 code implementations 28 Dec 2019 Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi

Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.

Cross-Lingual Document Classification, Cross-Lingual Word Embeddings +5

On the Relationship between Self-Attention and Convolutional Layers

1 code implementation ICLR 2020 Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice.

Image Classification

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

no code implementations ICML 2020 Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, François Fleuret

The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration.

Model Fusion via Optimal Transport

2 code implementations NeurIPS 2020 Sidak Pal Singh, Martin Jaggi

Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression.

Continual Learning, Model Compression +2
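
The neuron-alignment step can be illustrated in its simplest special case: a hard (permutation) transport plan between the neurons of two layers, computed with the Hungarian algorithm, followed by averaging. The paper's method is more general (soft transport plans, layer-wise propagation of the alignment); everything below is an illustrative sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_linear_layers(W_a, W_b):
    """Align the neurons (rows) of layer B to layer A with a hard optimal-
    transport plan -- a permutation -- then average the aligned weights.
    Naive averaging without alignment would mix unrelated neurons."""
    # cost: squared distance between every pair of incoming-weight vectors
    cost = ((W_a[:, None, :] - W_b[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    W_b_aligned = W_b[cols]              # permute B's neurons to match A's
    return 0.5 * (W_a + W_b_aligned)

rng = np.random.default_rng(0)
W_a = rng.normal(size=(8, 5))                                    # 8 neurons, 5 inputs
W_b = W_a[rng.permutation(8)] + 0.01 * rng.normal(size=(8, 5))   # same net, shuffled
print(np.allclose(fuse_linear_layers(W_a, W_b), W_a, atol=0.1))  # ~recovers A
```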

On the Tunability of Optimizers in Deep Learning

no code implementations 25 Sep 2019 Prabhu Teja S*, Florian Mai*, Thijs Vogels, Martin Jaggi, François Fleuret

There is no consensus yet on whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD.

Decentralized Deep Learning with Arbitrary Communication Compression

1 code implementation ICLR 2020 Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters.

Correlating Twitter Language with Community-Level Health Outcomes

1 code implementation WS 2019 Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer.

Sentence Embeddings

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

1 code implementation NeurIPS 2019 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization.

Distributed Optimization
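
The core of the compressor admits a short sketch: one step of power iteration yields rank-r factors of the gradient matrix, with the right factor warm-started across iterations. The sketch below compresses a single matrix on one machine; names are illustrative, and the all-reduce of the factors and the error-feedback loop of the full algorithm are omitted.

```python
import numpy as np

def powersgd_compress(M, Q):
    """One step of subspace (power) iteration: compress gradient matrix M into
    rank-r factors P, Q. Reusing Q across iterations ("warm start") makes a
    single iteration per step sufficient in practice; in the distributed
    algorithm, P and Q are the only tensors communicated between workers."""
    P = M @ Q                        # (m, r)
    P, _ = np.linalg.qr(P)           # orthonormalize the columns of P
    Q_new = M.T @ P                  # (n, r)
    return P, Q_new                  # decompressed gradient: P @ Q_new.T

rng = np.random.default_rng(0)
M = rng.normal(size=(64, 32)) @ rng.normal(size=(32, 32))  # gradient as a matrix
Q = rng.normal(size=(32, 4))                               # rank-4, warm-started
for _ in range(3):
    P, Q = powersgd_compress(M, Q)
approx = P @ Q.T
print(np.linalg.norm(M - approx) / np.linalg.norm(M))      # relative error
```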

On Linear Learning with Manycore Processors

1 code implementation 2 May 2019 Eliza Wszola, Celestine Mendler-Dünner, Martin Jaggi, Markus Püschel

A new generation of manycore processors is on the rise, offering dozens of cores or more on a chip and, in a sense, fusing host processor and accelerator.

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

1 code implementation 8 Apr 2019 Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West

Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.

Document Embedding

Forecasting intracranial hypertension using multi-scale waveform metrics

no code implementations 25 Feb 2019 Matthias Hüser, Adrian Kündig, Walter Karlen, Valeria De Luca, Martin Jaggi

Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours.

Time Series

Overcoming Multi-Model Forgetting

no code implementations ICLR 2019 Yassine Benyahia, Kaicheng Yu, Kamil Bennani-Smires, Martin Jaggi, Anthony Davison, Mathieu Salzmann, Claudiu Musat

We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a subsequent one, due to the overwriting of shared parameters.

Neural Architecture Search

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

3 code implementations 1 Feb 2019 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.

Stochastic Optimization
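
A compact single-machine simulation of the CHOCO-SGD mechanism, written from the description above: each node keeps a private model and a public copy, communicates only a compressed difference, and gossips on the public copies. The step sizes, top-k compressor, and toy quadratic objectives are illustrative choices, not the paper's setup.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries -- a simple biased compressor."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_sgd_step(x, x_hat, grads, W, lr, gamma, k):
    """One CHOCO-SGD round. Each node holds its private model x[i] and a public
    copy x_hat[i] that neighbors agree on; only the compressed difference
    top_k(x[i] - x_hat[i]) is ever communicated. Gossip acts on the public
    copies (rows of W sum to one)."""
    x = x - lr * grads                                       # local SGD step
    x_hat = x_hat + np.stack([top_k(xi - xhi, k) for xi, xhi in zip(x, x_hat)])
    x = x + gamma * (W @ x_hat - x_hat)                      # compressed gossip
    return x, x_hat

rng = np.random.default_rng(0)
n, d = 8, 20
c = rng.normal(size=(n, d))        # node i minimizes f_i(x) = ||x - c_i||^2 / 2
W = 0.5 * np.eye(n)
for i in range(n):                 # ring topology, doubly stochastic weights
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25
x = np.zeros((n, d)); x_hat = np.zeros((n, d))
for _ in range(1000):
    x, x_hat = choco_sgd_step(x, x_hat, grads=x - c, W=W, lr=0.05, gamma=0.2, k=2)
print(np.linalg.norm(x - c.mean(axis=0), axis=1).max())  # distance to the optimum
```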

Unsupervised Scalable Representation Learning for Multivariate Time Series

1 code implementation NeurIPS 2019 Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi

Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice.

Representation Learning, Time Series

Efficient Greedy Coordinate Descent for Composite Problems

no code implementations 16 Oct 2018 Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

For these problems we provide the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case.

Sparsified SGD with Memory

1 code implementation NeurIPS 2018 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi

Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.

Distributed Optimization, Quantization
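
The memory (error-feedback) mechanism is easy to sketch: un-sent gradient mass is accumulated rather than dropped. Below is a minimal single-worker illustration on a toy quadratic; names and constants are illustrative.

```python
import numpy as np

def sparsify_with_memory(grad, memory, k):
    """Error feedback: accumulate the gradient into a residual memory, transmit
    only the k largest-magnitude coordinates, and keep the un-sent remainder in
    memory instead of dropping it -- the key to retaining SGD's convergence."""
    memory = memory + grad
    msg = np.zeros_like(memory)
    idx = np.argsort(np.abs(memory))[-k:]
    msg[idx] = memory[idx]            # the only thing actually communicated
    return msg, memory - msg

rng = np.random.default_rng(0)
d, k, lr = 10, 1, 0.1
x, memory = rng.normal(size=d), np.zeros(d)
for _ in range(500):                  # minimize ||x||^2 / 2, whose gradient is x
    msg, memory = sparsify_with_memory(lr * x, memory, k)
    x = x - msg
print(np.linalg.norm(x))   # near zero despite sending 1 of 10 coordinates per step
```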

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

2 code implementations 29 Aug 2018 Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We present a framework for building unsupervised representations of entities and their compositions, where each entity is viewed as a probability distribution rather than a vector embedding.

Sentence Embedding, Sentence Similarity

Don't Use Large Mini-Batches, Use Local SGD

2 code implementations ICLR 2020 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks.
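
A minimal sketch of the local SGD pattern contrasted with mini-batch SGD: each worker takes several local steps between synchronizations, so communication happens once per round rather than once per step. The least-squares workers below are toy stand-ins for real training.

```python
import numpy as np

def local_sgd(workers, x0, lr, local_steps, rounds):
    """Each worker runs `local_steps` gradient steps on its own data between
    synchronizations -- one communication round per `local_steps` updates
    instead of one per mini-batch, the trade-off the paper advocates."""
    x = x0.copy()
    for _ in range(rounds):
        local_models = []
        for A, b in workers:                      # toy objective: least squares
            xi = x.copy()
            for _ in range(local_steps):
                xi = xi - lr * A.T @ (A @ xi - b) / len(b)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)         # synchronize by averaging
    return x

rng = np.random.default_rng(0)
x_true = rng.normal(size=5)
workers = []
for _ in range(4):
    A = rng.normal(size=(40, 5))
    workers.append((A, A @ x_true))
x = local_sgd(workers, np.zeros(5), lr=0.05, local_steps=8, rounds=25)
print(np.linalg.norm(x - x_true))                 # approaches the shared optimum
```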

COLA: Decentralized Linear Learning

1 code implementation NeurIPS 2018 Lie He, An Bian, Martin Jaggi

Decentralized machine learning is a promising emerging paradigm in view of global challenges of data ownership and privacy.

General Classification

A Distributed Second-Order Algorithm You Can Trust

no code implementations ICML 2018 Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years.

Distributed Optimization, Second-order methods

Wasserstein is all you need

no code implementations 5 Jun 2018 Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

We propose a unified framework for building unsupervised representations of individual objects or entities (and their compositions), by associating with each object both a distributional as well as a point estimate (vector embedding).

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

no code implementations 1 Jun 2018 Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.

Training DNNs with Hybrid Block Floating Point

no code implementations NeurIPS 2018 Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi

We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic.

On Matching Pursuit and Coordinate Descent

no code implementations ICML 2018 Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U. Stich, Martin Jaggi

Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives.

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

3 code implementations CONLL 2018 Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi

EmbedRank achieves higher F-scores than graph-based state of the art systems on standard datasets and is suitable for real-time processing of large amounts of Web data.

Keyphrase Extraction, Sentence Embeddings
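
The ranking step of EmbedRank reduces to a few lines once embeddings are available. The sketch below uses random stand-in vectors; in practice the document and candidate phrases would be embedded with a sentence embedder such as Sent2Vec, and the published method also adds an MMR-style diversity step (EmbedRank++) that is omitted here.

```python
import numpy as np

def embedrank(doc_vec, phrases, phrase_vecs, top_n=2):
    """Rank candidate phrases by cosine similarity between each phrase
    embedding and the full-document embedding -- no training, no graph."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = [cosine(doc_vec, v) for v in phrase_vecs]
    order = np.argsort(scores)[::-1][:top_n]
    return [(phrases[i], round(float(scores[i]), 3)) for i in order]

# Stand-in vectors; a real run would embed text with a sentence embedder.
rng = np.random.default_rng(0)
doc_vec = rng.normal(size=100)
phrases = ["keyphrase extraction", "sentence embeddings", "last weekend"]
phrase_vecs = [doc_vec + 0.4 * rng.normal(size=100),    # on-topic candidates
               doc_vec + 0.6 * rng.normal(size=100),
               rng.normal(size=100)]                    # off-topic candidate
print(embedrank(doc_vec, phrases, phrase_vecs))
```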

An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning

1 code implementation 14 Nov 2017 Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč

In this paper, an accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate of $\mathcal{O}(1/t^2)$ in terms of reducing suboptimality.

Distributed Optimization

Safe Adaptive Importance Sampling

no code implementations NeurIPS 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

Importance sampling has become an indispensable strategy to speed up optimization algorithms for large-scale applications.

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

1 code implementation NeurIPS 2017 Celestine Dünner, Thomas Parnell, Martin Jaggi

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems.

Learning Aerial Image Segmentation from Online Maps

1 code implementation 21 Jul 2017 Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler

We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations.

General Classification, Semantic Segmentation

Unsupervised robust nonparametric learning of hidden community properties

no code implementations 11 Jul 2017 Mikhail A. Langovoy, Akhilesh Gotmare, Martin Jaggi

We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., according to their opinions or preferences on a topic.

Approximate Steepest Coordinate Descent

no code implementations ICML 2017 Sebastian U. Stich, Anant Raj, Martin Jaggi

We propose a new selection rule for the coordinate selection in coordinate descent methods for huge-scale optimization.

Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

no code implementations NeurIPS 2017 Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi

Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees.

Generating Steganographic Text with LSTMs

1 code implementation ACL 2017 Tina Fang, Martin Jaggi, Katerina Argyraki

Motivated by concerns for user privacy, we design a steganographic system ("stegosystem") that enables two users to exchange encrypted messages without an adversary detecting that such an exchange is taking place.

Faster Coordinate Descent via Adaptive Importance Sampling

no code implementations 7 Mar 2017 Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems.

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

5 code implementations NAACL 2018 Matteo Pagliardini, Prakhar Gupta, Martin Jaggi

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well.

Sentence Embeddings, Word Embeddings
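
The compositional idea can be sketched directly: a sentence embedding is the average of the vectors of the sentence's unigrams and n-grams. In the real model the lookup table is learned end-to-end; the random table below is only an illustration.

```python
import numpy as np

def sentence_embedding(tokens, table, n=2):
    """Sent2Vec-style composition: average the vectors of all unigrams and
    n-grams of the sentence that appear in the (learned) lookup table."""
    grams = list(tokens)
    grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    vecs = [table[g] for g in grams if g in table]
    return np.mean(vecs, axis=0)

rng = np.random.default_rng(0)
table = {g: rng.normal(size=8)                    # random stand-in embeddings
         for g in ["the", "cat", "sat", "the cat", "cat sat"]}
print(sentence_embedding(["the", "cat", "sat"], table))
```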

A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe

no code implementations 21 Feb 2017 Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi

Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms.

Screening Rules for Convex Problems

no code implementations 23 Sep 2016 Anant Raj, Jakob Olbrich, Bernd Gärtner, Bernhard Schölkopf, Martin Jaggi

We propose a new framework for deriving screening rules for convex optimization problems.

Primal-Dual Rates and Certificates

no code implementations 16 Feb 2016 Celestine Dünner, Simone Forte, Martin Takáč, Martin Jaggi

We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates.

Pursuits in Structured Non-Convex Matrix Factorizations

no code implementations 12 Feb 2016 Rajiv Khanna, Michael Tschannen, Martin Jaggi

Efficiently representing real world data in a succinct and parsimonious manner is of central importance in many fields.

Distributed Optimization with Arbitrary Local Solvers

1 code implementation 13 Dec 2015 Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, Martin Takáč

To this end, we present a framework for distributed optimization that allows the flexibility of using arbitrary solvers on each (single) machine locally, yet maintains competitive performance against other state-of-the-art special-purpose distributed methods.

Distributed Optimization

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

2 code implementations 13 Dec 2015 Virginia Smith, Simone Forte, Michael I. Jordan, Martin Jaggi

Despite the importance of sparsity in many large-scale applications, there are few methods for distributed optimization of sparsity-inducing objectives.

Distributed Optimization

On the Global Linear Convergence of Frank-Wolfe Optimization Variants

1 code implementation NeurIPS 2015 Simon Lacoste-Julien, Martin Jaggi

In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective.
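
Of these variants, the away-steps version is the easiest to sketch. Below is a minimal implementation on the probability simplex, where the convex-combination weights coincide with the coordinates of x; the line search shown is exact only for the 1-smooth toy quadratic, and the sketch illustrates the mechanism rather than reproducing the paper's algorithms.

```python
import numpy as np

def away_steps_fw(grad, n, iters=300):
    """Away-steps Frank-Wolfe over the probability simplex (vertices e_1..e_n).
    Besides the usual step toward the best vertex, an 'away step' can shrink
    the weight of the worst vertex currently in the support -- the mechanism
    behind the linear-convergence results discussed in the paper."""
    x = np.zeros(n); x[0] = 1.0                 # start at a vertex
    for _ in range(iters):
        g = grad(x)
        s = int(np.argmin(g))                   # best Frank-Wolfe vertex
        support = np.flatnonzero(x > 1e-12)
        v = int(support[np.argmax(g[support])]) # worst vertex in the support
        if g @ x - g[s] >= g[v] - g @ x:        # FW gap dominates: classic step
            d = -x.copy(); d[s] += 1.0
            gamma_max = 1.0
        else:                                   # away step, away from e_v
            d = x.copy(); d[v] -= 1.0
            gamma_max = x[v] / (1.0 - x[v]) if x[v] < 1.0 else 1.0
        # exact line search for a 1-smooth quadratic objective
        gamma = float(np.clip(-(g @ d) / (d @ d + 1e-12), 0.0, gamma_max))
        x = x + gamma * d
    return x

rng = np.random.default_rng(0)
c = rng.normal(size=10)
x = away_steps_fw(lambda x: x - c, n=10)        # f(x) = ||x - c||^2 / 2
print(x.round(3))                               # projection of c onto the simplex
```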

Adding vs. Averaging in Distributed Primal-Dual Optimization

1 code implementation 12 Feb 2015 Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck.

Distributed Optimization

Communication-Efficient Distributed Dual Coordinate Ascent

no code implementations NeurIPS 2014 Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning.

Distributed Optimization

An Equivalence between the Lasso and Support Vector Machines

no code implementations 5 Mar 2013 Martin Jaggi

As a consequence, many existing optimization algorithms for both SVMs and the Lasso can also be applied to instances of the respective other problem.
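
One direction of this kind of reduction can be written down compactly (in illustrative notation; the paper states the precise equivalence and its converse). Over the unit $\ell_1$-ball, which is the convex hull of $\{\pm e_j\}$, the Lasso becomes a nearest-point-in-convex-hull problem, the geometric form of a hard-margin SVM:

```latex
\min_{\|w\|_1 \le 1} \|Xw - y\|_2^2
  \;=\; \min_{p \in \Delta_{2d}} \|Zp\|_2^2,
\qquad
Z := \begin{bmatrix} X - y\mathbf{1}^\top & \; -X - y\mathbf{1}^\top \end{bmatrix},
```

using the substitution $w = [I_d,\, -I_d]\,p$ and the fact that $\mathbf{1}^\top p = 1$ on the simplex $\Delta_{2d}$, which lets $y$ be absorbed into the columns of $Z$.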
