# Fair k-Centers via Maximum Matching

The field of algorithms has seen a recent push for fairness, i.e., the removal of inherent bias.

# Mixture of Experts Meets Prompt-Based Continual Learning

Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer.

# Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts

The cosine router in sparse Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router.

# Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts

no code implementations, 22 May 2024

The softmax gating function is arguably the most popular choice in mixture of experts modeling.

# On Parameter Estimation in Deviated Gaussian Mixture of Experts

no code implementations, 7 Feb 2024

We consider the parameter estimation problem in the deviated Gaussian mixture of experts in which the data are generated from $(1 - \lambda^{\ast}) g_0(Y| X)+ \lambda^{\ast} \sum_{i = 1}^{k_{\ast}} p_{i}^{\ast} f(Y|(a_{i}^{\ast})^{\top}X+b_i^{\ast},\sigma_{i}^{\ast})$, where $X, Y$ are respectively a covariate vector and a response variable, $g_{0}(Y|X)$ is a known function, $\lambda^{\ast} \in [0, 1]$ is the true but unknown mixing proportion, and $(p_{i}^{\ast}, a_{i}^{\ast}, b_{i}^{\ast}, \sigma_{i}^{\ast})$ for $1 \leq i \leq k_{\ast}$ are unknown parameters of the Gaussian mixture of experts.

# On Least Square Estimation in Softmax Gating Mixture of Experts

The mixture of experts (MoE) model is a statistical machine learning design that aggregates multiple expert networks using a softmax gating function in order to form a more intricate and expressive model.
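
A minimal numpy sketch of that design, with linear experts and a linear-then-softmax gate (all names and dimensions here are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 4, 3                       # input dimension, number of experts
x = rng.normal(size=d)

# Gating network: softmax over linear scores, one score per expert.
W_gate = rng.normal(size=(k, d))
scores = W_gate @ x
weights = np.exp(scores - scores.max())
weights /= weights.sum()          # softmax gate: non-negative, sums to 1

# Experts: here, simple linear regressors producing scalar outputs.
W_exp = rng.normal(size=(k, d))
expert_outputs = W_exp @ x

# MoE prediction: gate-weighted combination of the expert outputs.
y = weights @ expert_outputs
```

The softmax gate makes the combination a convex mixture, so the prediction always lies between the smallest and largest expert output.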

# FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples.

# CompeteSMoE - Effective Training of Sparse Mixture of Experts via Competition

Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width.


# Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

no code implementations, 25 Jan 2024

We demonstrate that due to interactions between the temperature and other model parameters via some partial differential equations, the convergence rates of parameter estimations are slower than any polynomial rates, and could be as slow as $\mathcal{O}(1/\log(n))$, where $n$ denotes the sample size.

# AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification

To address this, we introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios.


# A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications.

# Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

no code implementations, 25 Sep 2023

When the true number of experts $k_{\ast}$ is known, we demonstrate that the convergence rates of density and parameter estimations are both parametric on the sample size.

# OpportunityFinder: A Framework for Automated Causal Inference

We introduce OpportunityFinder, a code-less framework for performing a variety of causal inference studies with panel data for non-expert users.

# Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts

1 code implementation, 12 May 2023

Originally introduced as a neural network for ensemble learning, mixture of experts (MoE) has recently become a fundamental building block of highly successful modern deep neural networks for heterogeneous data analysis in several applications of machine learning and statistics.


# Aerial-Ground Person Re-ID

Our dataset presents a novel elevated-viewpoint challenge for person re-ID due to the significant difference in person appearance across these cameras.


# AutoWS: Automated Weak Supervision Framework for Text Classification

Multiple techniques have been developed to either decrease the dependence of labeled data (zero/few-shot learning, weak supervision) or to improve the efficiency of labeling process (active learning).

# Fast Approximation of the Generalized Sliced-Wasserstein Distance

Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projection through a given defining function to better capture the complex structures of the probability distributions.
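
For reference, the plain sliced Wasserstein distance that the generalized variant builds on can be sketched as a Monte Carlo average of 1-D Wasserstein distances over random linear projections (a simplified sketch assuming equal sample sizes; the generalized version replaces the linear projection with a nonlinear defining function):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance
    between two equal-size empirical distributions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)        # random direction on the unit sphere
        # 1-D Wasserstein-2 between projected samples: match sorted values
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 3))
B = rng.normal(size=(200, 3)) + 2.0           # shifted copy
```

Because each 1-D distance reduces to sorting, the per-projection cost is $O(n \log n)$, which is what makes slicing attractive in high dimensions.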

# Hierarchical Sliced Wasserstein Distance

We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively.


# Predicting Mutual Funds' Performance using Deep Learning and Ensemble Techniques

no code implementations, 18 Sep 2022

Predicting fund performance is beneficial to both investors and fund managers, and yet is a challenging task.

# Robust Product Classification with Instance-Dependent Noise

Training a product title classification model which is robust to noisy labels in the data is very important to make product classification applications more practical.

# Generative Adversarial Networks and Image-Based Malware Classification

We also evaluate the utility of the GAN generative model for adversarial attacks on image-based malware detection.

# On Label Shift in Domain Adaptation via Wasserstein Distance

We study the label shift problem between the source and target domains in general domain adaptation (DA) settings.

# Entropic Gromov-Wasserstein between Gaussian Distributions

no code implementations, 24 Aug 2021

When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions.

# On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity

no code implementations, 18 Aug 2021

We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^{m}/ \varepsilon^2)$ where $\varepsilon > 0$ stands for the desired tolerance.

# On Robust Optimal Transport: Computational Complexity and Barycenter Computation

We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via Kullback-Leibler divergence.

# Physical rendering of synthetic spaces for topological sound transport

Synthetic dimensions can be rendered in physical space, and this has been achieved with photonics and cold atomic gases; however, little work has succeeded in acoustics because acoustic waveguides cannot be weakly coupled in a continuous fashion.

Mesoscale and Nanoscale Physics; Classical Physics

# EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data

The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics or statistics.


# Differentially private $k$-means clustering via exponential mechanism and max cover

We introduce a new $(\epsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem.
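
As background, a common building block for private clustering is the Laplace mechanism: release a noised statistic with noise calibrated to its sensitivity. The sketch below privately estimates one cluster center for points in $[0,1]^2$; it is a generic illustration, not the paper's $(\epsilon_p, \delta_p)$ algorithm, and the budget split and sensitivities are illustrative assumptions:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` with Laplace noise of scale sensitivity/epsilon,
    giving epsilon-differential privacy for that single query."""
    return value + rng.laplace(scale=sensitivity / epsilon, size=np.shape(value))

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(1000, 2))   # coordinates bounded in [0, 1]

# Split the budget between a noisy coordinate sum and a noisy count.
# One point changes the sum by at most 1 per axis (L1 sensitivity 2)
# and the count by 1.
eps = 1.0
noisy_sum = laplace_mechanism(points.sum(axis=0), sensitivity=2.0,
                              epsilon=eps / 2, rng=rng)
noisy_count = laplace_mechanism(len(points), sensitivity=1.0,
                                epsilon=eps / 2, rng=rng)
private_center = noisy_sum / noisy_count
```

With 1000 points the noise is small relative to the signal, so the private center lands close to the true mean of roughly $(0.5, 0.5)$.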

# Differentially Private Decomposable Submodular Maximization

We extend this work by designing differentially private algorithms for both monotone and non-monotone decomposable submodular maximization under general matroid constraints, with competitive utility guarantees.

# Development of a Robotic System for Automated Decaking of 3D-Printed Parts

With the rapid rise of 3D-printing as a competitive mass manufacturing method, manual "decaking", i.e., removing the residual powder that sticks to a 3D-printed part, has become a significant bottleneck.

Robotics

# Automated Essay Scoring with Discourse-Aware Neural Models

Automated essay scoring systems typically rely on hand-crafted features to predict essay quality, but such systems are limited by the cost of feature engineering.

# Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling

Our approach significantly outperforms the state-of-the-art on the DeepFashion dataset.

# A Deep Neural Architecture for Sentence-level Sentiment Classification in Twitter Social Networking

1 code implementation, 25 Jun 2017

This paper introduces a novel deep learning framework including a lexicon-based approach for sentence-level prediction of sentiment label distribution.


# LOH and behold: Web-scale visual search, recommendation and clustering using Locally Optimized Hashing

We propose a novel hashing-based matching scheme, called Locally Optimized Hashing (LOH), based on a state-of-the-art quantization algorithm that can be used for efficient, large-scale search, recommendation, clustering, and deduplication.

# Subspace Embeddings for the Polynomial Kernel

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms.
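
The simplest instance of such a sketch is a Gaussian random projection, which approximately preserves Euclidean geometry (Johnson–Lindenstrauss style). This is a generic illustration, not the TensorSketch-type embedding for the polynomial kernel that the paper studies:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 1000, 200           # points, original dimension, sketch dimension

X = rng.normal(size=(n, d))

# Gaussian sketch matrix, scaled so squared norms are preserved in expectation.
S = rng.normal(size=(d, m)) / np.sqrt(m)
X_sketch = X @ S                  # each row compressed from d to m coordinates

# Norms survive up to a small multiplicative distortion.
ratios = np.linalg.norm(X_sketch, axis=1) / np.linalg.norm(X, axis=1)
```

Downstream algorithms can then run on the $m$-dimensional sketch instead of the original $d$-dimensional data, trading a small distortion for a large speedup.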

# Improving Peer Feedback Prediction: The Sentence Level is Right
