Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluations on publicly available benchmarks.
In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations.
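As a rough illustration of the mixed-precision setup, here is a minimal PyTorch sketch; this is not the paper's actual FNO routine, and `model`, `loader`, and `loss_fn` are hypothetical placeholders:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_mixed_precision(model, loader, loss_fn, epochs=10, lr=1e-3):
    """Generic mixed-precision training loop (an assumed setup, not the
    paper's FNO-specific routine)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    scaler = GradScaler()  # rescales the loss to avoid fp16 gradient underflow
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            opt.zero_grad()
            with autocast():               # forward pass in an fp16/fp32 mix
                loss = loss_fn(model(x), y)
            scaler.scale(loss).backward()  # backward on the scaled loss
            scaler.step(opt)               # unscales gradients, then steps
            scaler.update()                # adapts the scale factor over time
```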
Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas.
A majority of recent developments in neural architecture search (NAS) have been aimed at decreasing the computational cost of various techniques without affecting their final performance.
Our search outputs a suite of models which Pareto-dominate all other high-performance architectures and existing bias mitigation methods in terms of accuracy and fairness, often by large margins, on the two most widely used datasets for face identification, CelebA and VGGFace2.
The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications.
Zero-cost proxies (ZC proxies) are a recent architecture performance prediction technique aiming to significantly speed up algorithms for neural architecture search (NAS).
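To make the idea concrete, here is one illustrative zero-cost proxy, a plain gradient-norm score computed from a single minibatch at initialization; the specific ZC proxies studied in the literature use more refined statistics:

```python
def grad_norm_proxy(model, x, y, loss_fn):
    """Score an untrained architecture with one minibatch: the sum of
    parameter-gradient norms at initialization. Illustrative only; not one
    of the specific ZC proxies from the literature."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return sum(p.grad.norm().item() for p in model.parameters()
               if p.grad is not None)
```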
By using far more meta-training data than prior work, RecZilla is able to substantially reduce the level of human involvement when faced with a new recommender system application.
The release of tabular benchmarks, such as NAS-Bench-101 and NAS-Bench-201, has significantly lowered the computational overhead for conducting scientific research in neural architecture search (NAS).
While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research.
In this work, we address this issue by releasing XAI-Bench: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms.
Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks.
First, we formally define architecture encodings and give a theoretical characterization of the scalability of the encodings we study. Then we identify the main encoding-dependent subroutines which NAS algorithms employ, running experiments to show which encodings work best with each subroutine for many popular algorithms.
Intra-processing methods are designed specifically to debias large models which have been trained on a generic dataset and fine-tuned on a more specific task.
In this work, we show that (1) the simplest hill-climbing algorithm is a powerful baseline for NAS, and (2) when the noise in popular NAS benchmark datasets is reduced to a minimum, hill-climbing outperforms many popular state-of-the-art algorithms.
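A minimal sketch of such a hill-climbing baseline, assuming hypothetical `mutate` and `evaluate` hooks into a NAS search space and benchmark:

```python
def hill_climb(init_arch, mutate, evaluate, budget=100):
    """Simple hill-climbing over a NAS search space: repeatedly evaluate a
    random mutation of the incumbent and keep it if validation accuracy
    improves. `mutate` and `evaluate` are hypothetical hooks (e.g., an
    operation swap and a NAS-Bench-201 query), not a specific API."""
    best, best_acc = init_arch, evaluate(init_arch)
    for _ in range(budget):
        cand = mutate(best)        # e.g., change one operation in the cell
        acc = evaluate(cand)       # validation accuracy from the benchmark
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc
```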
Bayesian optimization (BO), which has long had success in hyperparameter optimization, has recently emerged as a very promising strategy for NAS when it is coupled with a neural predictor.
We develop a path-based encoding scheme to featurize the neural architectures; these encodings are used to train the neural predictor model.
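A sketch of a path-based encoding under assumed conventions (a cell given as a DAG with an adjacency matrix `adj` and per-node operations `ops`; `all_paths` is the universe of possible operation sequences); this is illustrative, not the paper's exact scheme:

```python
def path_encoding(adj, ops, all_paths):
    """One-hot path encoding of a cell (an illustrative sketch). Node 0 is
    the input, node n-1 the output, adj[i][j] marks an edge i->j, and
    ops[i] is the operation at node i. Each input->output path is described
    by its sequence of intermediate operations, and the architecture is
    encoded as a binary vector over the universe `all_paths`."""
    n = len(ops)
    paths = []

    def dfs(node, seq):
        if node == n - 1:                  # reached the output node
            paths.append(tuple(seq))
            return
        for nxt in range(node + 1, n):
            if adj[node][nxt]:
                # record the op only for intermediate nodes
                dfs(nxt, seq + ([ops[nxt]] if nxt < n - 1 else []))

    dfs(0, [])
    present = set(paths)
    return [1 if p in present else 0 for p in all_paths]
```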
The typical idea is to design a clustering algorithm that outputs a near-optimal solution, provided the data satisfy a natural stability notion.
In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers.
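For concreteness, the $k$-median objective with at most $z$ outliers (a standard formulation consistent with the sentence above) is:

```latex
% k-median with at most z outliers over point set X with metric d:
% choose k centers C and a set Z of at most z points to discard.
\[
  \min_{\substack{C \subseteq X,\; |C| = k \\ Z \subseteq X,\; |Z| \le z}}
  \;\sum_{p \in X \setminus Z} \min_{c \in C} d(p, c)
\]
% k-means replaces d(p, c) with d(p, c)^2.
```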
We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample-efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance.
However, for real-valued functions, cardinal labels might not be accessible, or it may be difficult for an expert to consistently assign real-valued labels over the entire set of examples.
In this work, we take this approach and provide strong positive results both for the asymmetric and symmetric $k$-center problems under a natural input stability (promise) condition called $\alpha$-perturbation resilience [Bilu and Linial 2012], which states that the optimal solution does not change under any $\alpha$-factor perturbation to the input distances.
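Stated formally (a standard formalization of the promise condition paraphrased above):

```latex
% alpha-perturbation resilience (Bilu and Linial, 2012): an instance (S, d)
% is alpha-perturbation resilient, for alpha >= 1, if for every perturbed
% distance function d' satisfying the bound below, the optimal clustering
% under d' is identical (as a partition) to the optimal clustering under d.
\[
  d(p, q) \,\le\, d'(p, q) \,\le\, \alpha \cdot d(p, q)
  \qquad \text{for all } p, q \in S.
\]
```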