In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization.
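A minimal sketch of how such a penalty might be wired into a training loss: cross-entropy plus the nuclear norm of the batch feature matrix. The function and the `lambda_nuc` value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nuclear_norm_regularized_loss(features, logits, labels, lambda_nuc=0.01):
    """Cross-entropy plus a nuclear-norm penalty on the (batch, dim) feature matrix.

    `features` is the output of a feature extractor; `lambda_nuc` is an
    illustrative hyperparameter, not a value taken from the paper.
    """
    ce = F.cross_entropy(logits, labels)
    # Nuclear norm = sum of singular values of the batch feature matrix.
    nuc = torch.linalg.matrix_norm(features, ord='nuc')
    return ce + lambda_nuc * nuc
```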
An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal.
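A hedged sketch of the classifier-based idea: train a model to separate reference from target events and use its output as an anomaly score. The arrays and the scikit-learn classifier choice are placeholders, not the specific model used in any particular method.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

def anomaly_scores(reference, target):
    """Train a classifier to separate reference (label 0) from target (label 1)
    events and return per-event scores on the target set; higher scores flag
    events that look more target-like than reference-like."""
    X = np.vstack([reference, target])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(target))])
    clf = HistGradientBoostingClassifier().fit(X, y)
    return clf.predict_proba(target)[:, 1]
```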
Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources.
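As a toy illustration of aggregating noisy sources into pseudolabels, the sketch below takes a simple majority vote over labeling-function outputs; real WS label models instead estimate and weight by each source's accuracy. The vote encoding is an assumption.

```python
import numpy as np

def majority_vote(L):
    """L: (n_examples, n_sources) matrix of votes in {-1, +1}, with 0 = abstain.
    Returns pseudolabels in {-1, +1}; ties and all-abstain rows default to +1.
    A full label model would weight sources by their estimated accuracies."""
    return np.where(L.sum(axis=1) >= 0, 1, -1)
```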
Given the long shelf-life and diverse usage of the resulting datasets, understanding when the data obtained by such auto-labeling systems can be relied on is crucial.
The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications.
Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated to painstakingly designing a "perfect prompt" for a task.
While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features.
Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space.
The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.
We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic space.
As a result, the performance of NAS approaches in more diverse areas remains poorly understood.
We apply our decomposition framework to three scenarios -- well-specified, misspecified, and corrected models -- to 1) choose between labeled and unlabeled data and 2) learn from their combination.
We propose a framework that fuses limited label learning and weak supervision for segmentation tasks, enabling users to train high-performing segmentation CNNs with very few hand-labeled training points.
Our goal is to enable machine learning systems to be trained interactively.
However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs.
To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.
In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD).
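A sketch of one such closed-form recovery, for three conditionally independent binary sources: the identity E[lambda_i lambda_j] = E[lambda_i Y] E[lambda_j Y] lets each source's accuracy parameter be read off from pairwise agreement rates. This is an illustrative triplet-style computation under the stated independence and better-than-random assumptions, not the paper's full estimator.

```python
import numpy as np

def triplet_accuracies(L):
    """Closed-form estimate of a_i = E[lambda_i * Y] for three conditionally
    independent binary sources (votes in {-1, +1}), using
    E[lambda_i lambda_j] = E[lambda_i Y] E[lambda_j Y].
    Sketch only: abstains, sign recovery, and near-zero moments are ignored."""
    assert L.shape[1] == 3
    o = (L.T @ L) / len(L)          # pairwise agreement moments E[lambda_i lambda_j]
    a = np.zeros(3)
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        a[i] = np.sqrt(np.abs(o[i, j] * o[i, k] / o[j, k]))
    return a
```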
Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales with the length of the sequence.
The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data.
Labeling training data is a key bottleneck in the modern machine learning pipeline.
Snorkel MeTaL: A framework for training models with multi-task weak supervision
Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.
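A simplified sketch of a Sarkar-style construction in the 2D Poincare disk: the root goes at the origin, and each node's children are placed at a fixed hyperbolic distance around it by mapping the node to the origin with a disk isometry, spreading children at equal angles away from the parent's direction, and mapping back. The 2D restriction, uniform angle spacing, and names are assumptions of this sketch; increasing `tau` drives distortion down.

```python
import cmath
import math

def sarkar_embed(tree, root, tau=1.0):
    """Embed a tree (adjacency dict {node: [neighbors]}) into the Poincare disk,
    represented with complex coordinates, without any optimization."""
    r = math.tanh(tau / 2.0)  # Euclidean radius of a point at hyperbolic distance tau from the origin
    coords = {root: 0j}
    stack = []
    kids = list(tree[root])
    # Place the root's children evenly spaced at hyperbolic distance tau.
    for k, c in enumerate(kids):
        coords[c] = r * cmath.exp(2j * math.pi * k / len(kids))
        stack.append((c, root))
    while stack:
        v, parent = stack.pop()
        a = coords[v]
        # Disk isometry sending v to the origin, and its inverse.
        to_origin = lambda z: (z - a) / (1 - a.conjugate() * z)
        from_origin = lambda w: (w + a) / (1 + a.conjugate() * w)
        theta = cmath.phase(to_origin(coords[parent]))
        kids = [u for u in tree[v] if u != parent]
        # Spread children at equal angles, skipping the parent's direction.
        for k, c in enumerate(kids):
            angle = theta + 2 * math.pi * (k + 1) / (len(kids) + 1)
            coords[c] = from_origin(r * cmath.exp(1j * angle))
            stack.append((c, v))
    return coords
```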
After being trained, classifiers must often operate on data that has been corrupted by noise.