Unsupervised Pre-training
103 papers with code • 2 benchmarks • 7 datasets
Pre-training a neural network using unsupervised (self-supervised) auxiliary tasks on unlabeled data.
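To make the definition concrete, here is a minimal sketch of one such auxiliary task: a linear denoising autoencoder trained on unlabeled data with tied weights, whose learned encoder could later initialize a supervised model. All shapes, noise scales, and the learning rate are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))            # unlabeled data
W = rng.normal(scale=0.1, size=(8, 4))   # encoder weights (decoder = W.T, tied)

def pretrain_step(W, X, lr=0.05):
    # auxiliary task: reconstruct the clean input from a noisy copy
    X_noisy = X + rng.normal(scale=0.1, size=X.shape)
    residual = X_noisy @ W @ W.T - X     # decode(encode(noisy)) - clean
    loss = (residual ** 2).mean()
    # gradient of the mean squared error w.r.t. the tied weights W
    grad = (2.0 / residual.size) * (X_noisy.T @ residual @ W
                                    + residual.T @ X_noisy @ W)
    return W - lr * grad, loss

losses = []
for _ in range(200):
    W, loss = pretrain_step(W, X)
    losses.append(loss)
# after pre-training, W would initialize the encoder of a downstream model
```

No labels are used at any point; the reconstruction objective alone shapes the representation.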
Latest papers
VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain
VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration.
A Survey on Data Selection for Language Models
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training.
Foundation Policies with Hilbert Representations
While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited by the diversity of the discovered behaviors, the need for high-quality demonstration data, or the lack of a clear prompting or adaptation mechanism for downstream tasks.
Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints.
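The core objective the snippet alludes to, predicting a bag of words instead of running a full decoder, can be sketched as follows: a single dense passage embedding must assign high likelihood to the vocabulary tokens that occur in the passage. The vocabulary size, dimensions, projection matrix, and log-softmax form are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50, 16
token_ids = np.array([3, 7, 7, 12, 30])   # toy tokenized passage
passage_emb = rng.normal(size=dim)        # dense passage representation (e.g. [CLS])
W_vocab = rng.normal(scale=0.1, size=(vocab_size, dim))  # output projection

# multi-hot bag-of-words target: which token types occur in the passage
target = np.zeros(vocab_size)
target[np.unique(token_ids)] = 1.0

logits = W_vocab @ passage_emb
m = logits.max()
log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # stable log-softmax
# BoW objective: maximize the likelihood of the present token types
bow_loss = -log_probs[target == 1.0].mean()
```

Because the target is order-free, no autoregressive decoder is needed; the entire reconstruction signal flows through the single passage vector, which is what encourages term coverage in the dense representation.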
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding
In this manner, our framework learns unified representations of uni-modal or multi-modal skeleton input, flexibly handling different kinds of modality input for robust action understanding in practical cases.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Through our experiments in five locomotion and manipulation environments, we demonstrate that METRA can discover a variety of useful behaviors even in complex, pixel-based environments, making it the first unsupervised RL method to discover diverse locomotion behaviors in pixel-based Quadruped and Humanoid.
Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR Data
Airborne LiDAR systems can capture the Earth's surface by generating extensive point cloud data comprising points mainly defined by 3D coordinates.
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals.
Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction
Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNNs), which can be prone to topological shortcuts in graphs with power-law degree distributions.
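The idea of predicting edges from attribute pairs alone, with no topology in the input, can be sketched with a simple symmetrized bilinear scorer. The scoring form, dimensions, and weights below are illustrative assumptions, not UPNA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(d, d))   # bilinear scoring weights (assumed form)

def edge_prob(x_u, x_v, W):
    # symmetrized bilinear score -> sigmoid probability of an edge;
    # depends only on the two endpoints' attributes, never on the graph
    score = x_u @ W @ x_v + x_v @ W @ x_u
    return 1.0 / (1.0 + np.exp(-score))

x_u, x_v = rng.normal(size=d), rng.normal(size=d)
p = edge_prob(x_u, x_v, W)  # applies unchanged to nodes unseen in training
```

Because the predictor never consults node degrees or neighborhoods, it cannot exploit topological shortcuts, which is what makes it inductive: it transfers to entirely new nodes or graphs with the same attribute space.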
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes.