Solving complicated AI tasks that span different domains and modalities is a key step toward artificial general intelligence.
Tasks: Automatic Machine Learning Model Selection, Model Selection
Reproducibility in scientific work has become increasingly important in research communities such as machine learning, natural language processing, and computer vision, owing to the rapid development of these fields driven by recent advances in deep learning.
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering.
Ranked #19 on Question Answering on WebQuestions
We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless construction and deployment of interactive search engines.
Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample.
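To illustrate why that sampling is slow, below is a minimal sketch of DDPM ancestral sampling with one network call per chain step; the model interface and schedule handling are assumptions for the example, not any specific paper's code:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas):
    """Ancestral sampling: run the learned reverse Markov chain for T steps.

    `model(x, t)` is assumed to predict the noise eps added at step t.
    """
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from pure Gaussian noise x_T
    T = len(betas)
    for t in reversed(range(T)):  # one network forward pass per step
        eps = model(x, torch.full((shape[0],), t, dtype=torch.long))
        coef = betas[t] / torch.sqrt(1.0 - alphas_cumprod[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # the generated sample x_0
```

Every iteration of the loop is a full forward pass of the network, so producing a single sample costs T network evaluations, on the order of a thousand in the original formulation.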
We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP).
Ranked #1 on Image-to-Text Retrieval on COCO
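A minimal sketch of a pairwise sigmoid loss of this kind is shown below; the names and shapes are assumptions, and in the paper the temperature and bias are learnable parameters rather than fixed constants:

```python
import torch
import torch.nn.functional as F

def sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over all image-text pairs in a batch.

    Unlike a softmax contrastive loss, each pair is scored independently:
    matching pairs (the diagonal) get label +1, all others -1.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T * t + b  # (B, B) pairwise logits
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1
    return -F.logsigmoid(labels * logits).mean()  # binary log-likelihood
```

Because each pair is treated as an independent binary classification, the loss needs no global softmax normalization over the batch, which is what makes it attractive for scaling batch size.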
We discuss how Pyserini, a widely used toolkit for reproducible IR research, can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
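For context, querying one of Pyserini's prebuilt indexes takes only a few lines; the index name below is one of Pyserini's published prebuilt indexes, and the query string is an arbitrary example:

```python
from pyserini.search.lucene import LuceneSearcher

# Download a prebuilt Lucene index and run a BM25 query over it.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')
hits = searcher.search('what is reproducible information retrieval?', k=5)
for hit in hits:
    print(f'{hit.docid:15} {hit.score:.4f}')
```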
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on Zero-Shot Learning on COCO-MLT (using extra training data)
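As an illustration of what such open-vocabulary models enable, here is a sketch of zero-shot classification using the Hugging Face `transformers` CLIP API; the checkpoint, image path, and label set are assumptions chosen for the example:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder image path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Score the image against arbitrary text labels, no task-specific training.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarities -> class probs
print(dict(zip(labels, probs[0].tolist())))
```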
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations, but is expensive, even if a sampling-based approximation is used.
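To make the cost concrete, below is a minimal sketch of sampling-based MBR decoding; the candidate list and the utility function (in practice a metric such as sentence-level BLEU or COMET) are placeholders, not the paper's implementation:

```python
def mbr_decode(samples, utility):
    """Return the candidate with the highest expected utility.

    Each candidate is scored against every sample as a pseudo-reference,
    so the number of utility calls grows quadratically with the sample
    count, which is the main source of MBR's expense.
    """
    best, best_score = None, float("-inf")
    for cand in samples:
        score = sum(utility(cand, ref) for ref in samples)
        if score > best_score:
            best, best_score = cand, score
    return best

# Toy usage with a trivial token-overlap utility (illustrative only):
overlap = lambda c, r: len(set(c.split()) & set(r.split()))
print(mbr_decode(["the cat sat", "a cat sat", "dogs run"], overlap))
```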
Given the knowledge gap on current pre-trained language model (PTLM) release practices, our empirical study uses a mixed-methods approach to analyze the releases of 52,227 PTLMs on the most well-known model registry, Hugging Face (HF).
Tasks: Software Engineering