We present a foundation model for zero-shot metric monocular depth estimation.
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the goal is to attain the lowest possible distortion without compromising on perceptual quality.
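As a concrete illustration of a distortion measure, PSNR is derived directly from the mean squared error between a reference and a restored image. The sketch below is illustrative only (the function name and the 8-bit peak value of 255 are assumptions, not taken from any paper above):

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means lower distortion."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Demo: a restoration that differs from the reference in one pixel.
ref = np.full((8, 8), 128, dtype=np.uint8)
out = ref.copy()
out[0, 0] = 138  # one pixel off by 10
print(psnr(ref, out))
```

Note that PSNR rewards pixel-wise fidelity only; perceptual measures such as FID or NIQE can disagree with it, which is exactly the tension the snippet above describes.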
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality.
Previous robot learning methods often collect data to train with one specific embodiment for one task, which is expensive and prone to overfitting.
We hope that our study can help the research community and LLM vendors promote safer, better-regulated LLMs.
We build our model based on the latest Llama-3.1-8B-Instruct model.
Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low bit-widths (even down to 2 bits).
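To make "weight-only 2-bit quantization" concrete, here is a minimal sketch of per-row symmetric rounding to 2-bit integer codes plus one floating-point scale per row. All names and the specific scheme are illustrative assumptions, not the method of any paper listed above:

```python
import numpy as np

def quantize_2bit(w: np.ndarray):
    """Per-row symmetric quantization of a weight matrix to 2-bit codes.

    2 bits give 4 integer levels; a symmetric scheme uses {-2, -1, 0, 1}.
    Only the int8 codes and one fp scale per row need to be stored.
    """
    scale = np.max(np.abs(w), axis=1, keepdims=True) / 2.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    codes = np.clip(np.round(w / scale), -2, 1).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp weights from codes and per-row scales."""
    return codes.astype(np.float32) * scale

# Demo on a small random weight matrix (activations stay in full precision).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
codes, scale = quantize_2bit(w)
w_hat = dequantize(codes, scale)
print(codes.min(), codes.max())
```

The point of weight-only schemes is visible here: activations are untouched, and storage drops to 2 bits per weight plus a small per-row overhead, at the cost of the reconstruction error between `w` and `w_hat`.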