Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

apple/ml-depth-pro 2 Oct 2024

We present a foundation model for zero-shot metric monocular depth estimation.

Monocular Depth Estimation

2,216 stars
9.50 stars / hour

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

microsoft/vptq 25 Sep 2024

Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down to 2 bits).

Quantization

306 stars
1.50 stars / hour

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

hlt-mt/mosel 1 Oct 2024

The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models.

Automatic Speech Recognition, speech-recognition, +1

119 stars
1.02 stars / hour

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

verazuo/jailbreak_llms 7 Aug 2023

We hope that our study can facilitate the research community and LLM vendors in promoting safer and regulated LLMs.

Community Detection

2,323 stars
0.90 stars / hour

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

liruiw/HPT 30 Sep 2024

Previous robot learning methods often collect data to train with one specific embodiment for one task, which is expensive and prone to overfitting.

165 stars
0.87 stars / hour

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

lizonghang/tpi-llm 1 Oct 2024

In this paper, we argue that tensor parallelism can be more effective than pipeline on low-resource devices, and present a compute- and memory-efficient tensor parallel inference system, named TPI-LLM, to serve 70B-scale models.

133 stars
0.84 stars / hour

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

ohayonguy/PMRF 1 Oct 2024

Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality.

Ranked #1 on Blind Face Restoration on CelebA-Test (FID metric)

Blind Face Restoration, Image Colorization, +5

220 stars
0.74 stars / hour

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

thu-ml/SageAttention 3 Oct 2024

Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer.

Image Generation, Quantization, +1

115 stars
0.67 stars / hour

One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion

nico-bohlinger/one_policy_to_run_them_all 10 Sep 2024

Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.

reinforcement-learning, Reinforcement Learning

66 stars
0.63 stars / hour

MinerU: An Open-Source Solution for Precise Document Content Extraction

opendatalab/mineru 27 Sep 2024

Document content analysis has been a crucial research area in computer vision.

Diversity, Optical Character Recognition (OCR)

12,641 stars
0.62 stars / hour