Depth Anything V2

DepthAnything/Depth-Anything-V2 13 Jun 2024

This work presents Depth Anything V2.

Monocular Depth Estimation

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

buaacyw/meshanything 14 Jun 2024

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement.


Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

AiuniAI/Unique3D 30 May 2024

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability.

Image to 3D Single-View 3D Reconstruction

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

deepseek-ai/deepseek-coder-v2 17 Jun 2024

Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

Language Modelling

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

ictnlp/streamspeech 5 Jun 2024

Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication.

Automatic Speech Recognition (ASR) de-en

Meta Learning Text-to-Speech Synthesis in over 7000 Languages

digitalphonetics/ims-toucan 10 Jun 2024

In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development.

Meta-Learning Speech Synthesis

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

gersteinlab/ml-bench 16 Nov 2023

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of complex file interactions.

Code Generation Navigate

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

google-deepmind/loft 19 Jun 2024

Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.


TextGrad: Automatic "Differentiation" via Text

zou-group/textgrad 11 Jun 2024

Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from 51% to 55%, yields 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity.

Question Answering Specificity

Structure-Aware Sparse-View X-ray 3D Reconstruction

caiyuanhao1998/sax-nerf CVPR 2024

In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction.

3D Reconstruction Low-Dose X-Ray CT Reconstruction

