Trending Research

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

osu-nlp-group/hipporag • 23 May 2024

In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting.

Hippocampus Knowledge Graphs +3

316

0.76 stars / hour

Paper
Code

APISR: Anime Production Inspired Real-World Anime Super-Resolution

kiteretsu77/apisr • 3 Mar 2024

In addition, we identify two anime-specific challenges: distorted and faint hand-drawn lines, and unwanted color artifacts.

Super-Resolution

686

0.61 stars / hour

Paper
Code

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

zhu-zhiyu/nvs_solver • 24 May 2024

By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training.

Novel View Synthesis

119

0.61 stars / hour

Paper
Code

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl • 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k Language Modelling +3

3,184

0.58 stars / hour

Paper
Code

Yuan 2.0-M32: Mixture of Experts with Attention Router

ieit-yuan/yuan2.0-m32 • 28 May 2024

Yuan 2.0-M32, with a similar base architecture to Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts, of which 2 are active.

Math

129

0.58 stars / hour

Paper
Code

Look Once to Hear: Target Speech Hearing with Noisy Examples

vb000/lookoncetohear • 10 May 2024

We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker.

Speech Extraction

419

0.56 stars / hour

Paper
Code

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet • 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Object object-detection +2

304

0.56 stars / hour

Paper
Code

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

wzzheng/occsora • 30 May 2024

To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.

Autonomous Driving Decision Making