HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

osu-nlp-group/hipporag 23 May 2024

In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting.

Hippocampus Knowledge Graphs +3

316
0.76 stars / hour

APISR: Anime Production Inspired Real-World Anime Super-Resolution

kiteretsu77/apisr 3 Mar 2024

In addition, we identify two anime-specific challenges: distorted and faint hand-drawn lines, and unwanted color artifacts.

Super-Resolution

686
0.61 stars / hour

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

zhu-zhiyu/nvs_solver 24 May 2024

By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training.

Novel View Synthesis

119
0.61 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

4k Language Modelling +3

3,184
0.58 stars / hour

Yuan 2.0-M32: Mixture of Experts with Attention Router

ieit-yuan/yuan2.0-m32 28 May 2024

Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active.

Math

129
0.58 stars / hour

Look Once to Hear: Target Speech Hearing with Noisy Examples

vb000/lookoncetohear 10 May 2024

We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker.

Speech Extraction

419
0.56 stars / hour

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Object object-detection +2

304
0.56 stars / hour

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

wzzheng/occsora 30 May 2024

To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.

Autonomous Driving Decision Making

50
0.54 stars / hour

Matryoshka Query Transformer for Large Vision-Language Models

gordonhu608/mqt-llava 29 May 2024

This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?

Language Modelling Representation Learning

54
0.52 stars / hour

State Space Models for Event Cameras

uzh-rpg/ssms_event_cameras 23 Feb 2024

We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision.

Event-based vision Object Detection

72
0.48 stars / hour