Web crawling is thus an essential research tool for both computational and non-computational scientists.
Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data.
Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases.
The dataset comprises 236,220 pairs of natural-language style prompts, covering five style factors, and their corresponding speech samples.
To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches.
We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.
We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).
For training, we begin with knowledge distillation from the SAM-ViT-H image encoder to EfficientViT.
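To make the distillation step concrete, here is a minimal sketch of a feature-level distillation objective: the student encoder is trained to match the teacher's feature maps under a mean-squared-error loss. This is a generic illustration with hypothetical inputs, not the paper's actual implementation, which involves the real SAM-ViT-H and EfficientViT networks.

```python
import numpy as np

def distillation_loss(teacher_feats, student_feats):
    """Mean-squared error between teacher and student feature maps,
    a common objective for feature-level knowledge distillation."""
    t = np.asarray(teacher_feats, dtype=np.float64)
    s = np.asarray(student_feats, dtype=np.float64)
    # In practice a projection head aligns dimensions; here we assume
    # the features are already the same shape.
    assert t.shape == s.shape, "teacher/student features must be aligned"
    return float(np.mean((t - s) ** 2))

# Toy check: identical features give zero loss.
feats = np.ones((2, 64))
print(distillation_loss(feats, feats))  # 0.0
```

In the full pipeline, the teacher's weights are frozen and only the student encoder receives gradients from this loss.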
In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.
The model ranks #2 for Text-to-Video Generation on the EvalCrafter Text-to-Video (ECTV) dataset (using extra training data).
For evaluation, we build WorldNet, a multimodal state-transition prediction benchmark encompassing varied real-life scenarios.