EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider ACM The Web Conference 2023

As such, web-crawling is an essential tool for both computational and non-computational scientists to conduct research.

Data Integration, Marketing

22,758
0.39 stars / hour

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

sclbd/deepfakebench 19 Nov 2023

Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data.

DeepFake Detection, Face Swapping +1

244
0.38 stars / hour

RULER: What's the Real Context Size of Your Long-Context Language Models?

hsiehjackson/ruler 9 Apr 2024

Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases.

Long-Context Understanding

83
0.37 stars / hour

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

jishengpeng/TextrolSpeech 28 Aug 2023

The dataset comprises 236,220 pairs of style prompts in natural text descriptions with five style factors and corresponding speech samples.

Language Modelling

73
0.34 stars / hour

A Survey on Vision Mamba: Models, Applications and Challenges

ruixxxx/awesome-vision-mamba-models 29 Apr 2024

To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches.

44
0.34 stars / hour

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

tencentarc/instantmesh 10 Apr 2024

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

1,751
0.33 stars / hour

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

mcgill-nlp/llm2vec 9 Apr 2024

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning, Decoder

406
0.33 stars / hour

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

mit-han-lab/efficientvit 7 Feb 2024

For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT.

Decoder, Knowledge Distillation +1

1,381
0.30 stars / hour

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation, Video Alignment +1

1,062
0.29 stars / hour

WorldGPT: Empowering LLM as Multimodal World Model

dcdmllm/worldgpt 28 Apr 2024

As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.

Language Modelling, Large Language Model

28
0.29 stars / hour