Data Engineering for Scaling Language Models to 128K Context

franxyao/long-context-data-engineering 15 Feb 2024

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.

4k Continual Pretraining

346
0.21 stars / hour

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

hustvl/vim 17 Jan 2024

The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.

Object Detection +3

2,207
0.21 stars / hour

EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider ACM The Web Conference 2023

As such, web-crawling is an essential tool for both computational and non-computational scientists to conduct research.

Data Integration Marketing

23,069
0.20 stars / hour

An End-to-End Structure with Novel Position Mechanism and Improved EMD for Stock Forecasting

durandallee/aceformer 25 Mar 2024

As a branch of time series forecasting, stock movement forecasting is one of the challenging problems for investors and researchers.

Position Time Series +1

46
0.19 stars / hour

A Survey on Vision Mamba: Models, Applications and Challenges

ruixxxx/awesome-vision-mamba-models 29 Apr 2024

To help keep pace with the rapid advancements in computer vision, this paper aims to provide a comprehensive review of visual Mamba approaches.

63
0.19 stars / hour

PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval

hltcoe/colbert-x 2 May 2024

PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval.

Retrieval

48
0.19 stars / hour

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

facebookresearch/GPU-DPF 26 Jan 2023

Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100,000$ queries per second -- a $>100\times$ throughput improvement over a CPU-based baseline -- while maintaining model accuracy.

Information Retrieval Language Modelling +1

46
0.19 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +5

21,210
0.19 stars / hour

MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

boese0601/magicdance 18 Nov 2023

In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting.

Video Generation

474
0.18 stars / hour

FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering

hiyouga/llama-factory 29 Apr 2024

To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English.

Question Answering

21,204
0.18 stars / hour