Search Results for author: Xiaomeng Yang

Found 17 papers, 6 papers with code

Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition

1 code implementation • 24 Mar 2025 • Yifei Zhang, Chang Liu, Jin Wei, Xiaomeng Yang, Yu Zhou, Can Ma, Xiangyang Ji

In this paper, we propose a Linguistics-aware Masked Image Modeling (LMIM) approach, which channels the linguistic information into the decoding process of MIM through a separate branch.

Contrastive Learning Scene Text Recognition +2

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

no code implementations • 12 Mar 2025 • Luozheng Qin, Zhiyu Tan, Mengping Yang, Xiaomeng Yang, Hao Li

Video Detailed Captioning (VDC) is a crucial task for vision-language bridging, enabling fine-grained descriptions of complex video content.

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

no code implementations • 5 Feb 2025 • Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, Thang Luong

We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems.

Language Modeling Language Modelling +2

IPO: Iterative Preference Optimization for Text-to-Video Generation

no code implementations • 4 Feb 2025 • Xiaomeng Yang, Zhiyu Tan, Xuecheng Nie, Hao Li

Specifically, IPO exploits a critic model to justify video generations for pairwise ranking as in Direct Preference Optimization or point-wise scoring as in Kahneman-Tversky Optimization.

Large Language Model Text-to-Video Generation +1

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

no code implementations • 6 Dec 2024 • Yibin Wang, Zhiyu Tan, Junyan Wang, Xiaomeng Yang, Cheng Jin, Hao Li

Based on this, we train a reward model, LiFT-Critic, to learn the reward function effectively; it serves as a proxy for human judgment, measuring the alignment between given videos and human expectations.

VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

no code implementations • 5 Aug 2024 • Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Hao Li

Produced through a coarse-to-fine curation strategy, this dataset guarantees high-quality videos and detailed captions with excellent temporal consistency.

Text-to-Video Generation Video Generation

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

1 code implementation • 24 Jun 2024 • Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

Our evaluation across 24 text-to-image generation models demonstrates that EvalAlign not only provides superior metric stability but also aligns more closely with human preferences than existing metrics, confirming its effectiveness and utility in model assessment.

Text-to-Image Generation

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

State Space Models

IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

no code implementations • 19 Dec 2023 • Xiaomeng Yang, Zhi Qiao, Yu Zhou

Nowadays, scene text recognition has attracted more and more attention due to its diverse applications.

Conditional Text Generation Decoder +3

End-to-end Story Plot Generator

no code implementations • 13 Oct 2023 • Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian

Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words.

Blocking

Learning Personalized Alignment for Evaluating Open-ended Text Generation

no code implementations • 5 Oct 2023 • Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian

Recent research has increasingly focused on evaluating large language models' (LLMs) alignment with diverse human values and preferences, particularly for open-ended tasks like story generation.

Diversity Retrieval +1

TorchRL: A data-driven decision-making library for PyTorch

2 code implementations • 1 Jun 2023 • Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni de Fabritiis, Vincent Moens

PyTorch has ascended as a premier machine learning framework, yet it lacks a native and comprehensive library for decision and control tasks suitable for large development teams dealing with complex real-world data and environments.

Computational Efficiency Decision Making +1

Masked and Permuted Implicit Context Learning for Scene Text Recognition

1 code implementation • 25 May 2023 • Xiaomeng Yang, Zhi Qiao, Jin Wei, Dongbao Yang, Yu Zhou

We utilize the training procedure of PLM, and to integrate MLM, we incorporate word length information into the decoding process and replace the undetermined characters with mask tokens.

Decoder Language Modeling +3

Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials

no code implementations • 6 Jan 2023 • Andrew Cohen, Weiping Dou, Jiang Zhu, Slawomir Koziel, Peter Renner, Jan-Ove Mattsson, Xiaomeng Yang, Beidi Chen, Kevin Stone, Yuandong Tian

Linear Partial Differential Equations (PDEs) govern the spatial-temporal dynamics of physical systems that are essential to building modern technology.

Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

1 code implementation • 20 Jun 2022 • Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster

We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.

Imitation Learning
