Search Results for author: Mingxiao Li

Found 24 papers, 14 papers with code

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

1 code implementation • 2 Feb 2025 • Teng Xiao, Yige Yuan, Zhengyu Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, Vasant G Honavar

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models.

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

1 code implementation • 19 Jan 2025 • Zhanpeng Chen, Mingxiao Li, Ziyang Chen, Nan Du, Xiaolong Li, Yuexian Zou

Vision-language Models (VLMs) have shown remarkable capabilities in advancing general artificial intelligence, yet the irrational encoding of visual positions persists in inhibiting the models' comprehensive perception performance across different levels of granularity.

Position

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

1 code implementation • 19 Dec 2024 • Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lanmiao Liu, Martin Beneš, Albert Ali Salah, Itir Onal Ertugrul

The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space.

Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment

1 code implementation • 19 Dec 2024 • Teng Xiao, Yige Yuan, Huaisheng Zhu, Mingxiao Li, Vasant G Honavar

Contrastive preference optimization has shown promising results in aligning LLMs with available preference data by optimizing the implicit reward associated with the policy.

Language Modeling, Language Modelling

Action-based image editing guided by human instructions

no code implementations • 5 Dec 2024 • Maria Mihaela Trusca, Mingxiao Li, Marie-Francine Moens

We show substantial improvements in image editing using action-based text instructions and high reasoning capabilities that allow our model to use the input image as a starting scene for an action while generating a new image that shows the final scene of the action.

Text-based Image Editing

How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective

1 code implementation • 14 Oct 2024 • Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, Vasant G Honavar

This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data.

Density Ratio Estimation, GSM8K, +6

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

1 code implementation • 7 Oct 2024 • Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo

To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns DMs with preferences without relying on reward models or paired human-annotated data.

Model Selection

Animate Your Motion: Turning Still Images into Dynamic Videos

no code implementations • 15 Mar 2024 • Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars

For the first time, we integrate both semantic and motion cues within a diffusion model for video generation, as demonstrated in Fig 1.

Specificity, Text-to-Video Generation, +1

NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities

no code implementations • 2 Feb 2024 • Jingyuan Sun, Mingxiao Li, Zijiao Chen, Marie-Francine Moens

In the pursuit of understanding the intricacies of the human brain's visual processing, reconstructing dynamic visual experiences from brain activities emerges as a challenging yet fascinating endeavor.

Contrastive Learning, SSIM, +1

Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence

no code implementations • 2 Oct 2023 • Wei Sun, Mingxiao Li, Damien Sileo, Jesse Davis, Marie-Francine Moens

Medical Question Answering (medical QA) systems play an essential role in assisting healthcare workers in finding answers to their questions.

Explanation Generation, Question Answering

Decoding Realistic Images from Brain Activity with Contrastive Self-supervision and Latent Diffusion

no code implementations • 30 Sep 2023 • Jingyuan Sun, Mingxiao Li, Marie-Francine Moens

Reconstructing visual stimuli from human brain activities provides a promising opportunity to advance our understanding of the brain's visual system and its connection with computer vision models.

Contrastive Learning

Elucidating the Exposure Bias in Diffusion Models

5 code implementations • 29 Aug 2023 • Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul

In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then attribute the prediction error at each sampling step as the root cause of the exposure bias issue.

Attribute, Image Generation

Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

1 code implementation • 24 May 2023 • Mingxiao Li, Tingyu Qu, Ruicong Yao, Wei Sun, Marie-Francine Moens

In this work, we conduct a systematic study of exposure bias in DPM and, intriguingly, we find that the exposure bias could be alleviated with a novel sampling method that we propose, without retraining the model.

Denoising

Crossword: A Semantic Approach to Data Compression via Masking

no code implementations • 3 Apr 2023 • Mingxiao Li, Rui Jin, Liyao Xiang, Kaiming Shen, Shuguang Cui

The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d.

Data Compression, Decoder

Layout-aware Dreamer for Embodied Referring Expression Grounding

1 code implementation • 30 Nov 2022 • Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens

In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment and localize a remote object described by a concise high-level natural language instruction.

Common Sense Reasoning, Navigate, +1

Find a Way Forward: a Language-Guided Semantic Map Navigator

no code implementations • 7 Mar 2022 • Zehao Wang, Mingxiao Li, Minye Wu, Marie-Francine Moens, Tinne Tuytelaars

In this paper, we introduce the map-language navigation task where an agent executes natural language instructions and moves to the target position based only on a given 3D semantic map.

Imitation Learning

Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering

1 code implementation • 6 Mar 2022 • Mingxiao Li, Marie-Francine Moens

Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to correctly answer image-related questions using knowledge that is not presented in the given image.

Graph Attention, Question Answering, +2

Modeling Coreference Relations in Visual Dialog

no code implementations • EACL 2021 • Mingxiao Li, Marie-Francine Moens

Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image.

Question Answering, Visual Dialog, +1

Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win

no code implementations • 13 Jun 2021 • Jaron Maene, Mingxiao Li, Marie-Francine Moens

The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in.

Linear Mode Connectivity

Multiscale Dynamic Human Mobility Flow Dataset in the U.S. during the COVID-19 Epidemic

7 code implementations • 27 Aug 2020 • Yuhao Kang, Song Gao, Yunlei Liang, Mingxiao Li, Jinmeng Rao, Jake Kruse

Understanding dynamic human mobility changes and spatial interaction patterns at different geographic scales is crucial for monitoring and measuring the impacts of non-pharmaceutical interventions (such as stay-at-home orders) during the pandemic.

Social and Information Networks, Physics and Society
