Search Results for author: Yexin Liu

Found 20 papers, 8 papers with code

ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos

No code implementations · 20 Mar 2025 · Haolin Yang, Feilong Tang, Ming Hu, Yulong Li, Yexin Liu, Zelin Peng, Junjun He, ZongYuan Ge, Imran Razzak

Specifically, we perform one-step denoising to convert initial noise into a clip and subsequently evaluate its long-term value, leveraging a reward model anchored by previously generated content.

Tasks: Denoising, Diversity (+1 more)

Temporal Regularization Makes Your Video Generator Stronger

No code implementations · 19 Mar 2025 · Harold Haodong Chen, Haojian Huang, Xianfeng Wu, Yexin Liu, Yajing Bai, Wen-Jie Shu, Harry Yang, Ser-Nam Lim

Temporal quality is a critical aspect of video generation, as it ensures consistent motion and realistic dynamics across frames.

Tasks: Diversity, Video Generation

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention

No code implementations · 19 Mar 2025 · Mingzhe Zheng, Yongqi Xu, Haojian Huang, Xuran Ma, Yexin Liu, Wenjie Shu, Yatian Pang, Feilong Tang, Qifeng Chen, Harry Yang, Ser-Nam Lim

Current video generation models excel at short clips but fail to produce cohesive multi-shot narratives due to disjointed visual dynamics and fractured storylines.

Tasks: Video Generation

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation

1 code implementation · 3 Dec 2024 · Mingzhe Zheng, Yongqi Xu, Haojian Huang, Xuran Ma, Yexin Liu, Wenjie Shu, Yatian Pang, Feilong Tang, Qifeng Chen, Harry Yang, Ser-Nam Lim

Cross-Shot Consistency: We ensure temporal and identity consistency by leveraging identity-preserving (IP) embeddings across shots, which are automatically created from the narrative.

Tasks: Script Generation, Video Generation

GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation

No code implementations · 17 Aug 2024 · Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang

The 'out-of-the-box' insight of GoodSAM++ is to introduce a teacher assistant (TA) that provides semantic information for SAM and is integrated with SAM to obtain reliable pseudo semantic maps, bridging both the domain and capacity gaps.

Tasks: Domain Adaptation, Instance Segmentation (+2 more)

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

1 code implementation · 15 Jun 2024 · Yexin Liu, Zhengyang Liang, Yueze Wang, Xianfeng Wu, Feilong Tang, Muyang He, Jian Li, Zheng Liu, Harry Yang, SerNam Lim, Bo Zhao

To this end, we manually construct a benchmark with 12 categories and design evaluation metrics that assess the degree of error in MLLM responses even when the visual content is seemingly understood.

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

No code implementations · 18 May 2024 · Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

Tasks: Question Answering, Visual Question Answering

Efficient Multimodal Large Language Models: A Survey

1 code implementation · 17 May 2024 · Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding, and reasoning.

Tasks: Edge-computing, Question Answering (+2 more)

Evaluating large language models in medical applications: a survey

No code implementations · 13 May 2024 · Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine.

Tasks: Survey

Unsupervised Visible-Infrared ReID via Pseudo-label Correction and Modality-level Alignment

No code implementations · 10 Apr 2024 · Yexin Liu, Weiming Zhang, Athanasios V. Vasilakos, Lin Wang

Specifically, to address the first challenge, we propose a pseudo-label correction strategy that uses a Beta Mixture Model to predict the probability of mis-clustering based on the network's memory effect, and rectifies the correspondence by adding a perceptual term to contrastive learning.

Tasks: Clustering, Contrastive Learning (+4 more)

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

No code implementations · CVPR 2024 · Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang

To this end, we propose a novel framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information, integrated with SAM to generate ensemble logits to achieve knowledge transfer.

Tasks: Domain Adaptation, Instance Segmentation (+3 more)

SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

1 code implementation · 28 Feb 2024 · Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao

To alleviate artifacts and improve the quality of synthetic images, we fine-tune a Vision-Language Model (VLM) as an artifact classifier that automatically identifies and classifies a wide range of artifacts and provides supervision for further optimizing generative models.

Tasks: Image Generation, Language Modeling (+1 more)

Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation

No code implementations · 10 Jul 2023 · Yexin Liu, Weiming Zhang, Guoyang Zhao, Jinjing Zhu, Athanasios Vasilakos, Lin Wang

We propose the first test-time adaptation (TTA) framework, dubbed Night-TTA, to address these problems in nighttime RGBT semantic segmentation without access to the source (daytime) data during adaptation.

Tasks: Scene Understanding, Semantic Segmentation (+1 more)

Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation

No code implementations · CVPR 2023 · Xu Zheng, Jinjing Zhu, Yexin Liu, Zidong Cao, Chong Fu, Lin Wang

Moreover, adversarial intra-projection training is proposed to reduce the inherent gap between the features of the pinhole images and those of the ERP and TP images, respectively.

Tasks: ERP, Scene Understanding (+2 more)

Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

1 code implementation · 17 Feb 2023 · Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, DaCheng Tao, Lin Wang

Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of the intensity changes.
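To make the event-stream description concrete, here is a minimal sketch (not taken from the survey) of one simple way to turn a stream of (time, x, y, polarity) events into a 2D frame: summing polarities per pixel over a time window. The `Event` class, the `accumulate_events` function, and its parameters are hypothetical names chosen for illustration, and the timestamp units are an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One event: timestamp, pixel position, and polarity (sign of the intensity change)."""
    t: float  # timestamp (units assumed, e.g. microseconds)
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 for an increase, -1 for a decrease


def accumulate_events(events: Iterable[Event], width: int, height: int,
                      t_start: float, t_end: float) -> List[List[int]]:
    """Hypothetical helper: sum event polarities per pixel over [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for e in events:
        # Keep only events inside the time window and the sensor bounds.
        if t_start <= e.t < t_end and 0 <= e.x < width and 0 <= e.y < height:
            frame[e.y][e.x] += e.p
    return frame


if __name__ == "__main__":
    stream = [
        Event(t=10.0, x=1, y=0, p=+1),
        Event(t=12.5, x=1, y=0, p=+1),
        Event(t=15.0, x=2, y=1, p=-1),
        Event(t=40.0, x=0, y=0, p=+1),  # falls outside the window below
    ]
    frame = accumulate_events(stream, width=3, height=2, t_start=0.0, t_end=30.0)
    for row in frame:
        print(row)  # prints [0, 2, 0] then [0, 0, -1]
```

Summing polarities is the simplest choice but discards fine timing and lets opposite-sign events at the same pixel cancel; richer representations such as voxel grids or time surfaces keep more of the temporal information.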

Tasks: Deblurring, Deep Learning (+6 more)
