Search Results for author: Yuxuan Cai

Found 24 papers, 11 papers with code

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

no code implementations • 16 Jun 2025 • Zhucun Xue, Jiangning Zhang, Teng Hu, Haoyang He, Yinan Chen, Yuxuan Cai, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Dacheng Tao

In addition, we expand Wan to UltraWan-1K/-4K, which can natively generate high-quality 1K/4K videos with more consistent text controllability, demonstrating the effectiveness of our data curation. We believe that this work can make a significant contribution to future research on UHD video generation.

4k, 8k, +1

Task-Core Memory Management and Consolidation for Long-term Continual Learning

no code implementations • 15 May 2025 • Tianyu Huai, Jie Zhou, Yuxuan Cai, Qin Chen, Wen Wu, Xingjiao Wu, Xipeng Qiu, Liang He

In this paper, we focus on a long-term continual learning (CL) task, where a model learns sequentially from a stream of vast tasks over time, acquiring new knowledge while retaining previously learned information in a manner akin to human learning.

Continual Learning, Management

Omni-AD: Learning to Reconstruct Global and Local Features for Multi-class Anomaly Detection

1 code implementation • 27 Mar 2025 • Jiajie Quan, Ao Tong, Yuxuan Cai, Xinwei He, Yulong Wang, Yang Zhou

To address that, we propose to learn the input features in global and local manners, forcing the network to memorize the normal patterns more comprehensively.

Decoder, Multi-class Anomaly Detection, +1

Fleximo: Towards Flexible Text-to-Human Motion Video Generation

no code implementations • 29 Nov 2024 • Yuhang Zhang, Yuan Zhou, Zeyu Liu, Yuxuan Cai, Qiuyue Wang, Aidong Men, Huan Yang

Current methods for generating human motion videos rely on extracting pose sequences from reference videos, which restricts flexibility and control.

Image to Video Generation, Large Language Model, +1

Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention

no code implementations • 28 Nov 2024 • Huiguo He, Qiuyue Wang, Yuan Zhou, Yuxuan Cai, Hongyang Chao, Jian Yin, Huan Yang

This ensures that subjects in the target image can better reference those in the reference image, thereby maintaining better consistency.

Image Generation

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

1 code implementation • CVPR 2025 • Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie

CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios.

Mamba, State Space Models

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

1 code implementation • 21 Oct 2024 • Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai

The success of Large Language Models (LLM) has led researchers to explore Multimodal Large Language Models (MLLM) for unified visual and linguistic understanding.

Allegro: Open the Black Box of Commercial-Level Video Generation Model

1 code implementation • 20 Oct 2024 • Yuan Zhou, Qiuyue Wang, Yuxuan Cai, Huan Yang

Significant advancements have been made in the field of video generation, with the open-source community contributing a wealth of research papers and tools for training high-quality models.

Video Generation

Attention-Guided Perturbation for Unsupervised Image Anomaly Detection

no code implementations • 14 Aug 2024 • Tingfeng Huang, Yuxuan Cheng, Jingbo Xia, Rui Yu, Yuxuan Cai, Jinhai Xiang, Xinwei He, Xiang Bai

The reconstruction branch is simply a plain reconstruction network that learns to reconstruct normal samples, while the auxiliary branch aims to produce attention masks to guide the noise perturbation process for normal samples from easy to hard.

Unsupervised Anomaly Detection

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

1 code implementation • 5 Jun 2024 • Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods.

Benchmarking, Lesion Detection, +1

High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

2 code implementations • 26 May 2024 • Jiakui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo Xu, Guoqi Li

This work is expected to break the technical bottleneck of significantly increasing memory cost and training time for large-scale SNNs while maintaining high performance and low inference energy cost.

Anomaly Detection by Adapting a pre-trained Vision Language Model

no code implementations • 14 Mar 2024 • Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai

Recently, large vision and language models have shown their success when adapting them to many downstream tasks.

Anomaly Detection, Language Modeling, +2

Yi: Open Foundation Models by 01.AI

1 code implementation • 7 Mar 2024 • 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai

The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models.

 Ranked #1 on Chatbot on AlpacaEval (using extra training data)

Attribute, Chatbot, +4

A Discrepancy Aware Framework for Robust Anomaly Detection

1 code implementation • 11 Oct 2023 • Yuxuan Cai, Dingkang Liang, Dongliang Luo, Xinwei He, Xin Yang, Xiang Bai

To alleviate this issue, we present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies across different anomaly detection benchmarks.

Anomaly Detection, Decoder, +3

RevColV2: Exploring Disentangled Representations in Masked Image Modeling

1 code implementation • NeurIPS 2023 • Qi Han, Yuxuan Cai, Xiangyu Zhang

This design gives our architecture a useful property: it maintains disentangled low-level and semantic information at the end of the network in MIM pre-training.

Decoder, image-classification, +5

Reversible Column Networks

1 code implementation • 22 Dec 2022 • Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang

This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled when passing through each column, and their total information is maintained rather than compressed or discarded as in other networks.

Ranked #10 on Semantic Segmentation on ADE20K (using extra training data)

image-classification, Image Classification, +4

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

no code implementations • 22 Nov 2021 • Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.

Model Compression

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

3 code implementations • 12 Sep 2020 • Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

In this work, we propose the YOLObile framework, which performs real-time object detection on mobile devices via compression-compilation co-design.

Computational Efficiency Object +2
