Search Results for author: Xiaoxin Chen

Found 23 papers, 8 papers with code

EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices

no code implementations · 28 Mar 2025 · Jiyu Chen, Shuang Peng, Daxiong Luo, Fan Yang, Renshou Wu, Fangyuan Li, Xiaoxin Chen

Transformer-based large language models (LLMs) encounter challenges in processing long sequences on edge devices due to the quadratic complexity of attention mechanisms and the growing memory demands of the Key-Value (KV) cache.
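To make the memory pressure concrete, here is a rough back-of-the-envelope sketch of how KV-cache size grows linearly with context length; the layer count, KV-head count, and head dimension below are illustrative assumptions, not EdgeInfinite's actual configuration.

```python
# Illustrative estimate of KV-cache memory growth with context length.
# The model configuration below is a hypothetical example for scale only.

def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Memory for keys + values across all layers (fp16 by default)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> ~{gib:.1f} GiB of KV cache")
# ~0.5 GiB at 4K tokens grows to ~16 GiB at 128K tokens,
# which is well beyond typical edge-device memory budgets.
```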

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

1 code implementation · 13 Mar 2025 · Rui Hu, Lianghui Zhu, Yuxuan Zhang, Tianheng Cheng, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

Pixel grounding, encompassing tasks such as Referring Expression Segmentation (RES), has garnered considerable attention due to its immense potential for bridging the gap between vision and language modalities.

Diversity · Language Modeling · +3

SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?

no code implementations · 8 Mar 2025 · Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li

Large Language Models (LLMs) have become integral to daily life, especially advancing as intelligent assistants through on-device deployment on smartphones.

Text Summarization

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

1 code implementation · 2 Mar 2025 · Kashun Shum, Yuzhen Huang, Hongjian Zou, Ding Qi, Yixuan Liao, Xiaoxin Chen, Qian Liu, Junxian He

Through comprehensive experiments with 1B and 3B parameter models, we demonstrate that models trained on 30B tokens selected with PreSelect surpass the performance of a vanilla baseline trained on 300B tokens, achieving a 10x reduction in compute requirements.
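For context on the 10x figure: using the common training-compute approximation C ≈ 6·N·D (parameters times tokens; an assumption here, not stated in the excerpt), holding the model size fixed while shrinking the training set from 300B to 30B tokens reduces compute by the same factor.

```python
# Rough training-compute estimate: C ~= 6 * N * D (a common approximation).
# N = parameter count, D = training tokens; the 3B model size is one of the
# sizes mentioned in the excerpt, used here purely for illustration.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

baseline = train_flops(3e9, 300e9)   # vanilla baseline: 300B tokens
selected = train_flops(3e9, 30e9)    # PreSelect-filtered data: 30B tokens
print(f"compute reduction: {baseline / selected:.0f}x")  # -> 10x
```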

Autonomous Deep Agent

no code implementations · 10 Feb 2025 · Amy Yu, Erik Lebedev, Lincoln Everett, Xiaoxin Chen, Terry Chen

Through this sophisticated architecture, Deep Agent establishes a novel paradigm in self-governing AI systems, demonstrating a robust capability to independently handle intricate, multi-step tasks while maintaining consistent efficiency and reliability through continuous self-optimization.

Large Language Model

Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

no code implementations · 9 Dec 2024 · Min Zeng, Caiquan Liu, Shiqi Zhang, Li Xie, Chen Sang, Xiaoxin Chen

Subsequently, this model is used to predict the outcomes for the unsampled data, categorizing incorrectly predicted data into uncovered, difficult, and noisy data.

Diversity · text-classification · +1

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

no code implementations · 16 Nov 2024 · Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li

Specifically, we redesign the dynamic resolution scheme adopted by mainstream MLLMs and implement system-level optimizations for hardware-aware deployment to improve model inference on mobile phones.

Quantization

A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models

no code implementations · 5 Oct 2024 · Zhihao Wang, Shiyu Liu, Jianheng Huang, Zheng Wang, Yixuan Liao, Xiaoxin Chen, Junfeng Yao, Jinsong Su

Preliminary experiments demonstrate that PTFS achieves better pre-training performance, while CPT has lower training cost.

ControlAR: Controllable Image Generation with Autoregressive Models

1 code implementation · 3 Oct 2024 · Zongming Li, Tianheng Cheng, Shoufa Chen, Peize Sun, Haocheng Shen, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

First, we explore control encoding for AR models and propose a lightweight control encoder to transform spatial inputs (e.g., Canny edges or depth maps) into control tokens.

Image Generation

EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

1 code implementation · 28 Jun 2024 · Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

Surprisingly, we observe that (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation.

Ranked #3 on Referring Expression Segmentation on RefCOCO+ test B (using extra training data)

Interactive Segmentation · Language Modeling · +4

FAGhead: Fully Animate Gaussian Head from Monocular Videos

no code implementations · 27 Jun 2024 · Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality.

Meta-Auxiliary Learning for Micro-Expression Recognition

no code implementations · 18 Apr 2024 · Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection.

Auxiliary Learning · Micro Expression Recognition · +1

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

no code implementations · 21 Nov 2023 · Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen

To tackle these issues, we propose GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis.

Image Generation · Text-to-Video Generation · +1

DPL: Decoupled Prompt Learning for Vision-Language Models

no code implementations · 19 Aug 2023 · Chen Xu, Yuhan Zhu, Guozhen Zhang, Haocheng Shen, Yixuan Liao, Xiaoxin Chen, Gangshan Wu, LiMin Wang

Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks.

Prompt Learning

Progressive Visual Prompt Learning with Contrastive Feature Re-formation

1 code implementation · 17 Apr 2023 · Chen Xu, Yuhan Zhu, Haocheng Shen, Boheng Chen, Yixuan Liao, Xiaoxin Chen, LiMin Wang

To the best of our knowledge, we are the first to demonstrate that visual prompts in V-L models outperform previous prompt-based methods on downstream tasks.

Prompt Learning

Real-Time Image Demoireing on Mobile Devices

1 code implementation · 4 Feb 2023 · Yuxin Zhang, Mingbao Lin, Xunchao Li, Han Liu, Guozhi Wang, Fei Chao, Shuai Ren, Yafei Wen, Xiaoxin Chen, Rongrong Ji

In this paper, we launch the first study on accelerating demoireing networks and propose a dynamic demoireing acceleration method (DDA) towards real-time deployment on mobile devices.

Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images

no code implementations · CVPR 2021 · Xinggang Wang, Jiapei Feng, Bin Hu, Qi Ding, Longjin Ran, Xiaoxin Chen, Wenyu Liu

Humans have a strong class-agnostic object segmentation ability and can outline boundaries of unknown objects precisely, which motivates us to propose a box-supervised class-agnostic object segmentation (BoxCaseg) based solution for weakly-supervised instance segmentation.

Ranked #5 on Box-supervised Instance Segmentation on COCO test-dev (using extra training data)

Box-supervised Instance Segmentation · Multi-Task Learning · +5

Hierarchical Reinforcement Learning for Multi-agent MOBA Game

no code implementations · 23 Jan 2019 · Zhijian Zhang, Haozheng Li, Luo Zhang, Tianyin Zheng, Ting Zhang, Xiong Hao, Xiaoxin Chen, Min Chen, Fangxu Xiao, Wei Zhou

Real-Time Strategy (RTS) games require macro strategies as well as micro strategies to obtain satisfactory performance, since they have large state spaces, large action spaces, and hidden information.

Hierarchical Reinforcement Learning · Imitation Learning · +4
