Search Results for author: Tao Yuan

Found 18 papers, 3 papers with code

A Simple Linear Patch Revives Layer-Pruned Large Language Models

no code implementations • 30 May 2025 • Xinrui Chen, Haoli Bai, Tao Yuan, Ruikang Liu, Kang Zhao, Xianzhi Yu, Lu Hou, Tian Guan, Yonghong He, Chun Yuan

With only 5K samples, the retained performance of LinearPatch can be further boosted to 95.16% within 30 minutes on a single computing card.

Knowledge Distillation Question Answering

Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs

no code implementations • 26 May 2025 • Hanting Chen, Jiarui Qin, Jialong Guo, Tao Yuan, Yichun Yin, HuiLing Zhen, Yasheng Wang, Jinpeng Li, Xiaojun Meng, Meng Zhang, Rongju Ruan, Zheyuan Bai, Yehui Tang, Can Chen, Xinghao Chen, Fisher Yu, Ruiming Tang, Yunhe Wang

While structured pruning offers a promising avenue for model compression, existing methods often struggle with the detrimental effects of aggressive, simultaneous width and depth reductions, leading to substantial performance degradation.

Model Compression

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

no code implementations • 17 Apr 2025 • Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, Qing Li

Building Graphical User Interface (GUI) agents is a promising research direction, which simulates human interaction with computers or mobile phones to perform diverse GUI tasks.

Megrez-Omni Technical Report

no code implementations • 19 Feb 2025 • Boxun Li, Yadong Li, Zhiyuan Li, Congyi Liu, Weilin Liu, Guowei Niu, Zheyue Tan, Haiyang Xu, Zhuyu Yao, Tao Yuan, Dong Zhou, Yueqing Zhuang, Shengen Yan, Guohao Dai, Yu Wang

In this work, we present the Megrez models, comprising a language model (Megrez-3B-Instruct) and a multimodal model (Megrez-3B-Omni).

Language Modeling

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

1 code implementation • 24 Dec 2024 • Yang Shen, Xiu-Shen Wei, Yifan Sun, Yuxin Song, Tao Yuan, Jian Jin, Heyang Xu, Yazhou Yao, Errui Ding

In this paper, we explore the idea that CV adopts discrete and terminological task definitions (e.g., "image segmentation"), which may be a key barrier to zero-shot task generalization.

Image Segmentation Semantic Segmentation +1

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

no code implementations • 20 Dec 2024 • Zhi Gao, Bofei Zhang, Pengxiang Li, Xiaojian Ma, Tao Yuan, Yue Fan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, Qing Li

The advancement of large language models (LLMs) prompts the development of multi-modal agents, which are used as a controller to call external tools, providing a feasible way to solve practical tasks.

Language Modeling

Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs

no code implementations • 21 Oct 2024 • Kang Zhao, Tao Yuan, Han Bao, Zhenfeng Su, Chang Gao, Zhaofeng Sun, Zichen Liang, Liping Jing, Jianfei Chen

In this study, we thoroughly investigate the application of V:N:M sparsity in vision models and LLMs across multiple tasks, from pretraining to downstream tasks.

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

no code implementations • 16 Jul 2024 • Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li

Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction.

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

1 code implementation • 6 Feb 2024 • Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.

16k Benchmarking

Structured Attention for Unsupervised Dialogue Structure Induction

1 code implementation • EMNLP 2020 • Liang Qiu, Yizhou Zhao, Weiyan Shi, Yuan Liang, Feng Shi, Tao Yuan, Zhou Yu, Song-Chun Zhu

Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics.

Inductive Bias Sentence +1

Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

no code implementations • 25 Apr 2020 • Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.

Object Object Tracking

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

no code implementations • NeurIPS 2019 • Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistencies between the 2D image plane and the 3D world coordinate.

Ranked #2 on Monocular 3D Object Detection on SUN RGB-D (AP@0.15 (10 / PNet-30) metric)

Monocular 3D Object Detection Object +1

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

no code implementations • ICCV 2019 • Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction (3D estimations of object bounding boxes, camera pose, and room layout), and (ii) 3D human pose estimation.

3D Human Pose Estimation Human-Object Interaction Detection +1

HUGE2: a Highly Untangled Generative-model Engine for Edge-computing

no code implementations • 25 Jul 2019 • Feng Shi, Ziheng Xu, Tao Yuan, Song-Chun Zhu

In this work, we propose a Highly Untangled Generative-model Engine for Edge-computing or HUGE2 for accelerating these two special convolutions on the edge-computing platform by decomposing the kernels and untangling these smaller convolutions by performing basic matrix multiplications.

Deep Learning Edge-computing +1

Scene-centric Joint Parsing of Cross-view Videos

no code implementations • 16 Sep 2017 • Hang Qi, Yuanlu Xu, Tao Yuan, Tianfu Wu, Song-Chun Zhu

The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs.

Video Understanding
