Search Results for author: Yichen Zhu

Found 32 papers, 8 papers with code

MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?

no code implementations 28 Jun 2024 Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang

It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning.

Visual Reasoning

Non-confusing Generation of Customized Concepts in Diffusion Models

no code implementations 11 May 2024 Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang

We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).

Safety of Multimodal Large Language Models on Images and Texts

2 code implementations 1 Feb 2024 Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao

In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text.

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

no code implementations 31 Jan 2024 Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang

Instead of simply combining pruning and self-distillation (SD), EPSD enables the pruned network to favor SD by keeping more distillable weights before training, ensuring better distillation of the pruned network.

Knowledge Distillation, Network Pruning, +1

Infinite-Horizon Graph Filters: Leveraging Power Series to Enhance Sparse Information Aggregation

1 code implementation 18 Jan 2024 Ruizhe Zhang, Xinke Jiang, Yuchen Fang, Jiayuan Luo, Yongxin Xu, Yichen Zhu, Xu Chu, Junfeng Zhao, Yasha Wang

In recent years, Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks, particularly those based on the message-passing approach.

Graph Learning, Node Classification

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

no code implementations 8 Jan 2024 Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang

Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, ranging from simple pick-and-place to tasks requiring intent recognition and visual reasoning.

Decision Making, Intent Recognition, +2

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

1 code implementation 4 Jan 2024 Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang

In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues.

Language Modelling, Visual Question Answering

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

1 code implementation 29 Nov 2023 Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao

The security concerns surrounding Large Language Models (LLMs) have been extensively explored, yet the safety of Multimodal Large Language Models (MLLMs) remains understudied.

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models

no code implementations 15 Oct 2023 Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao

We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

1 code implementation ICCV 2023 Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

To accomplish this, we design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.

3D visual grounding, Object, +3

Make A Long Image Short: Adaptive Token Length for Vision Transformers

no code implementations 5 Jul 2023 Qiqi Zhou, Yichen Zhu

The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed.

Action Recognition, Image Classification

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

no code implementations 18 May 2023 Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou

We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., the mask distribution, may shift agnostically between training and testing.

ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector

no code implementations CVPR 2023 Yichen Zhu, Qiqi Zhou, Ning Liu, Zhiyuan Xu, Zhicai Ou, Xiaofeng Mou, Jian Tang

Unlike existing works that struggle to balance inference speed and small object detection (SOD) performance, we propose a novel Scale-aware Knowledge Distillation (ScaleKD) method, which transfers knowledge of a complex teacher model to a compact student model.

Knowledge Distillation, object-detection, +2

Label-Guided Auxiliary Training Improves 3D Object Detector

1 code implementation 24 Jul 2022 Yaomin Huang, Xinmei Liu, Yichen Zhu, Zhiyuan Xu, Chaomin Shen, Zhengping Che, Guixu Zhang, Yaxin Peng, Feifei Feng, Jian Tang

Detecting 3D objects from point clouds is a practical yet challenging task that has attracted increasing attention recently.

3D Object Detection, Object, +1

Optimal market completion through financial derivatives with applications to volatility risk

no code implementations 16 Feb 2022 Matt Davison, Marcos Escobar-Anel, Yichen Zhu

This paper investigates the optimal choices of financial derivatives to complete a financial market in the framework of stochastic volatility (SV) models.

Derivatives-based portfolio decisions. An expected utility insight

no code implementations 11 Jan 2022 Marcos Escobar-Anel, Matt Davison, Yichen Zhu

This paper challenges the use of stocks in portfolio construction; instead, we demonstrate that Asian derivatives, straddles, or baskets could be more convenient substitutes.

Management

Make A Long Image Short: Adaptive Token Length for Vision Transformers

no code implementations 3 Dec 2021 Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

The TLA enables the ReViT to process the image with the minimum sufficient number of tokens during inference.

Action Recognition, Image Classification

Training BatchNorm Only in Neural Architecture Search and Beyond

no code implementations 1 Dec 2021 Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

Critically, there has been no effort to understand 1) why training only BatchNorm can find well-performing architectures with reduced supernet-training time, and 2) how the train-BN-only supernet differs from the standard-train supernet.

Fairness, Neural Architecture Search

Student Customized Knowledge Distillation: Bridging the Gap Between Student and Teacher

no code implementations ICCV 2021 Yichen Zhu, Yi Wang

We formulate knowledge distillation as a multi-task learning problem so that the teacher transfers knowledge to the student only if the student can benefit from learning such knowledge.

Image Classification, Knowledge Distillation, +4

Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization

no code implementations 26 Apr 2020 Yichen Zhu, Cheng Li, David B. Dunson

When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error.

General Classification

Resizable Neural Networks

no code implementations 25 Sep 2019 Yichen Zhu, Xiangyu Zhang, Tong Yang, Jian Sun

We introduce the adaptive resizable networks as dynamic networks, which further improve the performance with less computational cost via data-dependent inference.

Data Augmentation, Neural Architecture Search

VAENAS: Sampling Matters in Neural Architecture Search

no code implementations 25 Sep 2019 Shizheng Qin, Yichen Zhu, Pengfei Hou, Xiangyu Zhang, Wenqiang Zhang, Jian Sun

In this paper, we propose a learnable sampling module based on a variational auto-encoder (VAE) for neural architecture search (NAS), named VAENAS, which can be easily embedded into existing weight-sharing NAS frameworks, e.g., one-shot and gradient-based approaches, and significantly improves the quality of the search results.

Neural Architecture Search

CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

1 code implementation 13 May 2019 Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Wei-Nan Zhang, Yong Yu, Haiming Jin, Zhenhui Li

The most commonly used open-source traffic simulator, SUMO, is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios.

Multi-agent Reinforcement Learning, reinforcement-learning, +1

Nonoverlap-Promoting Variable Selection

no code implementations ICML 2018 Pengtao Xie, Hongbao Zhang, Yichen Zhu, Eric Xing

Variable selection is a classic problem in machine learning (ML), widely used to find important explanatory factors, and improve generalization performance and interpretability of ML models.

Variable Selection

Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis

no code implementations ICML 2018 Pengtao Xie, Wei Wu, Yichen Zhu, Eric P. Xing

In this paper, we address these three issues by (1) seeking convex relaxations of the original nonconvex problems so that the global optimum is guaranteed to be achievable; (2) providing a formal analysis of OPR's capability of promoting balancedness; (3) providing a theoretical analysis that directly reveals the relationship between OPR and generalization performance.

Metric Learning
