1 code implementation • 10 Mar 2024 • Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang
Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning.
Ranked #55 on Visual Question Answering on MM-Vet
no code implementations • 17 Jan 2024 • Kun Wu, Ning Liu, Zhen Zhao, Di Qiu, Jinming Li, Zhengping Che, Zhiyuan Xu, Qinru Qiu, Jian Tang
Imitation learning (IL), aiming to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks.
no code implementations • 17 Jan 2024 • Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping Che, Qinru Qiu, Chi Harold Liu, Jian Tang
Visuomotor policies, which learn control mechanisms directly from high-dimensional visual observations, confront challenges in adapting to new environments with intricate visual variations.
no code implementations • 8 Jan 2024 • Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang
The language-conditioned robotic manipulation aims to transfer natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning.
no code implementations • 5 Jan 2024 • Junjie Wen, Yichen Zhu, Minjie Zhu, Jinming Li, Zhiyuan Xu, Zhengping Che, Chaomin Shen, Yaxin Peng, Dong Liu, Feifei Feng, Jian Tang
Humans interpret scenes by recognizing both the identities and positions of objects in their observations.
no code implementations • 20 Dec 2023 • Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Qinru Qiu, Jian Tang
Video Question Answering (VideoQA) is a very attractive and challenging research direction aiming to understand complex semantics of heterogeneous data from two domains, i. e., the spatio-temporal video content and the word sequence in question.
no code implementations • 20 Dec 2023 • Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Jian Tang
Visual Question Answering (VQA) has emerged as one of the most challenging tasks in artificial intelligence due to its multi-modal nature.
no code implementations • 18 Dec 2023 • Wanying Wang, Yichen Zhu, Yirui Zhou, Chaomin Shen, Jian Tang, Zhiyuan Xu, Yaxin Peng, Yangchun Zhang
Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in imitation learning.
no code implementations • 26 Sep 2023 • Mateo Cámara, Zhiyuan Xu, Yisu Zong, José Luis Blanco, Joshua D. Reiss
We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model.
no code implementations • 4 Aug 2023 • Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong Liu, Feifei Feng, Yakun Huang, XIUQUAN QIAO, Jian Tang
We introduce a pose regression module that shares the deformation features and template codes from the fields to estimate the accurate 6D pose of each object in the scene.
no code implementations • 6 Jun 2023 • Haowen Wang, Zhengping Che, Mingyuan Wang, Zhiyuan Xu, XIUQUAN QIAO, Mengshi Qi, Feifei Feng, Jian Tang
The raw depth image captured by indoor depth sensors usually has an extensive range of missing depth values due to inherent limitations such as the inability to perceive transparent objects and the limited distance range.
no code implementations • 23 Mar 2023 • Mingze Wei, Yaomin Huang, Zhiyuan Xu, Ning Liu, Zhengping Che, Xinyu Zhang, Chaomin Shen, Feifei Feng, Chun Shan, Jian Tang
Our work significantly outperforms the state-of-the-art for three-finger robotic hands.
no code implementations • 23 Mar 2023 • Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
CP$^3$ is elaborately designed to leverage the characteristics of point clouds and PNNs in order to enable 2D channel pruning methods for PNNs.
no code implementations • CVPR 2023 • Yaomin Huang, Ning Liu, Zhengping Che, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Guixu Zhang, Xinmei Liu, Feifei Feng, Jian Tang
Directly implementing the 2D CNN channel pruning methods to PNNs undermine the performance of PNNs because of the different representations of 2D images and 3D point clouds as well as the network architecture disparity.
no code implementations • CVPR 2023 • Yichen Zhu, Qiqi Zhou, Ning Liu, Zhiyuan Xu, Zhicai Ou, Xiaofeng Mou, Jian Tang
Unlike existing works that struggle to balance the trade-off between inference speed and SOD performance, in this paper, we propose a novel Scale-aware Knowledge Distillation (ScaleKD), which transfers knowledge of a complex teacher model to a compact student model.
1 code implementation • 21 Nov 2022 • Zhen Tian, Ting Bai, Zibin Zhang, Zhiyuan Xu, Kangyi Lin, Ji-Rong Wen, Wayne Xin Zhao
Some recent knowledge distillation based methods transfer knowledge from complex teacher models to shallow student models for accelerating the online model inference.
1 code implementation • 24 Jul 2022 • Yaomin Huang, Xinmei Liu, Yichen Zhu, Zhiyuan Xu, Chaomin Shen, Zhengping Che, Guixu Zhang, Yaxin Peng, Feifei Feng, Jian Tang
Detecting 3D objects from point clouds is a practical yet challenging task that has attracted increasing attention recently.
no code implementations • 10 Jul 2022 • Kun Wu, Chengxiang Yin, Jian Tang, Zhiyuan Xu, Yanzhi Wang, Dejun Yang
In this paper, we define a new problem called continual few-shot learning, in which tasks arrive sequentially and each task is associated with a few training samples.
no code implementations • CVPR 2022 • Haowen Wang, Mingyuan Wang, Zhengping Che, Zhiyuan Xu, XIUQUAN QIAO, Mengshi Qi, Feifei Feng, Jian Tang
In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map.
1 code implementation • 17 Feb 2022 • Yinuo Zhao, Kun Wu, Zhiyuan Xu, Zhengping Che, Qi Lu, Jian Tang, Chi Harold Liu
Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors.
no code implementations • ICCV 2021 • Chengxiang Yin, Kun Wu, Zhengping Che, Bo Jiang, Zhiyuan Xu, Jian Tang
Deep learning has made tremendous success in computer vision, natural language processing and even visual-semantic learning, which requires a huge amount of labeled training data.
1 code implementation • NeurIPS 2020 • Zhiyuan Xu, Kun Wu, Zhengping Che, Jian Tang, Jieping Ye
While Deep Reinforcement Learning (DRL) has emerged as a promising approach to many complex tasks, it remains challenging to train a single DRL agent that is capable of undertaking multiple different continuous control tasks.
no code implementations • 6 Jul 2019 • Ning Liu, Xiaolong Ma, Zhiyuan Xu, Yanzhi Wang, Jian Tang, Jieping Ye
This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique which has underlying incompatibility with the target pruning problem.
no code implementations • 8 Jun 2018 • Chengxiang Yin, Jian Tang, Zhiyuan Xu, Yanzhi Wang
Meta-learning enables a model to learn from very limited data to undertake a new task.
no code implementations • 2 Mar 2018 • Teng Li, Zhiyuan Xu, Jian Tang, Yanzhi Wang
Specifically, we, for the first time, propose to leverage emerging Deep Reinforcement Learning (DRL) for enabling model-free control in DSDPSs; and present design, implementation and evaluation of a novel and highly effective DRL-based control framework, which minimizes average end-to-end tuple processing time by jointly learning the system environment via collecting very limited runtime statistics data and making decisions under the guidance of powerful Deep Neural Networks.
no code implementations • 28 Jan 2018 • Ning Liu, Ying Liu, Brent Logan, Zhiyuan Xu, Jian Tang, Yanzhi Wang
This paper presents the first deep reinforcement learning (DRL) framework to estimate the optimal Dynamic Treatment Regimes from observational medical data.
no code implementations • 17 Jan 2018 • Zhiyuan Xu, Jian Tang, Jingsong Meng, Weiyi Zhang, Yanzhi Wang, Chi Harold Liu, Dejun Yang
Modern communication networks have become very complicated and highly dynamic, which makes them hard to model, predict and control.
no code implementations • 13 Mar 2017 • Ning Liu, Zhe Li, Zhiyuan Xu, Jielong Xu, Sheng Lin, Qinru Qiu, Jian Tang, Yanzhi Wang
Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in the cloud computing system.