Search Results for author: Jia Pan

Found 56 papers, 22 papers with code

Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation

1 code implementation ACL 2022 Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +6

Voice Attribute Editing with Text Prompt

no code implementations 13 Apr 2024 Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt.


LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation

no code implementations 26 Mar 2024 Ke Guo, Zhenwei Miao, Wei Jing, Weiwei Liu, Weizi Li, Dayang Hao, Jia Pan

Due to the covariate shift issue, existing imitation learning-based simulators often fail to generate stable long-term simulations.

Imitation Learning

NetTrack: Tracking Highly Dynamic Objects with a Net

no code implementations 17 Mar 2024 Guangze Zheng, ShiJie Lin, Haobo Zuo, Changhong Fu, Jia Pan

Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects.

Multi-Object Tracking Object

Neuromorphic Synergy for Video Binarization

1 code implementation 20 Feb 2024 ShiJie Lin, Xiang Zhang, Lei Yang, Lei Yu, Bin Zhou, Xiaowei Luo, Wenping Wang, Jia Pan

We also develop an efficient integration method to propagate this binary image to high frame rate binary video.

Binarization Camera Calibration +1

Evaluating Explanation Methods for Vision-and-Language Navigation

no code implementations 10 Oct 2023 Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan

The ability to navigate robots with natural language instructions in an unknown environment is a crucial step for achieving embodied artificial intelligence (AI).

Decision Making Navigate +3

Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV Imagery

no code implementations 7 Oct 2023 Qi Li, Jiaxin Cai, Yuanlong Yu, Jason Gu, Jia Pan, Wenxi Liu

Within the domain of UAV imagery analysis, the segmentation of ultra-high resolution images emerges as a substantial and intricate challenge, especially when grappling with the constraints imposed by GPU memory-restricted computational devices.

Segmentation Semantic Segmentation

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations 15 Sep 2023 Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

no code implementations 28 Aug 2023 Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.

speaker-diarization Speaker Diarization +2

mCLIP: Multilingual CLIP via Cross-lingual Transfer

1 code implementation ACL 2023 Guanhua Chen, Lu Hou, Yun Chen, Wenliang Dai, Lifeng Shang, Xin Jiang, Qun Liu, Jia Pan, Wenping Wang

Furthermore, to enhance the token- and sentence-level multilingual representation of the MTE, we propose to train it with machine translation and contrastive learning jointly before the TriKD to provide a better initialization.

Contrastive Learning Cross-Lingual Transfer +7

SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation

1 code implementation 3 Jul 2023 Changhong Fu, Liangliang Yao, Haobo Zuo, Guangze Zheng, Jia Pan

However, the state-of-the-art (SOTA) DA still lacks the potential object with accurate pixel-level location and boundary to generate the high-quality target domain training sample.

Domain Adaptation Transfer Learning +1

A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

1 code implementation 1 Jun 2023 Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhang, Yuanxu Gao, Jia Pan, Jun Shao, Guangming Lu, Kang Zhang, Weimin Li

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results.

Representation Learning

Mixed Traffic Control and Coordination from Pixels

no code implementations 17 Feb 2023 Michael Villarreal, Bibek Poudel, Jia Pan, Weizi Li

In certain scenarios, our approach even outperforms using precision observations, e.g., up to 8% increase in average vehicle velocity in the merge environment, despite only using local traffic information as opposed to global traffic information.

Reinforcement Learning (RL)

Learning to Control and Coordinate Mixed Traffic Through Robot Vehicles at Complex and Unsignalized Intersections

1 code implementation 12 Jan 2023 Dawei Wang, Weizi Li, Lei Zhu, Jia Pan

In contrast, without RVs, congestion starts to develop when the traffic demand reaches as low as 200 vehicles per hour.

Multi-agent Reinforcement Learning

Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

no code implementations 6 Dec 2022 Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu

AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher.

Language Modelling

Siamese Object Tracking for Vision-Based UAM Approaching with Pairwise Scale-Channel Attention

1 code implementation 26 Nov 2022 Guangze Zheng, Changhong Fu, Junjie Ye, Bowen Li, Geng Lu, Jia Pan

The key to the visual UAM approaching lies in object tracking, while current UAM tracking typically relies on costly model-based methods.

Object Object Tracking

Monocular BEV Perception of Road Scenes via Front-to-Top View Projection

no code implementations 15 Nov 2022 Wenxi Liu, Qi Li, Weixiang Yang, Jiaxin Cai, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan

We propose a front-to-top view projection (FTVP) module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding.

Autonomous Driving Road Segmentation +1

Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions

no code implementations 25 Jun 2022 Weilin Wan, Lei Yang, Lingjie Liu, Zhuoying Zhang, Ruixing Jia, Yi-King Choi, Jia Pan, Christian Theobalt, Taku Komura, Wenping Wang

We also observe that an object's intrinsic physical properties are useful for the object motion prediction, and thus design a set of object dynamic descriptors to encode such intrinsic properties.

Human-Object Interaction Detection motion prediction +1

ModLaNets: Learning Generalisable Dynamics via Modularity and Physical Inductive Bias

1 code implementation 24 Jun 2022 Yupu Lu, ShiJie Lin, Guanqi Chen, Jia Pan

Deep learning models are able to approximate one specific dynamical system but struggle to learn generalisable dynamics, where dynamical systems obey the same laws of physics but contain different numbers of elements (e.g., double- and triple-pendulum systems).

Inductive Bias

Is Lip Region-of-Interest Sufficient for Lipreading?

no code implementations 28 May 2022 Jing-Xuan Zhang, Gen-Shun Wan, Jia Pan

In this work, we propose to adopt the entire face for lipreading with self-supervised learning.

Lipreading Self-Supervised Learning +2

End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps

1 code implementation CVPR 2022 Ke Guo, Wenxi Liu, Jia Pan

In this paper, we aim to forecast a future trajectory distribution of a moving agent in the real world, given the social scene images and historical trajectories.

Trajectory Prediction

High-resolution Face Swapping via Latent Semantics Disentanglement

1 code implementation CVPR 2022 Yangyang Xu, Bailin Deng, Junle Wang, Yanqing Jing, Jia Pan, Shengfeng He

Although previous research can leverage generative priors to produce high-resolution results, their quality can suffer from the entangled semantics of the latent space.

Disentanglement Face Swapping +2

Autofocus for Event Cameras

no code implementations CVPR 2022 ShiJie Lin, Yinqiang Zhang, Lei Yu, Bin Zhou, Xiaowei Luo, Jia Pan

Focus control (FC) is crucial for cameras to capture sharp images in challenging real-world scenarios.

Visual-Tactile Sensing for Real-time Liquid Volume Estimation in Grasping

no code implementations 23 Feb 2022 Fan Zhu, Ruixing Jia, Lei Yang, Youcan Yan, Zheng Wang, Jia Pan, Wenping Wang

We propose a deep visuo-tactile model for real-time estimation of the liquid inside a deformable container in a proprioceptive way. We fuse two sensory modalities, i.e., the raw visual inputs from the RGB camera and the tactile cues from our specific tactile sensor without any extra sensor calibrations. The robotic system is well controlled and adjusted based on the estimation model in real time.

Multi-Task Learning

Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations

1 code implementation CVPR 2022 Zhixuan Zhong, Liangyu Chai, Yang Zhou, Bailin Deng, Jia Pan, Shengfeng He

This paper presents a Generative prior ReciprocAted Invertible rescaling Network (GRAIN) for generating faithful high-resolution (HR) images from low-resolution (LR) invertible images with an extreme upscaling factor (64x).

DiffSRL: Learning Dynamical State Representation for Deformable Object Manipulation with Differentiable Simulator

1 code implementation 24 Oct 2021 Sirui Chen, Yunhao Liu, Jialong Li, Shang Wen Yao, Tingxiang Fan, Jia Pan

We propose DiffSRL, a dynamic state representation learning pipeline utilizing differentiable simulation that can embed complex dynamics models as part of the end-to-end training.

Deformable Object Manipulation Motion Planning +1

Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

1 code implementation 16 Oct 2021 Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei

When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.

Abstractive Text Summarization Cross-Lingual Abstractive Summarization +6

Modular Lagrangian Neural Networks: Designing Structures of Networks with Physical Inductive Biases

no code implementations 29 Sep 2021 Yupu Lu, ShiJie Lin, Jia Pan

At the same time, we directly applied our trained models to predict the motion of multi-pendulum and multi-body systems, demonstrating the intriguing performance in the extrapolation of our method.

Learning Selective Communication for Multi-Agent Path Finding

1 code implementation 12 Sep 2021 Ziyuan Ma, Yudong Luo, Jia Pan

Learning communication via deep reinforcement learning (RL) or imitation learning (IL) has recently been shown to be an effective way to solve Multi-Agent Path Finding (MAPF).

Imitation Learning Multi-Agent Path Finding +1

Learning-based Optoelectronically Innervated Tactile Finger for Rigid-Soft Interactive Grasping

no code implementations 29 Jan 2021 Linhan Yang, Xudong Han, Weijie Guo, Fang Wan, Jia Pan, Chaoyang Song

This paper presents a novel design of a soft tactile finger with omni-directional adaptation using multi-channel optical fibers for rigid-soft interactive grasping.


A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

no code implementations 25 Oct 2020 Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan

We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming at improving quality of enhanced speech in unseen noise conditions.

regression Speech Enhancement

Over-crowdedness Alert! Forecasting the Future Crowd Distribution

no code implementations 9 Jun 2020 Yuzhen Niu, Weifeng Shi, Wenxi Liu, Shengfeng He, Jia Pan, Antoni B. Chan

In this paper, we formulate a novel crowd analysis problem, in which we aim to predict the crowd distribution in the near future given sequential frames of a crowd video without any identity annotations.

Keyfilter-Aware Real-Time UAV Object Tracking

1 code implementation 11 Mar 2020 Yiming Li, Changhong Fu, Ziyuan Huang, Yinqiang Zhang, Jia Pan

Correlation filter-based tracking has been widely applied to unmanned aerial vehicles (UAVs) because of its high efficiency.

Object Object Tracking +2

Rigid-Soft Interactive Learning for Robust Grasping

2 code implementations 29 Feb 2020 Linhan Yang, Fang Wan, Haokun Wang, Xiaobo Liu, Yujia Liu, Jia Pan, Chaoyang Song

We use soft, stuffed toys for training, instead of everyday objects, to reduce the integration complexity and computational burden and exploit such rigid-soft interaction by changing the gripper fingers to the soft ones when dealing with rigid, daily-life items such as the Yale-CMU-Berkeley (YCB) objects.

Small Data Image Classification

A Configuration-Space Decomposition Scheme for Learning-based Collision Checking

no code implementations 17 Nov 2019 Yiheng Han, Wang Zhao, Jia Pan, Zipeng Ye, Ran Yi, Yong-Jin Liu

Motion planning for robots of high degrees-of-freedom (DOFs) is an important problem in robotics with sampling-based methods in configuration space C as one popular solution.

BIG-bench Machine Learning Motion Planning +1

Learning Resilient Behaviors for Navigation Under Uncertainty

no code implementations 22 Oct 2019 Tingxiang Fan, Pinxin Long, Wenxi Liu, Jia Pan, Ruigang Yang, Dinesh Manocha

Deep reinforcement learning has great potential to acquire complex, adaptive behaviors for autonomous agents automatically.

Autonomous Driving

DeepMNavigate: Deep Reinforced Multi-Robot Navigation Unifying Local & Global Collision Avoidance

no code implementations 4 Oct 2019 Qingyang Tan, Tingxiang Fan, Jia Pan, Dinesh Manocha

We present a novel algorithm (DeepMNavigate) for global multi-agent navigation in dense scenarios using deep reinforcement learning (DRL).

Collision Avoidance Position +3

Augmented Memory for Correlation Filters in Real-Time UAV Tracking

1 code implementation 24 Sep 2019 Yiming Li, Changhong Fu, Fangqiang Ding, Ziyuan Huang, Jia Pan

The outstanding computational efficiency of discriminative correlation filter (DCF) fades away with various complicated improvements.

Computational Efficiency

Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery

no code implementations ICCV 2019 Xiaosheng Yan, Yuanlong Yu, Feigege Wang, Wenxi Liu, Shengfeng He, Jia Pan

We conduct comparison experiments on this dataset and demonstrate that our model outperforms the state-of-the-art in tasks of recovering segmentation mask and appearance for occluded vehicles.


Robust Shape Estimation for 3D Deformable Object Manipulation

1 code implementation 26 Sep 2018 Tao Han, Xuan Zhao, Peigen Sun, Jia Pan

Existing shape estimation methods for deformable object manipulation suffer from the drawbacks of being off-line, model dependent, noise-sensitive or occlusion-sensitive, and thus are not appropriate for manipulation tasks requiring high precision.


Safe Navigation with Human Instructions in Complex Scenes

no code implementations 12 Sep 2018 Zhe Hu, Jia Pan, Tingxiang Fan, Ruigang Yang, Dinesh Manocha

In this paper, we present a robotic navigation algorithm with natural language interfaces, which enables a robot to safely walk through a changing environment with moving persons by following human instructions such as "go to the restaurant and keep away from people".

Collision Avoidance Motion Planning +2

Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning

2 code implementations 28 Sep 2017 Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan

We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time efficient, collision-free paths for a large-scale robot system.

Collision Avoidance reinforcement-learning +1

Deep-Learned Collision Avoidance Policy for Distributed Multi-Agent Navigation

no code implementations 22 Sep 2016 Pinxin Long, Wenxi Liu, Jia Pan

We validate the learned deep neural network policy in a set of simulated and real scenarios with noisy measurements and demonstrate that our method is able to generate a robust navigation strategy that is insensitive to imperfect sensing and works reliably in all situations.

Collision Avoidance
