Search Results for author: Yi Zhu

Found 139 papers, 54 papers with code

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

no code implementations 3 Dec 2024 Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu

Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects.

Robotic Grasping

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

no code implementations 14 Nov 2024 Youpeng Wen, Junfan Lin, Yi Zhu, Jianhua Han, Hang Xu, Shen Zhao, Xiaodan Liang

Specifically, in the first stage, VidMan is pre-trained on the Open X-Embodiment dataset (OXE) for predicting future visual trajectories in a video denoising diffusion manner, enabling the model to develop a long-horizon awareness of the environment's dynamics.

Denoising Robot Manipulation +2

Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge

no code implementations 9 Oct 2024 Yi Zhu, Chirag Goel, Surya Koppisetti, Trang Tran, Ankur Kumar, Gaurav Bharaj

Our system SLIM learns the style-linguistics dependency embeddings from various types of bonafide speech using self-supervised contrastive learning.

Audio Deepfake Detection Contrastive Learning +2

Cross-Organ Domain Adaptive Neural Network for Pancreatic Endoscopic Ultrasound Image Segmentation

no code implementations 7 Sep 2024 Zhichao Yan, Hui Xue, Yi Zhu, Bin Xiao, Hao Yuan

Accurate segmentation of lesions in pancreatic endoscopic ultrasound (EUS) images is crucial for effective diagnosis and treatment.

Domain Adaptation Image Segmentation +1

UNIT: Unifying Image and Text Recognition in One Vision Encoder

no code implementations 6 Sep 2024 Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu

Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities.

Decoder Optical Character Recognition (OCR)

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

no code implementations 26 Jul 2024 Yi Zhu, Surya Koppisetti, Trang Tran, Gaurav Bharaj

The learned features are then used in complement with standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes.

Audio Deepfake Detection DeepFake Detection +1

WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model

1 code implementation 26 Jun 2024 Yi Zhu, Tiago Falk

Speech is known to carry health-related attributes and has emerged as a novel avenue for remote and long-term health monitoring.

Privacy Preserving

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

no code implementations 17 Jun 2024 Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, JianPing Wang

To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction.

Autonomous Driving Trajectory Prediction

UrBAN: Urban Beehive Acoustics and PheNotyping Dataset

no code implementations 5 Jun 2024 Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. Falk

In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022.

Correctable Landmark Discovery via Large Models for Vision-Language Navigation

1 code implementation 29 May 2024 Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang

To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery.

Vision-Language Navigation

You Only Cache Once: Decoder-Decoder Architectures for Language Models

1 code implementation 8 May 2024 Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.

Decoder Retrieval

Prompt-tuning for Clickbait Detection via Text Summarization

no code implementations 17 Apr 2024 Haoxiang Deng, Yi Zhu, Ye Wang, Jipeng Qiang, Yunhao Yuan, Yun Li, Runmei Zhang

To address this problem, we propose a prompt-tuning method for clickbait detection via text summarization. Text summarization is introduced to summarize the contents, and clickbait detection is performed based on the similarity between the generated summary and the contents.

Clickbait Detection Semantic Similarity +2

Cross-to-merge training with class balance strategy for learning with noisy labels

1 code implementation Expert Systems with Applications 2024 Qian Zhang, Yi Zhu, Ming Yang, Ge Jin, YingWen Zhu, Qiu Chen

Although sample selection is a mainstream method in the field of learning with noisy labels, which aims to mitigate the impact of noisy labels during model training, the testing performance of these methods exhibits significant fluctuations across different noise rates and types.

Learning with noisy labels

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

no code implementations 13 Mar 2024 ZiCheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Qixiang Ye, Wei Ke

To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.

Decoder Language Modelling +2

Weak Collocation Regression for Inferring Stochastic Dynamics with Lévy Noise

no code implementations 13 Mar 2024 Liya Guo, Liwei Lu, Zhijun Zeng, Pipi Hu, Yi Zhu

In this work, we propose a Weak Collocation Regression (WCR) to explicitly reveal unknown stochastic dynamical systems, i.e., the Stochastic Differential Equation (SDE) with both $\alpha$-stable Lévy noise and Gaussian noise, from discrete aggregate data.

regression

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

no code implementations 9 Mar 2024 Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

To encourage the agent to capture the differences introduced by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings and perturbation-based counterparts.

Contrastive Learning Navigate +1

Reconstruction of dynamical systems from data without time labels

no code implementations 7 Dec 2023 Zhijun Zeng, Pipi Hu, Chenglong Bao, Yi Zhu, Zuoqiang Shi

In this paper, we study the method to reconstruct dynamical systems from data without time labels.

Efficient Large Language Models: A Survey

3 code implementations 6 Dec 2023 Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLMs research and inspire them to contribute to this important and exciting field.

Natural Language Understanding Survey +1

Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection

1 code implementation 15 Sep 2023 Yi Zhu, Saurabh Powar, Tiago H. Falk

Existing deepfake speech detection systems lack generalizability to unseen attacks (i.e., samples generated by generative algorithms not seen during training).

DeepFake Detection Face Swapping

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

no code implementations ICCV 2023 Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang

To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence.

Segmentation Semantic Segmentation +1

Multilingual Lexical Simplification via Paraphrase Generation

1 code implementation 28 Jul 2023 Kang Liu, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu, Kaixun Hua

After feeding the input sentence into the encoder of paraphrase modeling, we generate the substitutes based on a novel decoding strategy that concentrates solely on the lexical variations of the complex word.

Diversity Lexical Simplification +4

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

1 code implementation NeurIPS 2023 Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang

We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset.

Denoising Earth Observation

Temporal Difference Learning for High-Dimensional PIDEs with Jumps

no code implementations 6 Jul 2023 Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu

In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on the temporal difference learning.

Clickbait Detection via Large Language Models

1 code implementation 16 Jun 2023 Han Wang, Yi Zhu, Ye Wang, Yun Li, Yunhao Yuan, Jipeng Qiang

Clickbait, which aims to lure users with surprising and even thrilling headlines to increase click-through rates, permeates almost all online content publishers, such as news portals and social media.

Clickbait Detection

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

1 code implementation 16 May 2023 Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li

It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer.

Knowledge Distillation text-classification +2

ParaLS: Lexical Substitution via Pretrained Paraphraser

1 code implementation 14 May 2023 Jipeng Qiang, Kang Liu, Yun Li, Yunhao Yuan, Yi Zhu

Lexical substitution (LS) aims at finding appropriate substitutes for a target word in a sentence.

Sentence

On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection

no code implementations 5 Apr 2023 Yi Zhu, Mohamed Imoussaïne-Aïkous, Carolyn Côté-Lussier, Tiago H. Falk

We validate the effectiveness of the anonymization methods, compare their computational complexity, and quantify the impact across different testing scenarios for both within- and across-dataset conditions.

COVID-19 Diagnosis Data Augmentation

Sentence Simplification via Large Language Models

2 code implementations 23 Feb 2023 Yutao Feng, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu

Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining the original meaning.

Few-Shot Learning Sentence

Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

no code implementations 13 Feb 2023 Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position.

Re-Ranking Vision-Language Navigation

Towards Geospatial Foundation Models via Continual Pretraining

2 code implementations ICCV 2023 Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen

Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.

Change Detection Continual Pretraining +5

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

no code implementations 7 Feb 2023 Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha

Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.

Semantic Segmentation

AIM: Adapting Image Models for Efficient Video Action Recognition

1 code implementation 6 Feb 2023 Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li

Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.

Ranked #3 on Action Recognition on Diving-48 (using extra training data)

Action Classification Action Recognition +2

ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency

1 code implementation 31 Jan 2023 Pengzhen Ren, Changlin Li, Hang Xu, Yi Zhu, Guangrun Wang, Jianzhuang Liu, Xiaojun Chang, Xiaodan Liang

Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.

Segmentation Semantic Segmentation

What Makes for Good Tokenizers in Vision Transformer?

no code implementations 21 Dec 2022 Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia

The architecture of transformers, which have recently seen booming applications in vision tasks, has pivoted away from the widespread convolutional paradigm.

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

no code implementations 4 Dec 2022 ZiCheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke

Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target.

Image Segmentation Semantic Segmentation +3

Edge Deep Learning Enabled Freezing of Gait Detection in Parkinson's Patients

no code implementations 27 Nov 2022 Ourong Lin, Tian Yu, Yuhan Hou, Yi Zhu, Xilin Liu

In a validation using a public dataset, the prototype developed achieved a FoG detection sensitivity of 88.8% and an F1 score of 85.34%, using less than 20k trainable parameters per sensor node.

Deep Learning

Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

no code implementations 2 Nov 2022 Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang

Inspired by the success of vision-language methods (VLMs) in zero-shot classification, recent works attempt to extend this line of work into object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner.

Object object-detection +5

Weak Collocation Regression method: fast reveal hidden stochastic dynamics from high-dimensional aggregate data

no code implementations 6 Sep 2022 Liwei Lu, Zhijun Zeng, Yan Jiang, Yi Zhu, Pipi Hu

Taking the collocations of Gaussian functions as the test functions in the weak form of the FP equation, we transfer the derivatives to the Gaussian functions and thus approximate the weak form by the expectational sum of the data.

regression

Earthformer: Exploring Space-Time Transformers for Earth System Forecasting

2 code implementations 12 Jul 2022 Zhihan Gao, Xingjian Shi, Hao Wang, Yi Zhu, Yuyang Wang, Mu Li, Dit-Yan Yeung

With the explosive growth of the spatiotemporal Earth observation data in the past decade, data-driven models that apply Deep Learning (DL) are demonstrating impressive potential for various Earth system forecasting tasks.

Earth Observation Earth Surface Forecasting +1

Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations

1 code implementation 11 Jul 2022 Andrii Zadaianchuk, Matthaeus Kleindessner, Yi Zhu, Francesco Locatello, Thomas Brox

In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago.

Clustering Object +3

Pixel-level Correspondence for Self-Supervised Learning from Video

no code implementations 8 Jul 2022 Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision.

Contrastive Learning Image Classification +4

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

no code implementations CVPR 2022 Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang

Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i.e., make instruction-asked actions sequentially in complex visual environments.

Vision-Language Navigation

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations 11 May 2022 Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on DIV2K validation set.

Image Super-Resolution

Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue

no code implementations 18 Apr 2022 Jiudong Yang, Peiying Wang, Yi Zhu, Mingchao Feng, Meng Chen, Xiaodong He

Turn-taking, aiming to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems.

Contrastive Learning Data Augmentation +2

Chinese Idiom Paraphrasing

1 code implementation 15 Apr 2022 Jipeng Qiang, Yang Li, Chaowei Zhang, Yun Li, Yunhao Yuan, Yi Zhu, Xindong Wu

Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters.

Machine Translation Paraphrase Generation +1

Harnessing Interpretable Machine Learning for Holistic Inverse Design of Origami

2 code implementations 12 Apr 2022 Yi Zhu, Evgueni T. Filipov

This work harnesses interpretable machine learning methods to address the challenging inverse design problem of origami-inspired systems.

BIG-bench Machine Learning Interpretable Machine Learning

ImpDet: Exploring Implicit Fields for 3D Object Detection

no code implementations 31 Mar 2022 Xuelin Qian, Li Wang, Yi Zhu, Li Zhang, Yanwei Fu, Xiangyang Xue

Conventional 3D object detection approaches concentrate on bounding-box representation learning with several parameters, i.e., localization, dimension, and orientation.

3D Object Detection Object +2

Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis

no code implementations 22 Mar 2022 Zexun Wang, Yuquan Le, Yi Zhu, Yuming Zhao, Mingchao Feng, Meng Chen, Xiaodong He

Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Prompt-Learning for Short Text Classification

no code implementations 23 Feb 2022 Yi Zhu, Xinke Zhou, Jipeng Qiang, Yun Li, Yunhao Yuan, Xindong Wu

In the short text, the extremely short length, feature sparsity, and high ambiguity pose huge challenges to classification tasks.

text-classification Text Classification

Learning Canonical F-Correlation Projection for Compact Multiview Representation

no code implementations CVPR 2022 Yun-Hao Yuan, Jin Li, Yun Li, Jipeng Qiang, Yi Zhu, Xiaobo Shen, Jianping Gou

With this framework as a tool, we propose a correlative covariation projection (CCP) method by using an explicit nonlinear mapping.

Representation Learning

Contrastive Instruction-Trajectory Learning for Vision-Language Navigation

1 code implementation 8 Dec 2021 Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang

The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction.

Contrastive Learning Navigate +1

Blending Anti-Aliasing into Vision Transformer

no code implementations NeurIPS 2021 Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the uncharted problem of aliasing in vision transformer and explore incorporating anti-aliasing properties.

Progressive Coordinate Transforms for Monocular 3D Object Detection

1 code implementation NeurIPS 2021 Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.

AI Agent Monocular 3D Object Detection +3

A Unified Efficient Pyramid Transformer for Semantic Segmentation

no code implementations 29 Jul 2021 Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li

Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries.

Segmentation Semantic Segmentation

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

1 code implementation 23 Jul 2021 Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin

Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.

Vision and Language Navigation Vision-Language Navigation

Optimal Resource Allocation for Serverless Queries

no code implementations 19 Jul 2021 Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen

We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.

Data Augmentation

Deep Learning for Embodied Vision Navigation: A Survey

no code implementations 7 Jul 2021 Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

A navigation agent is supposed to have various intelligent skills, such as visual perception, mapping, planning, exploration, and reasoning.

Autonomous Driving Deep Learning +3

AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation

no code implementations 24 Jun 2021 Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam

Neural network-based semantic segmentation has achieved remarkable results when large amounts of annotated data are available, that is, in the supervised case.

Neural Architecture Search Semantic Segmentation +1

Analyzing Adversarial Robustness of Deep Neural Networks in Pixel Space: a Semantic Perspective

no code implementations 18 Jun 2021 Lina Wang, Xingshu Chen, Yulong Wang, Yawei Yue, Yi Zhu, Xuemei Zeng, Wei Wang

Previous works study the adversarial robustness of image classifiers at the image level and use all the pixel information in an image indiscriminately, without exploring regions with different semantic meanings in the pixel space of an image.

Adversarial Robustness

Domain Consensus Clustering for Universal Domain Adaptation

1 code implementation CVPR 2021 Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang

To better exploit the intrinsic structure of the target domain, we propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones.

Clustering domain classification +3

VidTr: Video Transformer Without Convolutions

no code implementations ICCV 2021 Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe

We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.

Action Classification Action Recognition +1

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

1 code implementation CVPR 2021 Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.

Attribute Navigate +2

A phase field model for mass transport with semi-permeable interfaces

no code implementations 11 Mar 2021 Yuzhe Qin, Huaxiong Huang, Yi Zhu, Chun Liu, Shixin Xu

Numerical simulations first illustrate the consistency of theoretical results on the sharp interface limit.

Numerical Analysis 76Z99, 92B05, 76R50

Rapid Multi-Physics Simulation for Electro-Thermal Origami Systems

1 code implementation 19 Feb 2021 Yi Zhu, Evgueni T. Filipov

Electro-thermally actuated origami provides a novel method for creating 3-D systems with advanced morphing and functional capabilities.

Robotics

Three-fold Weyl points in the Schrödinger operator with periodic potentials

no code implementations 17 Feb 2021 Haimo Guo, Meirong Zhang, Yi Zhu

Weyl points are degenerate points on the spectral bands at which energy bands intersect conically.

Mathematical Physics Spectral Theory

CrossNorm and SelfNorm for Generalization under Distribution Shifts

1 code implementation ICCV 2021 Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris Metaxas

Can we develop new normalization methods to improve generalization robustness under distribution shifts?

Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification

1 code implementation EACL 2021 Yi Zhu, Ehsan Shareghi, Yingzhen Li, Roi Reichart, Anna Korhonen

Semi-supervised learning through deep generative models and multi-lingual pretraining techniques have orchestrated tremendous success across different areas of NLP.

Classification Document Classification +1

Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation

no code implementations ICCV 2021 Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao

Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.

Imitation Learning Navigate

Unity of Opposites: SelfNorm and CrossNorm for Model Robustness

no code implementations 1 Jan 2021 Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris N. Metaxas

CrossNorm exchanges styles between feature channels to perform style augmentation, diversifying the content and style mixtures.

Object Recognition Unity

A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters

no code implementations ACL 2021 Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze

Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT.

Few-Shot Learning

NUTA: Non-uniform Temporal Aggregation for Action Recognition

no code implementations 15 Dec 2020 Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe

In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.

Action Recognition

Global Regularity to Incompressible Viscoelastic System With a Class of Large Initial Data

no code implementations 15 Dec 2020 Yi Zhu

The global existence of solutions to incompressible viscoelastic flows has been a longstanding open problem, even for the global weak solution.

Analysis of PDEs 76A10, 76D03, 35B65

Improving adversarial robustness of deep neural networks by using semantic information

no code implementations 18 Aug 2020 Li-Na Wang, Rui Tang, Yawei Yue, Xingshu Chen, Wei Wang, Yi Zhu, Xuemei Zeng

The vulnerability of deep neural networks (DNNs) to adversarial attack, which is an attack that can mislead state-of-the-art classifiers into making an incorrect classification with high confidence by deliberately perturbing the original inputs, raises concerns about the robustness of DNNs to such attacks.

Adversarial Attack Adversarial Robustness

LSBert: A Simple Framework for Lexical Simplification

1 code implementation 25 Jun 2020 Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu

Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning, to simplify the sentence.

Language Modelling Lexical Simplification +2

Identification of hydrodynamic instability by convolutional neural networks

no code implementations 2 Jun 2020 Wuyue Yang, Liangrong Peng, Yi Zhu, Liu Hong

The onset of hydrodynamic instabilities is of great importance in both industry and daily life, due to the dramatic mechanical and thermodynamic changes for different types of flow motions.

When Machine Learning Meets Multiscale Modeling in Chemical Reactions

no code implementations 1 Jun 2020 Wuyue Yang, Liangrong Peng, Yi Zhu, Liu Hong

Due to the intrinsic complexity and nonlinearity of chemical reactions, direct applications of traditional machine learning algorithms may face many difficulties.

BIG-bench Machine Learning

Generating Semantically Valid Adversarial Questions for TableQA

no code implementations 26 May 2020 Yi Zhu, Yiwei Zhou, Menglin Xia

Finally, we demonstrate that adversarial training with SAGE augmented data can improve performance and robustness of TableQA systems.

Adversarial Attack Question Answering +1

Revealing hidden dynamics from time-series data by ODENet

no code implementations 11 May 2020 Pipi Hu, Wuyue Yang, Yi Zhu, Liu Hong

Deriving the hidden dynamics from observed data is one of the fundamental but also challenging problems in many different fields.

BIG-bench Machine Learning Numerical Integration +2

Improving Semantic Segmentation via Self-Training

no code implementations 30 Apr 2020 Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Segmentation +1

Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior

1 code implementation ECCV 2020 Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang

Most previous work on adversarial attacks focuses on image models, while the vulnerability of video models is less explored.

Adversarial Attack Video Classification

Vision-Dialog Navigation by Exploring Cross-modal Memory

1 code implementation CVPR 2020 Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang

Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory of decision making in historical navigation actions that is relevant to the current step.

Decision Making

Generalizing Deep Models for Overhead Image Segmentation Through Getis-Ord Gi* Pooling

no code implementations 23 Dec 2019 Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam

Inspired by this, we investigate methods to inform or guide deep learning models for geospatial image analysis to increase their performance when a limited amount of training data is available or when they are applied to scenarios other than those they were trained on.

Image Segmentation Semantic Segmentation

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks

no code implementations CVPR 2020 Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.

Navigate Vision-Language Navigation

On Constructing Confidence Region for Model Parameters in Stochastic Gradient Descent via Batch Means

no code implementations 4 Nov 2019 Yi Zhu, Jing Dong

In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method.

valid

Uncertainty Quantification and Exploration for Reinforcement Learning

no code implementations ICLR 2020 Yi Zhu, Jing Dong, Henry Lam

We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications in exploration policy.

reinforcement-learning Reinforcement Learning +3

Semi-supervised representation learning via dual autoencoders for domain adaptation

1 code implementation 4 Aug 2019 Shuai Yang, Hao Wang, Yuhong Zhang, Pei-Pei Li, Yi Zhu, Xuegang Hu

Domain adaptation aims to exploit the knowledge in source domain to promote the learning tasks in target domain, which plays a critical role in real-world applications.

Denoising Representation Learning +1

Motion-Aware Feature for Improved Video Anomaly Detection

no code implementations 24 Jul 2019 Yi Zhu, Shawn Newsam

Motivated by our observation that motion information is the key to good anomaly detection performance in video, we propose a temporal augmented network to learn a motion-aware feature.

Action Recognition Anomaly Detection +2

Lexical Simplification with Pretrained Encoders

3 code implementations 14 Jul 2019 Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu

Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning.

Language Modelling Lexical Simplification +1

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

3 code implementations 9 Jul 2019 Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

Deep Learning

Bayesian Learning for Neural Dependency Parsing

no code implementations NAACL 2019 Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen

While neural dependency parsers provide state-of-the-art accuracy for several languages, they still rely on large amounts of costly labeled training data.

Dependency Parsing POS +2

Exploring Temporal Information for Improved Video Understanding

1 code implementation 25 May 2019 Yi Zhu

In this dissertation, I present my work towards exploring temporal information for better video understanding.

Action Recognition Optical Flow Estimation +5

A Systematic Study of Leveraging Subword Information for Learning Word Representations

1 code implementation NAACL 2019 Yi Zhu, Ivan Vulić, Anna Korhonen

The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning.

Dependency Parsing Entity Typing +3

Using Conditional Generative Adversarial Networks to Generate Ground-Level Views From Overhead Imagery

no code implementations 19 Feb 2019 Xueqing Deng, Yi Zhu, Shawn Newsam

This paper develops a deep-learning framework to synthesize a ground-level view of a location given an overhead image.

Decoder General Classification +2

Improving Semantic Segmentation via Video Propagation and Label Relaxation

5 code implementations CVPR 2019 Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.

Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)

Segmentation Semantic Segmentation +1

Random Temporal Skipping for Multirate Video Analysis

no code implementations 30 Oct 2018 Yi Zhu, Shawn Newsam

However, this does not work well for multirate videos, in which actions or subactions occur at different speeds.

Action Recognition Optical Flow Estimation +2

Gated Transfer Network for Transfer Learning

no code implementations 30 Oct 2018 Yi Zhu, Jia Xue, Shawn Newsam

Deep neural networks have led to a series of breakthroughs in computer vision given sufficient annotated training datasets.

feature selection Transfer Learning

Learning Optical Flow via Dilated Networks and Occlusion Reasoning

no code implementations 7 May 2018 Yi Zhu, Shawn Newsam

Despite the significant progress that has been made on estimating optical flow recently, most estimation methods, including classical and deep learning approaches, still have difficulty with multi-scale estimation, real-time computation, and/or occlusion reasoning.

Action Recognition Optical Flow Estimation +1

DenseNet for Dense Flow

1 code implementation 19 Jul 2017 Yi Zhu, Shawn Newsam

Classical approaches for estimating optical flow have achieved rapid progress in the last decade.

Motion Estimation Optical Flow Estimation

Large-Scale Mapping of Human Activity using Geo-Tagged Videos

no code implementations 24 Jun 2017 Yi Zhu, Sen Liu, Shawn Newsam

This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos.

UC Merced Submission to the ActivityNet Challenge 2016

no code implementations 11 Apr 2017 Yi Zhu, Shawn Newsam, Zaikun Xu

This notebook paper describes our system for the untrimmed classification task in the ActivityNet challenge 2016.

Action Recognition General Classification +1

Hidden Two-Stream Convolutional Networks for Action Recognition

3 code implementations 2 Apr 2017 Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.

Action Recognition Optical Flow Estimation +2

Guided Optical Flow Learning

no code implementations 8 Feb 2017 Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann

We study the unsupervised learning of CNNs for optical flow estimation using proxy ground truth data.

Image Reconstruction Optical Flow Estimation

Deep Local Video Feature for Action Recognition

no code implementations 25 Jan 2017 Zhenzhong Lan, Yi Zhu, Alexander G. Hauptmann

We investigate the problem of representing an entire video using CNN features for human action recognition.

Action Recognition Temporal Action Localization

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

no code implementations 22 Dec 2016 Yi Zhu, Shawn Newsam

We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order.

Action Detection Action Recognition +5

Spatio-Temporal Sentiment Hotspot Detection Using Geotagged Photos

no code implementations 21 Sep 2016 Yi Zhu, Shawn Newsam

We perform spatio-temporal analysis of public sentiment using geotagged photo collections.

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

no code implementations 15 Aug 2016 Yi Zhu, Shawn Newsam

This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves.

Action Recognition Temporal Action Localization
