no code implementations • 3 Dec 2024 • Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu
Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects.
no code implementations • 14 Nov 2024 • Youpeng Wen, Junfan Lin, Yi Zhu, Jianhua Han, Hang Xu, Shen Zhao, Xiaodan Liang
Specifically, in the first stage, VidMan is pre-trained on the Open X-Embodiment dataset (OXE) to predict future visual trajectories via video denoising diffusion, enabling the model to develop long-horizon awareness of the environment's dynamics.
no code implementations • 9 Oct 2024 • Yi Zhu, Chirag Goel, Surya Koppisetti, Trang Tran, Ankur Kumar, Gaurav Bharaj
Our system, SLIM, learns the style-linguistics dependency embeddings from various types of bona fide speech using self-supervised contrastive learning.
1 code implementation • 7 Oct 2024 • Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
Transformers tend to overallocate attention to irrelevant context.
no code implementations • 26 Sep 2024 • Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models.
no code implementations • 7 Sep 2024 • Zhichao Yan, Hui Xue, Yi Zhu, Bin Xiao, Hao Yuan
Accurate segmentation of lesions in pancreatic endoscopic ultrasound (EUS) images is crucial for effective diagnosis and treatment.
no code implementations • 6 Sep 2024 • Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu
Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities.
no code implementations • 26 Jul 2024 • Yi Zhu, Surya Koppisetti, Trang Tran, Gaurav Bharaj
The learned features are then used in complement with standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes.
1 code implementation • 26 Jun 2024 • Yi Zhu, Tiago Falk
Speech is known to carry health-related attributes, making it a novel avenue for remote and long-term health monitoring.
no code implementations • 17 Jun 2024 • Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, JianPing Wang
To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction.
no code implementations • 5 Jun 2024 • Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. Falk
In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning 2021 to 2022.
1 code implementation • 29 May 2024 • Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang
To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery.
1 code implementation • 8 May 2024 • Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.
no code implementations • 17 Apr 2024 • Haoxiang Deng, Yi Zhu, Ye Wang, Jipeng Qiang, Yunhao Yuan, Yun Li, Runmei Zhang
To address this problem, we propose a prompt-tuning method for clickbait detection via text summarization. Text summarization is introduced to summarize the contents, and clickbait detection is performed based on the similarity between the generated summary and the contents.
1 code implementation • Expert Systems with Applications 2024 • Qian Zhang, Yi Zhu, Ming Yang, Ge Jin, YingWen Zhu, Qiu Chen
Although sample selection is a mainstream method in the field of learning with noisy labels, which aims to mitigate the impact of noisy labels during model training, the testing performance of these methods exhibits significant fluctuations across different noise rates and types.
Ranked #2 on Learning with noisy labels on Clothing1M
no code implementations • 13 Mar 2024 • ZiCheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Qixiang Ye, Wei Ke
To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.
no code implementations • 13 Mar 2024 • Liya Guo, Liwei Lu, Zhijun Zeng, Pipi Hu, Yi Zhu
In this work, we propose a Weak Collocation Regression (WCR) to explicitly reveal unknown stochastic dynamical systems, i.e., the Stochastic Differential Equation (SDE) with both $\alpha$-stable Lévy noise and Gaussian noise, from discrete aggregate data.
no code implementations • 9 Mar 2024 • Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin
To encourage the agent to capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings with their perturbation-based counterparts.
no code implementations • 7 Dec 2023 • Zhijun Zeng, Pipi Hu, Chenglong Bao, Yi Zhu, Zuoqiang Shi
In this paper, we study how to reconstruct dynamical systems from data without time labels.
3 code implementations • 6 Dec 2023 • Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang
We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLM research and inspire them to contribute to this important and exciting field.
1 code implementation • 17 Nov 2023 • Yi Zhu, Mahsa Abdollahi, Ségolène Maucourt, Nico Coallier, Heitor R. Guimarães, Pierre Giovenazzo, Tiago H. Falk
We then provide an overview of the phenotypic data distribution as well as a visualization of the sensor data patterns.
no code implementations • 20 Oct 2023 • Tingqin Lai, Xiaolin Liang, Yi Zhu, Xinyi Wu, Lianye Liao, Xuelin Yuan, Ping Su, Shihai Sun
However, to eliminate the symmetry blur in the reconstructed images, a fixed background is required.
1 code implementation • 15 Sep 2023 • Yi Zhu, Saurabh Powar, Tiago H. Falk
Existing deepfake speech detection systems lack generalizability to unseen attacks (i.e., samples generated by generative algorithms not seen during training).
no code implementations • ICCV 2023 • David Fan, Jue Wang, Shuai Liao, Yi Zhu, Vimal Bhat, Hector Santos-Villalobos, Rohith MV, Xinyu Li
This suggests that the random masking strategy that is inherited from the image MAE is less effective for video MAE.
no code implementations • ICCV 2023 • Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang
To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence.
1 code implementation • 28 Jul 2023 • Kang Liu, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu, Kaixun Hua
After feeding the input sentence into the encoder of paraphrase modeling, we generate the substitutes based on a novel decoding strategy that concentrates solely on the lexical variations of the complex word.
1 code implementation • NeurIPS 2023 • Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset.
no code implementations • 6 Jul 2023 • Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu
In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning.
no code implementations • 28 Jun 2023 • Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo
This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity.
1 code implementation • 16 Jun 2023 • Han Wang, Yi Zhu, Ye Wang, Yun Li, Yunhao Yuan, Jipeng Qiang
Clickbait, which aims to lure users with surprising or even thrilling headlines in order to increase click-through rates, permeates almost all online content publishers, such as news portals and social media.
no code implementations • 28 May 2023 • Yuhan Hou, Jack Ji, Yi Zhu, Thomas Dell, Xilin Liu
The inference of the DL model is performed on a low-power microcontroller in the central node.
1 code implementation • 16 May 2023 • Yuxin Ren, Zihan Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li
It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer.
1 code implementation • 14 May 2023 • Jipeng Qiang, Kang Liu, Yun Li, Yunhao Yuan, Yi Zhu
Lexical substitution (LS) aims at finding appropriate substitutes for a target word in a sentence.
1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang
In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process.
1 code implementation • NeurIPS 2023 • Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
This work proposes POMP, a prompt pre-training method for vision-language models.
no code implementations • 5 Apr 2023 • Yi Zhu, Mohamed Imoussaïne-Aïkous, Carolyn Côté-Lussier, Tiago H. Falk
We validate the effectiveness of the anonymization methods, compare their computational complexity, and quantify the impact across different testing scenarios for both within- and across-dataset conditions.
2 code implementations • 23 Feb 2023 • Yutao Feng, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu
Sentence Simplification aims to rephrase complex sentences into simpler sentences while retaining the original meaning.
no code implementations • 13 Feb 2023 • Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu
Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position.
2 code implementations • ICCV 2023 • Matias Mendieta, Boran Han, Xingjian Shi, Yi Zhu, Chen Chen
Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.
no code implementations • 7 Feb 2023 • Yash Patel, Yusheng Xie, Yi Zhu, Srikar Appalaraju, R. Manmatha
Instead of purely relying on the alignment from the noisy data, this paper proposes a novel loss function termed SimCon, which accounts for intra-modal similarities to determine the appropriate set of positive samples to align.
1 code implementation • 6 Feb 2023 • Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li
Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.
Ranked #3 on Action Recognition on Diving-48 (using extra training data)
1 code implementation • 31 Jan 2023 • Pengzhen Ren, Changlin Li, Hang Xu, Yi Zhu, Guangrun Wang, Jianzhuang Liu, Xiaojun Chang, Xiaodan Liang
Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.
no code implementations • 21 Jan 2023 • Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou
SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans.
no code implementations • 21 Dec 2022 • Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia
The architecture of transformers, which has recently witnessed booming applications in vision tasks, has pivoted against the widespread convolutional paradigm.
no code implementations • 21 Dec 2022 • M Saiful Bari, Aston Zhang, Shuai Zheng, Xingjian Shi, Yi Zhu, Shafiq Joty, Mu Li
Pre-trained large language models can efficiently interpolate human-written prompts in a natural way.
no code implementations • 15 Dec 2022 • JieLin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li
Multimodal image-text models have shown remarkable performance in the past few years.
no code implementations • 4 Dec 2022 • ZiCheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke
Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target.
no code implementations • 27 Nov 2022 • Ourong Lin, Tian Yu, Yuhan Hou, Yi Zhu, Xilin Liu
In a validation using a public dataset, the prototype developed achieved a FoG detection sensitivity of 88.8% and an F1 score of 85.34%, using fewer than 20k trainable parameters per sensor node.
no code implementations • 2 Nov 2022 • Yanxin Long, Jianhua Han, Runhui Huang, Xu Hang, Yi Zhu, Chunjing Xu, Xiaodan Liang
Inspired by the success of vision-language methods (VLMs) in zero-shot classification, recent works attempt to extend this line of work into object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner.
no code implementations • 10 Oct 2022 • Yunhe Gao, Xingjian Shi, Yi Zhu, Hao Wang, Zhiqiang Tang, Xiong Zhou, Mu Li, Dimitris N. Metaxas
First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation.
Ranked #6 on Domain Adaptation on VisDA2017
no code implementations • 6 Sep 2022 • Liwei Lu, Zhijun Zeng, Yan Jiang, Yi Zhu, Pipi Hu
Taking the collocations of Gaussian functions as the test functions in the weak form of the FP equation, we transfer the derivatives to the Gaussian functions and thus approximate the weak form by the expectational sum of the data.
2 code implementations • 12 Jul 2022 • Zhihan Gao, Xingjian Shi, Hao Wang, Yi Zhu, Yuyang Wang, Mu Li, Dit-Yan Yeung
With the explosive growth of the spatiotemporal Earth observation data in the past decade, data-driven models that apply Deep Learning (DL) are demonstrating impressive potential for various Earth system forecasting tasks.
Ranked #1 on Earth Surface Forecasting on EarthNet2021 OOD Track
1 code implementation • 11 Jul 2022 • Andrii Zadaianchuk, Matthaeus Kleindessner, Yi Zhu, Francesco Locatello, Thomas Brox
In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago.
no code implementations • 8 Jul 2022 • Yash Sharma, Yi Zhu, Chris Russell, Thomas Brox
While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision.
1 code implementation • 4 Jul 2022 • Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex Smola, Zhangyang Wang
However, in real-world applications, it is common for the training sets to have long-tailed distributions.
1 code implementation • 16 Jun 2022 • Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, Wanqian Zhang, Bo Li, Mu Li
Data augmentation is a necessity to enhance data efficiency in deep learning.
no code implementations • CVPR 2022 • Bingqian Lin, Yi Zhu, Zicong Chen, Xiwen Liang, Jianzhuang Liu, Xiaodan Liang
Vision-Language Navigation (VLN) is a challenging task that requires an embodied agent to perform action-level modality alignment, i.e., make instruction-asked actions sequentially in complex visual environments.
2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang
The aim was to design a network for single image super-resolution that improves efficiency, as measured by several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while at least maintaining a PSNR of 29.00 dB on the DIV2K validation set.
no code implementations • 18 Apr 2022 • Jiudong Yang, Peiying Wang, Yi Zhu, Mingchao Feng, Meng Chen, Xiaodong He
Turn-taking, aiming to decide when the next speaker can start talking, is an essential component in building human-robot spoken dialogue systems.
1 code implementation • 15 Apr 2022 • Jipeng Qiang, Yang Li, Chaowei Zhang, Yun Li, Yunhao Yuan, Yi Zhu, Xindong Wu
Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters.
2 code implementations • 12 Apr 2022 • Yi Zhu, Evgueni T. Filipov
This work harnesses interpretable machine learning methods to address the challenging inverse design problem of origami-inspired systems.
no code implementations • 31 Mar 2022 • Xuelin Qian, Li Wang, Yi Zhu, Li Zhang, Yanwei Fu, Xiangyang Xue
Conventional 3D object detection approaches concentrate on bounding-box representation learning with several parameters, i.e., localization, dimension, and orientation.
2 code implementations • 24 Mar 2022 • Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue
Multiple datasets and open challenges for object detection have been introduced in recent years.
Ranked #1 on Object Detection on BigDetection val
no code implementations • 22 Mar 2022 • Zexun Wang, Yuquan Le, Yi Zhu, Yuming Zhao, Mingchao Feng, Meng Chen, Xiaodong He
Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants.
no code implementations • 23 Feb 2022 • Yi Zhu, Xinke Zhou, Jipeng Qiang, Yun Li, Yunhao Yuan, Xindong Wu
In the short text, the extremely short length, feature sparsity, and high ambiguity pose huge challenges to classification tasks.
no code implementations • CVPR 2022 • Yun-Hao Yuan, Jin Li, Yun Li, Jipeng Qiang, Yi Zhu, Xiaobo Shen, Jianping Gou
With this framework as a tool, we propose a correlative covariation projection (CCP) method by using an explicit nonlinear mapping.
1 code implementation • 8 Dec 2021 • Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang
The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instructions.
no code implementations • NeurIPS 2021 • Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
In this work, we analyze the uncharted problem of aliasing in vision transformers and explore incorporating anti-aliasing properties.
no code implementations • ICCV 2021 • Mohammadreza Zolfaghari, Yi Zhu, Peter Gehler, Thomas Brox
Contrastive learning allows us to flexibly define powerful losses by contrasting positive pairs from sets of negative samples.
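The idea of contrasting a positive pair against a set of negatives can be sketched with a generic InfoNCE-style loss. This is only an illustration of the general technique, not the loss proposed in the paper above; the function name `info_nce`, the cosine similarity, and the `temperature` default are assumptions made for the example.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for a single anchor (illustrative sketch).

    anchor, positive: lists of floats (embedding vectors).
    negatives: list of embedding vectors treated as negative samples.
    """

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # Index 0 holds the positive pair; the remaining entries are negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]
```

The loss is small when the anchor is closer to its positive than to any negative, and grows when a negative is the closest candidate.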
1 code implementation • Findings (EMNLP) 2021 • Xinyu Lu, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu
Parallel sentence simplification (SS) data is scarce for neural SS modeling.
1 code implementation • NeurIPS 2021 • Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue
Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.
1 code implementation • 5 Aug 2021 • Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li
Our formulation is able to capture global context in a video and is thus robust to temporal content change.
no code implementations • 29 Jul 2021 • Fangrui Zhu, Yi Zhu, Li Zhang, Chongruo Wu, Yanwei Fu, Mu Li
Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries.
1 code implementation • 23 Jul 2021 • Bingqian Lin, Yi Zhu, Yanxin Long, Xiaodan Liang, Qixiang Ye, Liang Lin
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target by destroying the most instructive information in instructions at different timesteps.
no code implementations • 19 Jul 2021 • Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen
We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and past observed queries.
no code implementations • 7 Jul 2021 • Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang
A navigation agent is supposed to have various intelligent skills, such as visual perception, mapping, planning, exploring, and reasoning.
no code implementations • 24 Jun 2021 • Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam
Neural network-based semantic segmentation has achieved remarkable results when large amounts of annotated data are available, that is, in the supervised case.
no code implementations • 18 Jun 2021 • Lina Wang, Xingshu Chen, Yulong Wang, Yawei Yue, Yi Zhu, Xuemei Zeng, Wei Wang
Previous works study the adversarial robustness of image classifiers at the image level and use all the pixel information in an image indiscriminately, without exploring regions with different semantic meanings in the pixel space of an image.
1 code implementation • CVPR 2021 • Guangrui Li, Guoliang Kang, Yi Zhu, Yunchao Wei, Yi Yang
To better exploit the intrinsic structure of the target domain, we propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones.
Ranked #4 on Partial Domain Adaptation on Office-31
no code implementations • ICCV 2021 • Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe
We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.
Ranked #15 on Action Classification on Charades
1 code implementation • CVPR 2021 • Fengda Zhu, Xiwen Liang, Yi Zhu, Xiaojun Chang, Xiaodan Liang
In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description.
Ranked #5 on Visual Navigation on SOON Test
no code implementations • 11 Mar 2021 • Yuzhe Qin, Huaxiong Huang, Yi Zhu, Chun Liu, Shixin Xu
Numerical simulations first illustrate the consistency of theoretical results on the sharp interface limit.
1 code implementation • 19 Feb 2021 • Yi Zhu, Evgueni T. Filipov
Electro-thermally actuated origami provides a novel method for creating 3-D systems with advanced morphing and functional capabilities.
no code implementations • 17 Feb 2021 • Haimo Guo, Meirong Zhang, Yi Zhu
Weyl points are degenerate points on the spectral bands at which energy bands intersect conically.
1 code implementation • ICCV 2021 • Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris Metaxas
Can we develop new normalization methods to improve generalization robustness under distribution shifts?
1 code implementation • EACL 2021 • Yi Zhu, Ehsan Shareghi, Yingzhen Li, Roi Reichart, Anna Korhonen
Semi-supervised learning through deep generative models and multi-lingual pretraining techniques have orchestrated tremendous success across different areas of NLP.
no code implementations • ICCV 2021 • Yi Zhu, Yue Weng, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao
Vision-Dialog Navigation (VDN) requires an agent to ask questions and navigate following the human responses to find target objects.
no code implementations • 1 Jan 2021 • Zhiqiang Tang, Yunhe Gao, Yi Zhu, Zhi Zhang, Mu Li, Dimitris N. Metaxas
CrossNorm exchanges styles between feature channels to perform style augmentation, diversifying the content and style mixtures.
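A minimal sketch of this kind of style exchange, assuming that "style" means the channel-wise mean and standard deviation of a feature map. The helper names (`channel_stats`, `restyle`, `cross_norm`) and the list-of-channels layout are hypothetical choices for illustration, not the authors' implementation.

```python
import statistics


def channel_stats(feature_map):
    """Return (mean, population std) for every channel of a feature map.

    A feature map is represented here as a list of channels, each channel
    being a flat list of activations (assumed layout for this sketch).
    """
    return [(statistics.fmean(ch), statistics.pstdev(ch)) for ch in feature_map]


def restyle(content, style_stats, eps=1e-5):
    """Standardize each channel of `content`, then apply the given statistics."""
    restyled = []
    for channel, (new_mean, new_std) in zip(content, style_stats):
        mean = statistics.fmean(channel)
        std = statistics.pstdev(channel)
        restyled.append(
            [(x - mean) / (std + eps) * new_std + new_mean for x in channel]
        )
    return restyled


def cross_norm(a, b):
    """Swap channel-wise mean/std between feature maps `a` and `b`."""
    stats_a, stats_b = channel_stats(a), channel_stats(b)
    return restyle(a, stats_b), restyle(b, stats_a)
```

After the swap, each output keeps the normalized spatial pattern of its own input but carries the other input's per-channel statistics.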
no code implementations • ACL 2021 • Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze
Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT.
no code implementations • 15 Dec 2020 • Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe
In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.
no code implementations • 15 Dec 2020 • Yi Zhu
The global existence of solutions to incompressible viscoelastic flows has been a longstanding open problem, even for the global weak solution.
1 code implementation • 11 Dec 2020 • Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
Video action recognition is one of the representative tasks for video understanding.
1 code implementation • 8 Dec 2020 • Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam
Land-cover classification using remote sensing imagery is an important Earth observation task.
no code implementations • 1 Dec 2020 • Srikar Appalaraju, Yi Zhu, Yusheng Xie, István Fehérvári
Self-supervised representation learning has seen remarkable progress in the last few years.
no code implementations • 18 Aug 2020 • Li-Na Wang, Rui Tang, Yawei Yue, Xingshu Chen, Wei Wang, Yi Zhu, Xuemei Zeng
The vulnerability of deep neural networks (DNNs) to adversarial attack, which is an attack that can mislead state-of-the-art classifiers into making an incorrect classification with high confidence by deliberately perturbing the original inputs, raises concerns about the robustness of DNNs to such attacks.
1 code implementation • 25 Jun 2020 • Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu
Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning, to simplify the sentence.
no code implementations • 2 Jun 2020 • Wuyue Yang, Liangrong Peng, Yi Zhu, Liu Hong
The onset of hydrodynamic instabilities is of great importance in both industry and daily life, due to the dramatic mechanical and thermodynamic changes for different types of flow motions.
no code implementations • 1 Jun 2020 • Wuyue Yang, Liangrong Peng, Yi Zhu, Liu Hong
Due to the intrinsic complexity and nonlinearity of chemical reactions, direct applications of traditional machine learning algorithms may face many difficulties.
no code implementations • 26 May 2020 • Yi Zhu, Yiwei Zhou, Menglin Xia
Finally, we demonstrate that adversarial training with SAGE augmented data can improve performance and robustness of TableQA systems.
no code implementations • 11 May 2020 • Pipi Hu, Wuyue Yang, Yi Zhu, Liu Hong
Deriving the hidden dynamics from observed data is a fundamental but challenging problem in many different fields.
no code implementations • 30 Apr 2020 • Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola
In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.
35 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
It is well known that feature-map attention and multi-path representation are important for visual recognition.
Ranked #9 on Instance Segmentation on COCO test-dev (APM metric)
1 code implementation • ECCV 2020 • Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang
Most previous work on adversarial attacks focuses on image models, while the vulnerability of video models is less explored.
1 code implementation • CVPR 2020 • Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang
Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory of past navigation decisions that is relevant to the current step.
no code implementations • 23 Dec 2019 • Xueqing Deng, Yi Zhu, Yuxin Tian, Shawn Newsam
Inspired by this, we investigate methods to inform or guide deep learning models for geospatial image analysis to increase their performance when a limited amount of training data is available or when they are applied to scenarios other than those they were trained on.
no code implementations • CVPR 2020 • Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang
In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information.
Ranked #13 on Vision and Language Navigation on VLN Challenge
no code implementations • 4 Nov 2019 • Yi Zhu, Jing Dong
In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method.
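The batch means idea can be sketched for a scalar quantity as follows. This is a simplified illustration under stated assumptions, not the algorithm studied in the paper: the function name `batch_means_interval` is hypothetical, only a one-dimensional interval (rather than a confidence region) is built, and a normal quantile is used for simplicity.

```python
import math
import statistics


def batch_means_interval(samples, num_batches, confidence=0.95):
    """Confidence interval for a mean from (possibly correlated) samples.

    The sequence is split into consecutive, equally sized batches; the
    spread of the batch means is used to estimate the standard error.
    """
    batch_size = len(samples) // num_batches
    if num_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches with 1 sample each")

    # Drop any leftover tail so that all batches have the same size.
    used = samples[: batch_size * num_batches]
    means = [
        statistics.fmean(used[i * batch_size:(i + 1) * batch_size])
        for i in range(num_batches)
    ]

    center = statistics.fmean(means)
    # Sample variance of the batch means, scaled to the overall mean.
    std_error = math.sqrt(statistics.variance(means) / num_batches)

    # Two-sided normal quantile (an assumption; a t quantile is also common).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * std_error, center + z * std_error
```

For example, `batch_means_interval(iterates, num_batches=10)` would treat `iterates` as a trajectory of scalar estimates and return a lower and upper bound around their average.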
no code implementations • ICLR 2020 • YI Zhu, Jing Dong, Henry Lam
We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications for exploration policy.
no code implementations • CONLL 2019 • Yi Zhu, Benjamin Heinzerling, Ivan Vulić, Michael Strube, Roi Reichart, Anna Korhonen
Recent work has validated the importance of subword information for word representation learning.
1 code implementation • 4 Aug 2019 • Shuai Yang, Hao Wang, Yuhong Zhang, Pei-Pei Li, Yi Zhu, Xuegang Hu
Domain adaptation aims to exploit knowledge from the source domain to improve learning tasks in the target domain, which plays a critical role in real-world applications.
no code implementations • 24 Jul 2019 • Yi Zhu, Shawn Newsam
Motivated by our observation that motion information is the key to good anomaly detection performance in video, we propose a temporal augmented network to learn a motion-aware feature.
3 code implementations • 14 Jul 2019 • Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu
Lexical simplification (LS) aims to replace complex words in a given sentence with their simpler alternatives of equivalent meaning.
3 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu
We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).
no code implementations • NAACL 2019 • Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen
While neural dependency parsers provide state-of-the-art accuracy for several languages, they still rely on large amounts of costly labeled training data.
no code implementations • CVPR 2019 • Yi Zhu, Yanzhao Zhou, Huijuan Xu, Qixiang Ye, David Doermann, Jianbin Jiao
However, learning the full extent of pixel-level instance response in a weakly supervised manner remains unexplored.
Ranked #12 on Image-level Supervised Instance Segmentation on PASCAL VOC 2012 val (using extra training data)
1 code implementation • 25 May 2019 • Yi Zhu
In this dissertation, I present my work towards exploring temporal information for better video understanding.
1 code implementation • NAACL 2019 • Yi Zhu, Ivan Vulić, Anna Korhonen
The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning.
no code implementations • 19 Feb 2019 • Xueqing Deng, Yi Zhu, Shawn Newsam
This paper develops a deep-learning framework to synthesize a ground-level view of a location given an overhead image.
5 code implementations • CVPR 2019 • Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro
In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.
Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)
no code implementations • 30 Oct 2018 • Yi Zhu, Shawn Newsam
However, this does not work well for multirate videos, in which actions or subactions occur at different speeds.
no code implementations • 30 Oct 2018 • Yi Zhu, Jia Xue, Shawn Newsam
Deep neural networks have led to a series of breakthroughs in computer vision given sufficient annotated training datasets.
no code implementations • 13 Jun 2018 • Xueqing Deng, Yi Zhu, Shawn Newsam
More significantly, we show the generated images are representative of the locations and that the representations learned by the cGANs are informative.
no code implementations • 7 May 2018 • Yi Zhu, Shawn Newsam
Despite the significant progress that has been made on estimating optical flow recently, most estimation methods, including classical and deep learning approaches, still have difficulty with multi-scale estimation, real-time computation, and/or occlusion reasoning.
1 code implementation • NAACL 2018 • Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD.
Ranked #2 on Dependency Parsing on Tweebank
1 code implementation • CVPR 2018 • Yanzhao Zhou, Yi Zhu, Qixiang Ye, Qiang Qiu, Jianbin Jiao
Motivated by this, we first design a process to stimulate peaks to emerge from a class response map.
Ranked #13 on Image-level Supervised Instance Segmentation on PASCAL VOC 2012 val (using extra training data)
no code implementations • CVPR 2018 • Yi Zhu, Yang Long, Yu Guan, Shawn Newsam, Ling Shao
Unseen Action Recognition (UAR) aims to recognise novel action categories without training examples.
Ranked #14 on Action Recognition on ActivityNet
no code implementations • 21 Feb 2018 • Xueqing Deng, Yi Zhu, Shawn Newsam
We also show that the spatial morphing kernel improves the results.
no code implementations • 7 Feb 2018 • Yi Zhu, Xueqing Deng, Shawn Newsam
We perform fine-grained land use mapping at the city scale using ground-level images.
1 code implementation • ICCV 2017 • Yi Zhu, Yanzhao Zhou, Qixiang Ye, Qiang Qiu, Jianbin Jiao
Weakly supervised object localization remains challenging, where only image labels instead of bounding boxes are available during training.
Ranked #2 on Weakly Supervised Object Detection on MS COCO
1 code implementation • 19 Jul 2017 • Yi Zhu, Shawn Newsam
Classical approaches for estimating optical flow have achieved rapid progress in the last decade.
no code implementations • 24 Jun 2017 • Yi Zhu, Sen Liu, Shawn Newsam
This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos.
no code implementations • 11 Apr 2017 • Yi Zhu, Shawn Newsam, Zaikun Xu
This notebook paper describes our system for the untrimmed classification task in the ActivityNet challenge 2016.
3 code implementations • 2 Apr 2017 • Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
State-of-the-art action recognition approaches rely on traditional optical flow estimation methods to pre-compute motion information for CNNs.
Ranked #22 on Action Recognition on UCF101
no code implementations • 8 Feb 2017 • Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann
We study the unsupervised learning of CNNs for optical flow estimation using proxy ground truth data.
no code implementations • 25 Jan 2017 • Zhenzhong Lan, Yi Zhu, Alexander G. Hauptmann
We investigate the problem of representing an entire video using CNN features for human action recognition.
no code implementations • 22 Dec 2016 • Yi Zhu, Shawn Newsam
We employ a multi-task learning framework that performs the three highly related steps of action proposal, action recognition, and action localization refinement in parallel instead of the standard sequential pipeline that performs the steps in order.
no code implementations • 21 Sep 2016 • Yi Zhu, Shawn Newsam
Land use mapping is a fundamental yet challenging task in geographic science.
no code implementations • 21 Sep 2016 • Yi Zhu, Shawn Newsam
We perform spatio-temporal analysis of public sentiment using geotagged photo collections.
no code implementations • 15 Aug 2016 • Yi Zhu, Shawn Newsam
This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves.