no code implementations • 9 Dec 2024 • Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, He Wang
A practical navigation agent must be capable of handling a wide range of interaction demands, such as following instructions, searching objects, answering questions, tracking people, and more.
no code implementations • 5 Dec 2024 • Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang
The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring.
no code implementations • 20 May 2024 • Jingwen Fu, Zhizheng Zhang, Yan Lu, Nanning Zheng
Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement.
no code implementations • CVPR 2024 • Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng
Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace.
no code implementations • 19 Mar 2024 • Zhipeng Huang, Zhizheng Zhang, Yiting Lu, Zheng-Jun Zha, Zhibo Chen, Baining Guo
In this paper, we explore this question and provide the answer "Yes!".
no code implementations • 19 Mar 2024 • Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo
The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved.
1 code implementation • CVPR 2024 • Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen
In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner.
no code implementations • 24 Feb 2024 • Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang
Vision-and-language navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.
no code implementations • 7 Oct 2023 • Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor.
2 code implementations • ICCV 2023 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo
With this insight, we propose Adaptive Frequency Filtering (AFF) token mixer.
no code implementations • 15 Jun 2023 • Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng
In the comparison of SGDM and SGD with the same effective learning rate and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD experience almost the same empirical training losses; when $\eta_{ef}$ surpasses a certain threshold, SGDM begins to perform better than SGD.
no code implementations • 2 Jun 2023 • Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yan Lu
In specific, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying the completeness of executors; 3) enhancing the security (e. g., the protection of users' privacy).
1 code implementation • 11 Apr 2023 • Ganlin Yang, Guoqiang Wei, Zhizheng Zhang, Yan Lu, Dong Liu
Most Neural Radiance Fields (NeRFs) exhibit limited generalization capabilities, which restrict their applicability in representing multiple scenes using a single model.
no code implementations • CVPR 2023 • Mude Hui, Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yuwang Wang, Yan Lu
Since different attributes have their individual semantics and characteristics, we propose to decouple the diffusion processes for them to improve the diversity of training samples and learn the reverse process jointly to exploit global-scope contexts for facilitating generation.
1 code implementation • 21 Jan 2023 • Zongyu Guo, Cuiling Lan, Zhizheng Zhang, Yan Lu, Zhibo Chen
In this paper, we propose an efficient NP framework dubbed Versatile Neural Processes (VNP), which largely increases the capability of approximating functions.
no code implementations • ICCV 2023 • Hewei Guo, Liping Ren, Jingjing Fu, Yuwang Wang, Zhizheng Zhang, Cuiling Lan, Haoqian Wang, Xinwen Hou
Targeting for detecting anomalies of various sizes for complicated normal patterns, we propose a Template-guided Hierarchical Feature Restoration method, which introduces two key techniques, bottleneck compression and template-guided compensation, for anomaly-free feature restoration.
Ranked #17 on Anomaly Detection on MVTec LOCO AD
no code implementations • 5 Jul 2022 • Ruoyu Feng, Xin Jin, Zongyu Guo, Runsen Feng, Yixin Gao, Tianyu He, Zhizheng Zhang, Simeng Sun, Zhibo Chen
Learning a kind of feature that is both general (for AI tasks) and compact (for compression) is pivotal for its success.
no code implementations • CVPR 2023 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen
Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge.
2 code implementations • 11 Mar 2022 • Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate flexible contextual information distributed across different channels from other tokens into the given query token.
Ranked #64 on Object Detection on COCO minival
1 code implementation • 28 Jan 2022 • Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance.
1 code implementation • 26 Dec 2021 • Zongyu Guo, Runsen Feng, Zhizheng Zhang, Xin Jin, Zhibo Chen
Neural video codecs have demonstrated great potential in video transmission and storage applications.
no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha
In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
Domain Adaptive Person Re-Identification Knowledge Distillation +4
no code implementations • 6 Dec 2021 • Shiqi Lin, Zhizheng Zhang, Xin Li, Wenjun Zeng, Zhibo Chen
Data augmentation (DA) has been widely investigated to facilitate model optimization in many tasks.
no code implementations • 30 Nov 2021 • Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu
This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.
Ranked #2 on Object Tracking on MMPTRACK
no code implementations • 26 Nov 2021 • Xin Li, Zhizheng Zhang, Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen
In this paper, we propose a novel Confounder Identification-free Causal Visual Feature Learning (CICF) method, which obviates the need for identifying confounders.
no code implementations • NeurIPS 2021 • Runsen Feng, Zongyu Guo, Zhizheng Zhang, Zhibo Chen
We show that the flow prediction module can largely reduce the transmission cost of voxel flows.
no code implementations • 31 Jul 2021 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha
Occluded person re-identification (ReID) aims to match person images with occlusion.
1 code implementation • 30 Jun 2021 • Zhizheng Zhang, Wen Song, Qiqiang Li
While deep learning has achieved great success in RUL prediction, existing methods have difficulties in processing long sequences and extracting information from the sensor and time step aspects.
1 code implementation • NeurIPS 2021 • Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zhibo Chen
Unsupervised domain adaptive classifcation intends to improve the classifcation performance on unlabeled target domain.
2 code implementations • NeurIPS 2021 • Tao Yu, Cuiling Lan, Wenjun Zeng, Mingxiao Feng, Zhizheng Zhang, Zhibo Chen
In this work, we propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Continuous Control (100k environment steps) Continuous Control (500k environment steps) +4
no code implementations • 12 Apr 2021 • Zongyu Guo, Zhizheng Zhang, Runsen Feng, Zhibo Chen
Quantization is one of the core components in lossy image compression.
no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen
Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.
no code implementations • 17 Dec 2020 • Yaojun Wu, Xin Li, Zhizheng Zhang, Xin Jin, Zhibo Chen
Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications.
1 code implementation • 16 Dec 2020 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha
Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss.
no code implementations • 11 Dec 2020 • Xin Li, Xin Jin, Tao Yu, Yingxue Pang, Simeng Sun, Zhizheng Zhang, Zhibo Chen
Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i. e., bicubic down-sampling), typically suffer from poor performance when applied into real-world low-resolution (LR) images due to the complicated realistic degradations.
no code implementations • 19 Nov 2020 • Zongyu Guo, Zhizheng Zhang, Runsen Feng, Zhibo Chen
In this paper, we propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space.
no code implementations • 9 Oct 2020 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen, Shih-Fu Chang
In this work, we propose Uncertainty-Aware Few-Shot framework for image classification by modeling uncertainty of the similarities of query-support pairs and performing uncertainty-aware optimization.
no code implementations • 8 Jun 2020 • Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Zhibo Chen, Shih-Fu Chang
There is a lack of loss design which enables the joint optimization of multiple instances (of multiple classes) within per-query optimization for person ReID.
no code implementations • 16 May 2020 • Xin Li, Simeng Sun, Zhizheng Zhang, Zhibo Chen
Versatile Video Coding (H. 266/VVC) standard achieves better image quality when keeping the same bits than any other conventional image codec, such as BPG, JPEG, and etc.
no code implementations • CVPR 2020 • Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Zhibo Chen
In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation.
1 code implementation • 23 Nov 2019 • Tao Yu, Zongyu Guo, Xin Jin, Shilin Wu, Zhibo Chen, Weiping Li, Zhizheng Zhang, Sen Liu
In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation.
no code implementations • 18 Sep 2019 • Bin Wang, Jun Shen, Shutao Zhang, Zhizheng Zhang
Firstly, we present the notions of p-strong and w-strong equivalences between LPMLN programs.
no code implementations • 9 Sep 2019 • Bin Wang, Jun Shen, Shutao Zhang, Zhizheng Zhang
In this paper, we study the strong equivalence for LPMLN programs, which is an important tool for program rewriting and theoretical investigations in the field of logic programming.
Logic in Computer Science D.1.6
no code implementations • 8 Jun 2019 • Chaowei Shan, Zhizheng Zhang, Zhibo Chen
For current learned methods in this field, global harmonious perception and local details are hard to be well-considered in a single model simultaneously.
no code implementations • 17 Apr 2019 • Xin Jin, Cuiling Lan, Wen-Jun Zeng, Zhizheng Zhang, Zhibo Chen
We achieve this by the context interaction among the features of different scales.
1 code implementation • CVPR 2020 • Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Xin Jin, Zhibo Chen
For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i. e., discriminative feature learning.
1 code implementation • 3 Mar 2019 • Zhizheng Zhang, Jiale Chen, Zhibo Chen, Weiping Li
Not limited to the control tasks in computationally complex environments, AE-DDPG also achieves higher rewards and 2- to 4-fold improvement in sample efficiency on average compared to other variants of DDPG in MuJoCo environments.
no code implementations • CVPR 2019 • Zhizheng Zhang, Cuiling Lan, Wen-Jun Zeng, Zhibo Chen
We propose a densely semantically aligned person re-identification framework.
2 code implementations • 2 Apr 2018 • Łukasz Kidziński, Sharada Prasanna Mohanty, Carmichael Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, Sergey Plis, Zhibo Chen, Zhizheng Zhang, Jiale Chen, Jun Shi, Zhuobin Zheng, Chun Yuan, Zhihui Lin, Henryk Michalewski, Piotr Miłoś, Błażej Osiński, Andrew Melnik, Malte Schilling, Helge Ritter, Sean Carroll, Jennifer Hicks, Sergey Levine, Marcel Salathé, Scott Delp
In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course.
no code implementations • 14 May 2014 • Zhizheng Zhang, Kaikai Zhao
(To appear in Theory and Practice of Logic Programming (TPLP)) ESmodels is designed and implemented as an experiment platform to investigate the semantics, language, related reasoning algorithms, and possible applications of epistemic specifications. We first give the epistemic specification language of ESmodels and its semantics.