no code implementations • 9 Feb 2018 • Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin
As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer.
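The per-layer normalization described in this excerpt can be illustrated with a minimal, pure-Python sketch of the standard Batch Normalization training-time transform. It is not code from the paper: the function name `batch_norm`, the scalar `gamma` and `beta`, and the toy activations are assumptions made for illustration only.

```python
import math

def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize each feature across the mini-batch, then scale and shift.

    `batch` is a list of samples, each a list of feature values.
    `gamma` and `beta` are scalars here for brevity (an assumption);
    in practice they are learned per feature.
    """
    n = len(batch)
    dims = len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Biased (population) variance over the mini-batch.
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) * inv_std + beta
    return out

# Toy hidden-layer activations: 4 samples, 2 features on different scales.
activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(activations)
```

After the call, each feature column of `normalized` has approximately zero mean and unit variance, regardless of the original scale of that feature.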
no code implementations • 2 Dec 2017 • Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy
The key to this new form of learning is to design a proxy task (e.g., image colorization), from which a discriminative loss can be formulated on unlabeled data.
no code implementations • 30 Apr 2017 • Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing He
Then, with a proposed tree-structured search method, the model is able to generate the most probable responses in the form of dependency trees, which are finally flattened into sequences as the system output.
no code implementations • 21 Sep 2016 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang
Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data.
no code implementations • 29 Jan 2017 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
We propose a deep convolutional neural network (CNN) for face detection that leverages supervision based on facial attributes.
no code implementations • 23 Jun 2016 • Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang
Semantic segmentation tasks can be well modeled by Markov Random Field (MRF).
no code implementations • ICCV 2015 • Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang
This paper addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts.
Ranked #89 on Semantic Segmentation on Cityscapes test
1 code implementation • ICCV 2015 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
In this paper, we propose a novel deep convolutional network (DCN) that achieves outstanding performance on FDDB, PASCAL Face, and AFW.
no code implementations • ICCV 2015 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang
Social relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people.
no code implementations • 18 Aug 2014 • Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang
In this study, we show that landmark detection or face alignment task is not a single and independent problem.
Ranked #13 on Unsupervised Facial Landmark Detection on MAFL
no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang
In this paper, we propose deformable deep convolutional neural networks for generic object detection.
no code implementations • 5 Jan 2015 • Yubin Deng, Ping Luo, Chen Change Loy, Xiaoou Tang
Learning to recognize pedestrian attributes at far distance is a challenging problem in visual surveillance since face and body close-shots are hardly available; instead, only far-view image frames of pedestrian are given.
no code implementations • CVPR 2014 • Wei Yang, Ping Luo, Liang Lin
This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations.
no code implementations • CVPR 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources.
Ranked #30 on Pedestrian Detection on Caltech
no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang
In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.
no code implementations • 26 Jun 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
no code implementations • 14 Apr 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.
no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang
To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).
no code implementations • 15 Aug 2018 • Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei Zhang
Secondly, TSD significantly reduces the computations to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracies.
no code implementations • 22 Aug 2018 • Ganbin Zhou, Rongyu Cao, Xiang Ao, Ping Luo, Fen Lin, Leyu Lin, Qing He
Additionally, a "low-level sharing, high-level splitting" structure of CNN is designed to handle the documents from different content domains.
1 code implementation • ICLR 2019 • Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng
Batch Normalization (BN) improves both convergence and generalization in training neural networks.
no code implementations • 19 Nov 2018 • Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang
Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.
no code implementations • 4 Dec 2018 • Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang
In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.
no code implementations • NeurIPS 2018 • Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin
In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly under the context of micro-batches.
no code implementations • NeurIPS 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang
Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.
no code implementations • ICML 2017 • Ping Luo
Whitened Neural Network (WNN) is a recent advanced deep architecture, which improves convergence and generalization of canonical neural networks by whitening their internal hidden representation.
no code implementations • CVPR 2014 • Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang
In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection.
no code implementations • CVPR 2016 • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang
To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks.
no code implementations • CVPR 2017 • Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang
This work significantly increases segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.
no code implementations • ICCV 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Third, each part detector in DeepParts is a strong detector that can detect pedestrian by observing only a part of a proposal.
no code implementations • ICCV 2017 • Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang
The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
no code implementations • 14 May 2019 • Ganbin Zhou, Ping Luo, Jingwu Chen, Fen Lin, Leyu Lin, Qing He
To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are conducted by taking different combinations from a few atom-mechanisms.
no code implementations • 22 Jul 2019 • Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li
Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?
no code implementations • ICCV 2019 • Jiangfan Han, Ping Luo, Xiaogang Wang
Unlike previous works constrained by many conditions, making them infeasible to real noisy cases, this work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision.
no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.
no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo
ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.
no code implementations • ICCV 2019 • Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang
To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales.
Ranked #4 on Image Retrieval on DeepFashion - Consumer-to-shop (Rank-1 metric)
no code implementations • 31 Aug 2019 • Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo
Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.
no code implementations • 11 Sep 2019 • Hang Yu, Aishan Liu, Xianglong Liu, Gengchao Li, Ping Luo, Ran Cheng, Jichen Yang, Chongzhi Zhang
In other words, DNNs trained with PDA are able to obtain more robustness against both adversarial attacks as well as common corruptions than the recent state-of-the-art methods.
no code implementations • ICCV 2019 • Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.
no code implementations • 28 Nov 2019 • Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo
Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.
no code implementations • 30 Jan 2020 • Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang
This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result.
no code implementations • CVPR 2020 • Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
no code implementations • 24 Apr 2020 • Zhongzhan Huang, Wenqi Shao, Xinjiang Wang, Liang Lin, Ping Luo
Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), where various pruning criteria have been proposed to remove the redundant filters.
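As a concrete instance of a pruning criterion, the sketch below ranks filters by the L1 norm of their weights and keeps the largest ones. This is one widely used magnitude-based criterion, chosen here for illustration; it is not claimed to be the criterion studied in this paper. The helper names, the flat-list filter representation, and the `keep_ratio` parameter are all hypothetical.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights of one (flattened) convolutional filter."""
    return sum(abs(w) for w in filter_weights)

def prune_filters(filters, keep_ratio=0.5):
    """Keep the filters with the largest L1 norms.

    Returns the kept indices (in original order) and the kept filters.
    """
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]

# Four toy filters, each flattened to three weights.
filters = [
    [0.5, -0.4, 0.3],    # L1 norm 1.2
    [0.01, 0.0, -0.02],  # L1 norm 0.03
    [1.0, 1.0, -1.0],    # L1 norm 3.0
    [0.05, 0.05, 0.0],   # L1 norm 0.1
]
kept_indices, kept_filters = prune_filters(filters, keep_ratio=0.5)
```

With `keep_ratio=0.5`, the two low-magnitude filters are treated as redundant and dropped, leaving the filters at indices 0 and 2.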
no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.
Ranked #3 on Keypoint Detection on OCHuman
no code implementations • ECCV 2020 • Chaofan Tao, Qinhong Jiang, Lixin Duan, Ping Luo
Existing work addressed this challenge by either learning social spatial interactions represented by the positions of a group of pedestrians, while ignoring their temporal coherence (i.e., dependencies between different long trajectories), or by understanding the complicated scene layout (e.g., scene segmentation) to ensure safe navigation.
no code implementations • 27 Aug 2020 • Zhibo Zou, Jun-Jie Huang, Ping Luo
Based on simple and traditional methods, we propose a compensation tracker to further alleviate the lost tracking problem caused by missing detection.
no code implementations • 16 Sep 2020 • Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo
Unlike the recent neural architecture search (NAS) methods that typically searched the optimal operators in each network layer, but missed a good strategy to search for feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network.
no code implementations • 1 Jan 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo
With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.
no code implementations • NeurIPS 2021 • Zhongzhan Huang, Xinjiang Wang, Ping Luo
Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), and various pruning criteria have been proposed to remove the redundant filters of CNNs.
no code implementations • 8 Mar 2021 • Jian Ding, Enze Xie, Hang Xu, Chenhan Jiang, Zhenguo Li, Ping Luo, Gui-Song Xia
Unsupervised pre-training aims at learning transferable features that are beneficial for downstream tasks.
no code implementations • 27 Mar 2021 • Shervin Minaee, Ping Luo, Zhe Lin, Kevin Bowyer
In this work, we provide a detailed overview of some of the most representative deep learning based face detection methods by grouping them into a few major categories, and present their core architectural designs and accuracies on popular benchmarks.
no code implementations • 13 May 2021 • Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo
To address this problem, we develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated.
no code implementations • 14 May 2021 • Rongyu Cao, Yixuan Cao, Ganbin Zhou, Ping Luo
In this paper, we study the problem of extracting variable-depth "logical document hierarchy" from long documents, namely organizing the recognized "physical document objects" into hierarchical structures.
no code implementations • ICCV 2021 • Muhammad Awais, Fengwei Zhou, Hang Xu, Lanqing Hong, Ping Luo, Sung-Ho Bae, Zhenguo Li
Extensive Unsupervised Domain Adaptation (UDA) studies have shown great success in practice by learning transferable representations across a labeled source domain and an unlabeled target domain with deep models.
no code implementations • ICCV 2021 • Shoufa Chen, Peize Sun, Enze Xie, Chongjian Ge, Jiannan Wu, Lan Ma, Jiajun Shen, Ping Luo
WOO takes a unified video backbone to simultaneously extract features for actor location and action classification.
no code implementations • ICLR 2022 • Shuo Yang, Peize Sun, Yi Jiang, Xiaobo Xia, Ruiheng Zhang, Zehuan Yuan, Changhu Wang, Ping Luo, Min Xu
A more realistic object detection paradigm, Open-World Object Detection, has recently attracted increasing research interest in the community.
no code implementations • 29 Sep 2021 • Qiushan Guo, Yizhou Yu, Ping Luo
Furthermore, the limited annotations in semi-supervised learning scale up the challenges: large variance of object sizes and class imbalance (i.e., the extreme ratio between background and object), hindering the performance of prior arts.
no code implementations • NeurIPS 2021 • Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
no code implementations • NeurIPS 2021 • Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Li, Chongjie Zhang, Jianye Hao
Model-based reinforcement learning aims to improve the sample efficiency of policy learning by modeling the dynamics of the environment.
no code implementations • NeurIPS 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo
To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.
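For orientation, the sketch below computes the standard single-positive InfoNCE loss, which is simpler than the multi-instance variant named in this excerpt and is not the paper's implementation. The embedding vectors, the cosine-similarity choice, the `temperature` value, and all function names are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(query, positive, negatives, temperature=0.1):
    """Single-positive InfoNCE: -log softmax of the positive's similarity.

    Similarities are cosine similarities divided by `temperature`.
    """
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(k) for k in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp over all keys.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Hypothetical embeddings: a motion-vector clip as the query, the RGB clip
# of the same video as the positive, RGB clips of other videos as negatives.
motion = [1.0, 0.0]
rgb_same_video = [0.9, 0.1]
rgb_other_videos = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(motion, rgb_same_video, rgb_other_videos)
loss_mismatched = info_nce(motion, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the query is closest to its positive and large when a negative is closer, which is the signal that lets one modality supervise the other. Swapping the roles of the two modalities gives the reverse direction of supervision.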
no code implementations • 25 Sep 2019 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.
no code implementations • 6 Dec 2021 • Yuying Ge, Ruimao Zhang, Ping Luo
This work proposes a novel framework named MetaCloth via meta-learning, which is able to learn unseen tasks of dense fashion landmark detection with only a few annotated samples.
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
no code implementations • ICLR 2022 • Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang
PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.
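The iterative procedure described in this excerpt can be sketched with a toy one-dimensional nearest-centroid classifier. This is a generic pseudo-labeling loop under simplifying assumptions, not the paper's method: the classifier, the margin-based confidence test, and the names `fit_centroids`, `predict`, and `pseudo_label_training` are all hypothetical, and accepted pseudo-labels are kept permanently rather than re-estimated each round.

```python
def fit_centroids(points, labels):
    """Train the toy model: one centroid (mean) per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, margin). Assumes at least two classes.

    The margin is the distance gap between the two nearest centroids
    and serves as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin

def pseudo_label_training(labeled_x, labeled_y, unlabeled_x,
                          rounds=3, min_margin=1.0):
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    centroids = fit_centroids(train_x, train_y)
    for _ in range(rounds):
        still_unlabeled = []
        for x in pool:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                # Confident prediction: accept it as a pseudo-label.
                train_x.append(x)
                train_y.append(label)
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        # Retrain on labeled plus pseudo-labeled data.
        centroids = fit_centroids(train_x, train_y)
    return centroids, pool

centroids, leftover = pseudo_label_training(
    labeled_x=[0.0, 10.0],
    labeled_y=["a", "b"],
    unlabeled_x=[1.0, 2.0, 9.0, 8.0, 5.2],
)
```

In this toy run, the four points near a class centroid receive pseudo-labels and shift the centroids, while the ambiguous point `5.2` never clears the margin threshold and stays unlabeled.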
no code implementations • 16 Mar 2022 • Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo
Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results.
no code implementations • ACL 2022 • Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
We find that previous quantization methods fail on generative tasks due to the homogeneous word embeddings caused by reduced capacity, and the varied distribution of weights.
no code implementations • CVPR 2022 • Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, Ping Luo
Further, we overcome these challenges by introducing a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
no code implementations • 11 Apr 2022 • Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez
In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs.
no code implementations • 13 Apr 2022 • Teng Wang, Zhu Liu, Feng Zheng, Zhichao Lu, Ran Cheng, Ping Luo
This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021.
no code implementations • 23 May 2022 • Xiaoyu Chen, Yao Mu, Ping Luo, Shengbo Li, Jianyu Chen
Furthermore, we show that the learned belief states can be plugged into downstream RL algorithms to improve performance.
no code implementations • 28 Sep 2022 • Ping Luo, Jieren Cheng, Zhenhao Liu, N. Xiong, Jie Wu
However, the clients' Non-Independent and Identically Distributed (Non-IID) data negatively affect the trained model, and clients with different numbers of local updates may cause significant gaps to the local gradients in each communication round.
no code implementations • 8 Oct 2022 • Zeyu Gao, Yao Mu, Ruoyan Shen, Chen Chen, Yangang Ren, Jianyu Chen, Shengbo Eben Li, Ping Luo, YanFeng Lu
End-to-end autonomous driving provides a feasible way to automatically maximize overall driving system performance by directly mapping the raw pixels from a front-facing camera to control signals.
no code implementations • 23 Nov 2022 • Junjie Wang, Yao Mu, Dong Li, Qichao Zhang, Dongbin Zhao, Yuzheng Zhuang, Ping Luo, Bin Wang, Jianye Hao
The latent world model provides a promising way to learn policies in a compact latent space for tasks with high-dimensional observations; however, its generalization across diverse environments with unseen dynamics remains challenging.
no code implementations • CVPR 2023 • Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang
When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions to record the demonstrations.
no code implementations • 30 Mar 2023 • Chongjian Ge, Jiangliu Wang, Zhan Tong, Shoufa Chen, Yibing Song, Ping Luo
We evaluate our soft neighbor contrastive learning method (SNCLR) on standard visual recognition benchmarks, including image classification, object detection, and instance segmentation.
no code implementations • 3 Apr 2023 • Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo
In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms.
no code implementations • 7 Apr 2023 • Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program.
no code implementations • 19 Apr 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.
no code implementations • CVPR 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan
We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.
no code implementations • 23 May 2023 • Yuanfeng Ji, Yatao Bian, Guoji Fu, Peilin Zhao, Ping Luo
Firstly, SyNDock formulates multimeric protein docking as a problem of learning global transformations to holistically depict the placement of chain units of a complex, enabling a learning-centric solution.
no code implementations • NeurIPS 2023 • Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
Text-to-image generation has recently witnessed remarkable achievements.
Ranked #11 on Text-to-Image Generation on MS COCO
no code implementations • 20 Jun 2023 • Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo
Then, we propose the audio adapter to adapt audio representation into an audio token enriched with specific semantics, which can be injected into a frozen T2I model flexibly.
no code implementations • 26 Jun 2023 • Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo
To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.
no code implementations • ICCV 2023 • Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
Open-world instance segmentation is a rising task, which aims to segment all objects in the image by learning from a limited number of base-category objects.
no code implementations • ICCV 2023 • Yangyang Xu, Shengfeng He, Kwan-Yee K. Wong, Ping Luo
In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneously enforce temporally coherent GAN inversion and facial editing of real videos.
no code implementations • 28 Aug 2023 • Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.
no code implementations • 4 Sep 2023 • Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo
StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods.
no code implementations • ICCV 2023 • Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo
In this work, we end the current fragmented situation and propose UniRef to unify the three reference-based object segmentation tasks with a single architecture.
no code implementations • ICCV 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.
no code implementations • 4 Oct 2023 • Hao Sha, Yao Mu, YuXuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding
Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability.
no code implementations • 8 Oct 2023 • Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang
Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches.
no code implementations • 8 Oct 2023 • Chaoxu Pang, Yixuan Cao, Qiang Ding, Ping Luo
In this paper, we propose a Guideline Learning (GL) framework for In-context IE which reflectively learns and follows guidelines.
no code implementations • 12 Oct 2023 • Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo
This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.
no code implementations • 12 Oct 2023 • Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo
Active learning presents a promising avenue for training high-performance models with minimal labeled data, achieved by judiciously selecting the most informative instances to label and incorporating them into the task learner.
no code implementations • 22 Nov 2023 • Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo
In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.
no code implementations • 24 Nov 2023 • Yuanfeng Ji, Chongjian Ge, Weikai Kong, Enze Xie, Zhengying Liu, Zhengguo Li, Ping Luo
In this work, we address the limitations via Auto-Bench, which delves into exploring LLMs as proficient aligners, measuring the alignment between VLMs and human intelligence and value through automatic data curation and assessment.
no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.
no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
In this study, we explore Transformer-based diffusion models for image and video generation.
no code implementations • 9 Dec 2023 • Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo
Human-centric perception (e.g., pedestrian detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision.
no code implementations • 22 Feb 2024 • Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo
To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language.
no code implementations • 23 Feb 2024 • Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process.
no code implementations • 25 Feb 2024 • Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI.
Ranked #76 on Visual Question Answering on MM-Vet
no code implementations • 4 Mar 2024 • Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu
Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions.
no code implementations • 4 Mar 2024 • Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo
We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.
no code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution.
no code implementations • 11 Mar 2024 • Shaojie Dai, Xin Liu, Ping Luo, Yue Yu
Large language models (LLMs) have achieved promising performance in multilingual machine translation tasks through zero/few-shot prompts or prompt-tuning.
no code implementations • 14 Mar 2024 • Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang
To bridge this gap, we introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions (AVIs), including four types of image-based AVIs, ten types of text-based AVIs, and nine types of content bias AVIs (such as gender, violence, cultural, and racial biases, among others).
no code implementations • 25 Mar 2024 • Ping Luo, Xiaoge Deng, Ziqing Wen, Tao Sun, Dongsheng Li
Federated Learning (FL) is a distributed machine learning framework in communication network systems.
no code implementations • 25 Mar 2024 • Tianqi Wang, Enze Xie, Ruihang Chu, Zhenguo Li, Ping Luo
We utilize the challenging driving scenarios from the CARLA leaderboard 2.0, which involve high-speed driving and lane-changing, and propose a rule-based expert policy to control the vehicle and generate ground truth labels for its reasoning process across different driving aspects and the final decisions.
no code implementations • 25 Mar 2024 • Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo
This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.
no code implementations • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong liu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo
We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue, causing network training to fail.
no code implementations • 24 Apr 2024 • Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation.
no code implementations • 30 Apr 2024 • Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo
In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework.
1 code implementation • CVPR 2016 • Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
Face detection is one of the most studied topics in the computer vision community.
Ranked #34 on Face Detection on WIDER Face (Medium)
1 code implementation • 16 Mar 2024 • Ke Lin, Yiyang Luo, Zijian Zhang, Ping Luo
Generative linguistic steganography attempts to hide secret messages into covertext.
1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang
The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.
1 code implementation • 3 Jul 2022 • Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo
Previous methods only process the information of a single boundary at a time, which lacks utilization of video context information.
1 code implementation • 18 Jul 2022 • Weijia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo
Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, recognition) in a real-time end-to-end trainable framework.
1 code implementation • 4 Apr 2023 • Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, Ping Luo
We extend our pretext task to supervised pre-training, which achieves a similar performance to self-supervised learning.
1 code implementation • ICLR 2018 • Runyao Chen, Kun Wu, Ping Luo
Mini-batch gradient descent and its variants are commonly used in deep learning.
3 code implementations • CVPR 2015 • Linjie Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
Updated on 24/09/2015: This update provides preliminary experiment results for fine-grained classification on the surveillance data of CompCars.
Ranked #5 on Fine-Grained Image Classification on CompCars
4 code implementations • ECCV 2020 • Jingkang Yang, Litong Feng, Weirong Chen, Xiaopeng Yan, Huabin Zheng, Ping Luo, Wayne Zhang
Therefore, a simple yet effective WSL framework is proposed.
Ranked #7 on Image Classification on WebVision-1000
2 code implementations • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo
Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering.
1 code implementation • 18 Sep 2021 • Jiannan Wu, Peize Sun, Shoufa Chen, Jiewen Yang, Zihao Qi, Lan Ma, Ping Luo
Towards high-quality temporal action detection, we introduce Sparse Proposals to interact with the hierarchical features.
1 code implementation • 21 Feb 2022 • Zhecheng Yuan, Guozheng Ma, Yao Mu, Bo Xia, Bo Yuan, Xueqian Wang, Ping Luo, Huazhe Xu
One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments.
1 code implementation • 20 Dec 2023 • Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo
This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.
1 code implementation • 18 Dec 2023 • Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo
Experiments on multi-task robotic manipulation benchmarks like Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser.
1 code implementation • 31 Mar 2024 • Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji
Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.
1 code implementation • CVPR 2020 • Zhouxia Wang, Jiawei Zhang, Mude Lin, Jiong Wang, Ping Luo, Jimmy Ren
Automatically selecting exposure bracketing (images exposed differently) is important to obtain a high dynamic range image by using multi-exposure fusion.
1 code implementation • 17 Jun 2022 • Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo
Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet and then cleaned with elaborate data cleaning techniques.
1 code implementation • CVPR 2023 • Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, Xiaogang Wang, Ping Luo, Kaimo Lin, Jinwei Gu
Controllable image denoising aims to generate clean samples with human perceptual priors and balance sharpness and smoothness.
1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.
1 code implementation • 7 Jul 2022 • Wenqi Shao, Xun Zhao, Yixiao Ge, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive.
Ranked #2 on Transferability on classification benchmark
1 code implementation • 9 Oct 2022 • Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao, Ping Luo
This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories, while minimizing the state transition prediction error.
1 code implementation • 16 Jun 2022 • Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo
Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.
1 code implementation • 8 Jun 2022 • Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo
In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representation for outdoor-scene point clouds in an unsupervised manner.
1 code implementation • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.
1 code implementation • 17 Jun 2022 • Yao Mu, Shoufa Chen, Mingyu Ding, Jianyu Chen, Runjian Chen, Ping Luo
In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size.
1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu
Human pose estimation aims to accurately estimate a wide variety of human poses.
1 code implementation • 15 Feb 2021 • Chaofan Tao, Rui Lin, Quan Chen, Zhaoyang Zhang, Ping Luo, Ngai Wong
Prior arts often discretize the network weights by carefully tuning hyper-parameters of quantization (e.g., non-uniform stepsize and layer-wise bitwidths), which are complicated and sub-optimal because the full-precision and low-precision models have a large discrepancy.
1 code implementation • ICLR 2022 • Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.
1 code implementation • NeurIPS 2023 • Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, visual question answering, and text question answering.
1 code implementation • 30 Nov 2023 • Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You
Visual-language pre-training has achieved remarkable success in many multi-modal tasks, largely attributed to the availability of large-scale image-text datasets.
1 code implementation • CVPR 2021 • Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo
However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.
1 code implementation • 26 Sep 2022 • Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi
Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale.
1 code implementation • CVPR 2023 • Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo
FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted.
1 code implementation • ICCV 2023 • Qiushan Guo, Chuofan Ma, Yi Jiang, Zehuan Yuan, Yizhou Yu, Ping Luo
Learning image classification and image generation using the same set of network parameters is a challenging problem.
1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan
When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.
1 code implementation • 27 Apr 2023 • Chengyue Wu, Teng Wang, Yixiao Ge, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo
Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks.
1 code implementation • ICCV 2023 • Yutao Hu, Qixiong Wang, Wenqi Shao, Enze Xie, Zhenguo Li, Jungong Han, Ping Luo
In this paper, we address this issue from two perspectives.
1 code implementation • 21 Apr 2020 • Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo
Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem of Adam by analyzing its performance in a simple non-convex synthetic problem, showing that Adam's fast convergence would possibly lead the algorithm to local minima.
1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.
1 code implementation • ICCV 2021 • Wei Shang, Dongwei Ren, Dongqing Zou, Jimmy S. Ren, Ping Luo, WangMeng Zuo
EFM can also be easily incorporated into existing deblurring networks, making event-driven deblurring task benefit from state-of-the-art deblurring methods.
2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.
1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo
(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks.
1 code implementation • 3 Feb 2023 • Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, Ping Luo
For example, AdaptDiffuser not only outperforms the previous art Diffuser by 20.8% on Maze2D and 7.5% on MuJoCo locomotion, but also adapts better to new tasks, e.g., KUKA pick-and-place, by 27.9% without requiring additional expert data.
1 code implementation • CVPR 2023 • Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge
Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.
1 code implementation • 8 Jul 2021 • Lingyun Wu, Zhiqiang Hu, Yuanfeng Ji, Ping Luo, Shaoting Zhang
For example, STFT improves the still image baseline FCOS by 10.6% and 20.6% on the comprehensive F1-score of the polyp localization task in CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by 3.6% and 8.0%, respectively.
2 code implementations • 14 Sep 2020 • Hao Tan, Ran Cheng, Shihua Huang, Cheng He, Changxiao Qiu, Fan Yang, Ping Luo
Despite the remarkable successes of Convolutional Neural Networks (CNNs) in computer vision, it is time-consuming and error-prone to manually design a CNN.
1 code implementation • ICCV 2021 • Zhaoyang Zhang, Yitong Jiang, Jun Jiang, Xiaogang Wang, Ping Luo, Jinwei Gu
STAR is a general architecture that can be easily adapted to different image enhancement tasks.
1 code implementation • 20 Oct 2022 • Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo
To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch size, enabling several benefits more appealing than prior arts.
1 code implementation • 19 Mar 2023 • Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Jirui Yuan, Ping Luo, Zaiqing Nie
Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities.
1 code implementation • 4 Jan 2024 • Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo
Charts play a vital role in data visualization, understanding data patterns, and informed decision-making.
1 code implementation • NeurIPS 2023 • Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Ping Luo, Zaiqing Nie
To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework.
2 code implementations • ICCV 2015 • Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction.
Ranked #6 on Facial Attribute Classification on LFWA
1 code implementation • 30 Aug 2023 • Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine de Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, Heinrich Mächler, Jan Stefan Kirschke, Ezequiel de la Rosa, Patrick Ferdinand Christ, Hongwei Bran Li, David G. Ellis, Michele R. Aizenberg, Sergios Gatidis, Thomas Küstner, Nadya Shusharina, Nicholas Heller, Vincent Andrearczyk, Adrien Depeursinge, Mathieu Hatt, Anjany Sekuboyina, Maximilian Löffler, Hans Liebl, Reuben Dorent, Tom Vercauteren, Jonathan Shapey, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Achraf Ben-Hamadou, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Federico Bolelli, Costantino Grana, Luca Lumetti, Hamidreza Salehi, Jun Ma, Yao Zhang, Ramtin Gharleghi, Susann Beier, Arcot Sowmya, Eduardo A. Garza-Villarreal, Thania Balducci, Diego Angeles-Valdez, Roberto Souza, Leticia Rittner, Richard Frayne, Yuanfeng Ji, Vincenzo Ferrari, Soumick Chatterjee, Florian Dubost, Stefanie Schreiber, Hendrik Mattern, Oliver Speck, Daniel Haehn, Christoph John, Andreas Nürnberger, João Pedrosa, Carlos Ferreira, Guilherme Aresta, António Cunha, Aurélio Campilho, Yannick Suter, Jose Garcia, Alain Lalande, Vicky Vandenbossche, Aline Van Oevelen, Kate Duquesne, Hamza Mekhzoum, Jef Vandemeulebroucke, Emmanuel Audenaert, Claudia Krebs, Timo Van Leeuwen, Evie Vereecke, Hauke Heidemeyer, Rainer Röhrig, Frank Hölzle, Vahid Badeli, Kathrin Krieger, Matthias Gunzer, Jianxu Chen, Timo van Meegdenburg, Amin Dada, Miriam Balzer, Jana Fragemann, Frederic Jonske, Moritz Rempe, Stanislav Malorodov, Fin H. Bahnsen, Constantin Seibold, Alexander Jaus, Zdravko Marinov, Paul F. Jaeger, Rainer Stiefelhagen, Ana Sofia Santos, Mariana Lindo, André Ferreira, Victor Alves, Michael Kamp, Amr Abourayya, Felix Nensa, Fabian Hörst, Alexander Brehmer, Lukas Heine, Yannik Hanusrichter, Martin Weßling, Marcel Dudda, Lars E. Podleska, Matthias A. Fink, Julius Keyl, Konstantinos Tserpes, Moon-Sung Kim, Shireen Elhabian, Hans Lamecker, Dženan Zukić, Beatriz Paniagua, Christian Wachinger, Martin Urschler, Luc Duong, Jakob Wasserthal, Peter F. Hoyer, Oliver Basu, Thomas Maal, Max J. H. Witjes, Gregor Schiele, Ti-chiun Chang, Seyed-Ahmad Ahmadi, Ping Luo, Bjoern Menze, Mauricio Reyes, Thomas M. Deserno, Christos Davatzikos, Behrus Puladi, Pascal Fua, Alan L. Yuille, Jens Kleesiek, Jan Egger
For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems.
1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
1 code implementation • ICCV 2023 • Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.
Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)
2 code implementations • 7 Aug 2017 • Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang
This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing.
1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo
These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.
1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.
1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.
Ranked #4 on Semantic Segmentation on Trans10K
1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo
Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo
Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results.
1 code implementation • CVPR 2021 • Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo
To this end, DCTON can be naturally trained in a self-supervised manner following cycle consistency learning.
1 code implementation • 26 Nov 2023 • Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo
The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.
1 code implementation • 28 Jun 2021 • Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo
The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.
1 code implementation • CVPR 2017 • Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models.
Ranked #22 on Semantic Segmentation on PASCAL VOC 2012 test
1 code implementation • 14 Aug 2023 • Zhouxia Wang, Jiawei Zhang, Tianshui Chen, Wenping Wang, Ping Luo
In this work, we propose RestoreFormer++, which on the one hand introduces fully-spatial attention mechanisms to model the contextual information and the interplay with the priors, and on the other hand, explores an extending degrading model to help generate more realistic degraded face images to alleviate the synthetic-to-real-world gap.
1 code implementation • CVPR 2023 • Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, Juan Song, Jirui Yuan, Ping Luo, Zaiqing Nie
Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving.
1 code implementation • 11 Oct 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.
1 code implementation • NeurIPS 2021 • Chongjian Ge, Youwei Liang, Yibing Song, Jianbo Jiao, Jue Wang, Ping Luo
Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL.
1 code implementation • 26 Apr 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo
Dominant pre-training works for video-text retrieval mainly adopt "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but ignore detailed local semantics.
Ranked #7 on Zero-Shot Video Retrieval on MSVD
1 code implementation • 19 Mar 2020 • An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo
Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.
1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo
Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources.
1 code implementation • ICCV 2019 • Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo
Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods.
Ranked #6 on Robust Object Detection on DWD
1 code implementation • 24 Jan 2022 • Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Long-Kai Huang, Tingyang Xu, Yu Rong, Lanqing Li, Jie Ren, Ding Xue, Houtim Lai, Shaoyong Xu, Jing Feng, Wei Liu, Ping Luo, Shuigeng Zhou, Junzhou Huang, Peilin Zhao, Yatao Bian
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient.
1 code implementation • ICCV 2023 • Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
Ranked #2 on Monocular Depth Estimation on SUN-RGBD
1 code implementation • 22 May 2023 • Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding
We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.
3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).
Ranked #1 on 3D Human Reconstruction on Surreal
1 code implementation • 27 Nov 2022 • Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data.
1 code implementation • 22 Mar 2021 • Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo
(1) We divide the input image into small patches and adopt TIN, successfully transferring image style at arbitrarily high resolution.
1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Vision transformers have achieved great successes in many computer vision tasks.
Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody
1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #4 on 2D Pose Estimation on MP-100
2 code implementations • ICCV 2021 • Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
Ranked #5 on Dense Video Captioning on YouCook2