Search Results for author: Lingxi Xie

Found 139 papers, 64 papers with code

SIMILE: Introducing Sequential Information towards More Effective Imitation Learning

no code implementations ICLR 2019 Yutong Bai, Lingxi Xie

Reinforcement learning (RL) is a metaheuristic that aims to teach an agent to interact with an environment and maximize the reward in a complex task.

Imitation Learning OpenAI Gym +3

Text-Animator: Controllable Visual Text Video Generation

no code implementations 25 Jun 2024 Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian

Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising.

Text Generation Video Generation

ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

1 code implementation 17 Jun 2024 Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in training MLLMs to communicate between language and vision.

Decoder Visual Reasoning

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

1 code implementation 15 Feb 2024 Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined.

Neural Rendering Object

ChatterBox: Multi-round Multimodal Referring and Grounding

1 code implementation 24 Jan 2024 Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang, Yuan Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues.

Language Modelling Visual Grounding

VMamba: Visual State Space Model

6 code implementations 18 Jan 2024 Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, YaoWei Wang, Qixiang Ye, Yunfan Liu

Designing computationally efficient network architectures persists as an ongoing necessity in computer vision.

Computational Efficiency Language Modelling +1

Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models

no code implementations 6 Jan 2024 Xin He, Longhui Wei, Lingxi Xie, Qi Tian

Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a plethora of noteworthy contributions in recent months.

Instruction Following

Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

no code implementations 7 Dec 2023 Yabo Chen, Jiemin Fang, YuYang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

We propose a cascade generation framework constructed with two Zero-1-to-3 models, named Cascade-Zero123, to tackle this issue, which progressively extracts 3D information from the source image.

Transparent objects

Segment Any 3D Gaussians

no code implementations 1 Dec 2023 Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

This is achieved by attaching a scale-gated affinity feature to each 3D Gaussian to endow it with a new property towards multi-granularity segmentation.

Interactive Segmentation Scene Understanding +1

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

no code implementations CVPR 2024 Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen

Parameter-efficient fine-tuning (PEFT) is an effective methodology to unleash the potential of large foundation models in novel scenarios with limited training data.

Image Classification Image Segmentation +2

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

no code implementations CVPR 2024 Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian

Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians.

3D scene Editing

One-bit Supervision for Image Classification: Problem, Solution, and Beyond

no code implementations 26 Nov 2023 Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

An intriguing property of the setting is that the burden of annotation is largely alleviated in comparison to providing the accurate label.

Active Learning Image Classification +2

Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models

no code implementations 14 Jun 2023 Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Kaifeng Bi, Xiaotao Gu, Jianlong Chang, Qi Tian

In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system.

Visual Tuning

no code implementations 10 May 2023 Bruce X. B. Yu, Jianlong Chang, Haixin Wang, Lingbo Liu, Shijie Wang, Zhiyu Wang, Junfan Lin, Lingxi Xie, Haojie Li, Zhouchen Lin, Qi Tian, Chang Wen Chen

With the surprising development of pre-trained visual foundation models, visual tuning jumped out of the standard modus operandi that fine-tunes the whole pre-trained model or just the fully connected layer.

Segment Anything in 3D with Radiance Fields

1 code implementation NeurIPS 2023 Jiazhong Cen, Jiemin Fang, Zanwei Zhou, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

The Segment Anything Model (SAM) emerges as a powerful vision foundation model to generate high-quality 2D segmentation results.

Inverse Rendering Segmentation

Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism

no code implementations 22 Apr 2023 Xin Chen, Hengheng Zhang, Xiaotao Gu, Kaifeng Bi, Lingxi Xie, Qi Tian

The Mixture of Experts (MoE) model has become an important choice for large language models because of its scalability with sublinear computational complexity for training and inference.

Focus on Your Target: A Dual Teacher-Student Framework for Domain-adaptive Semantic Segmentation

no code implementations ICCV 2023 Xinyue Huo, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Currently, a popular UDA framework lies in self-training which endows the model with two-fold abilities: (i) learning reliable semantics from the labeled images in the source domain, and (ii) adapting to the target domain via generating pseudo labels on the unlabeled images.

Semantic Segmentation Unsupervised Domain Adaptation

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

1 code implementation 4 Nov 2022 Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, WeiMing Dong, Changsheng Xu

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.

object-detection Open Vocabulary Object Detection +2

Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast

3 code implementations 3 Nov 2022 Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, Qi Tian

In this paper, we present Pangu-Weather, a deep learning based system for fast and accurate global weather forecast.

Learnable Distribution Calibration for Few-Shot Class-Incremental Learning

no code implementations 1 Oct 2022 Binghao Liu, Boyu Yang, Lingxi Xie, Ren Wang, Qi Tian, Qixiang Ye

LDC is built upon a parameterized calibration unit (PCU), which initializes biased distributions for all classes based on classifier vectors (memory-free) and a single covariance matrix.

Few-Shot Class-Incremental Learning Few-Shot Learning +2

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

1 code implementation 31 Jul 2022 Maosen Li, Siheng Chen, Zijing Zhang, Lingxi Xie, Qi Tian, Ya Zhang

To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into richer graph spectrum bands.

Human motion prediction motion prediction

Visual Recognition by Request

1 code implementation CVPR 2023 Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian

Humans have the ability of recognizing visual semantics in an unlimited granularity, but existing visual recognition algorithms cannot achieve this goal.

Instance Segmentation Semantic Segmentation

Active Pointly-Supervised Instance Segmentation

1 code implementation 23 Jul 2022 Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu

In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.

Active Learning Instance Segmentation +2

A Survey on Label-efficient Deep Image Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction

no code implementations 4 Jul 2022 Wei Shen, Zelin Peng, Xuehui Wang, Huayu Wang, Jiazhong Cen, Dongsheng Jiang, Lingxi Xie, Xiaokang Yang, Qi Tian

Next, we summarize the existing label-efficient image segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction -- the current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraint, cross-view consistency, and cross-image relation.

Image Segmentation Instance Segmentation +2

Fast Dynamic Radiance Fields with Time-Aware Neural Voxels

1 code implementation 30 May 2022 Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, Qi Tian

A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions.

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation 30 May 2022 Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), albeit hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.

Transfer Learning

CenterNet++ for Object Detection

2 code implementations 18 Apr 2022 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet of keypoints (top-left and bottom-right corners and the center keypoint).

Object object-detection +1

MVP: Multimodality-guided Visual Pre-training

no code implementations 10 Mar 2022 Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

Recently, masked image modeling (MIM) has become a promising direction for visual pre-training.

Language Modelling

One-Bit Active Query With Contrastive Pairs

no code implementations CVPR 2022 Yuhang Zhang, Xiaopeng Zhang, Lingxi Xie, Jie Li, Robert C. Qiu, Hengtong Hu, Qi Tian

The Yes query is treated as positive pairs of the queried category for contrastive pulling, while the No query is treated as hard negative pairs for contrastive repelling.

Active Learning Contrastive Learning

Exploring Complicated Search Spaces with Interleaving-Free Sampling

no code implementations 5 Dec 2021 Yunjie Tian, Lingxi Xie, Jiemin Fang, Jianbin Jiao, Qixiang Ye, Qi Tian

In this paper, we build the search algorithm upon a complicated search space with long-distance connections, and show that existing weight-sharing search algorithms mostly fail due to the existence of interleaved connections.

Neural Architecture Search

NeuSample: Neural Sample Field for Efficient View Synthesis

1 code implementation 30 Nov 2021 Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Neural radiance fields (NeRF) have shown great potential in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy.

Semantic-Aware Generation for Self-Supervised Visual Representation Learning

1 code implementation 25 Nov 2021 Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei Huang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features.

Representation Learning Semantic Segmentation

Consensus Synergizes with Memory: A Simple Approach for Anomaly Segmentation in Urban Scenes

no code implementations 24 Nov 2021 Jiazhong Cen, Zenkun Jiang, Lingxi Xie, Qi Tian, Xiaokang Yang, Wei Shen

Anomaly segmentation is a crucial task for safety-critical applications, such as autonomous driving in urban scenes, where the goal is to detect out-of-distribution (OOD) objects with categories which are unseen during training.

Anomaly Detection Autonomous Driving +1

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

1 code implementation 19 Oct 2021 Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses.

3D-Aware Image Synthesis Transfer Learning

Deep Encryption: Protecting Pre-Trained Neural Networks with Confusion Neurons

no code implementations 29 Sep 2021 Mengbiao Zhao, Shixiong Xu, Jianlong Chang, Lingxi Xie, Jie Chen, Qi Tian

Having consumed huge amounts of training data and computational resource, large-scale pre-trained models are often considered key assets of AI service providers.

Position

Vibration-based Uncertainty Estimation for Learning from Limited Supervision

no code implementations 29 Sep 2021 Hengtong Hu, Lingxi Xie, Yinquan Wang, Richang Hong, Meng Wang, Qi Tian

We investigate the problem of estimating uncertainty for training data, so that deep neural networks can make use of the results for learning from limited supervision.

Active Learning

Bag of Instances Aggregation Boosts Self-supervised Distillation

1 code implementation ICLR 2022 Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

Here, a bag of instances indicates a set of similar samples that are constructed by the teacher and grouped within a bag, and the goal of distillation is to aggregate compact representations over the student with respect to instances in a bag.

Contrastive Learning Linear evaluation +1

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation

no code implementations CVPR 2021 Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Semi-supervised learning is a useful tool for image segmentation, mainly due to its ability in extracting knowledge from unlabeled data to assist learning from labeled data.

Continual Learning Image Segmentation +3

Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task

no code implementations 1 Jun 2021 Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

By simply pulling the different augmented views of each image together or other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models.

Diversity Self-Supervised Learning

Conformer: Local Features Coupling Global Representations for Visual Recognition

4 code implementations ICCV 2021 Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, YaoWei Wang, Jianbin Jiao, Qixiang Ye

Within a Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but have difficulty capturing global representations.

Image Classification Instance Segmentation +4

Visformer: The Vision-friendly Transformer

5 code implementations ICCV 2021 Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian

The past year has witnessed the rapid development of applying the Transformer module to vision problems.

Image Classification

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation 11 Apr 2021 Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

2D Human Pose Estimation Instance Segmentation +5

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

no code implementations CVPR 2021 Zhikai Chen, Lingxi Xie, Shanmin Pang, Yong He, Bo Zhang

This paper presents MagDR, a mask-guided detection and reconstruction pipeline for defending deepfakes from adversarial attacks.

Interactive Fusion of Multi-level Features for Compositional Activity Recognition

1 code implementation 10 Dec 2020 Rui Yan, Lingxi Xie, Xiangbo Shu, Jinhui Tang

To understand a complex action, multiple sources of information, including appearance, positional, and semantic features, need to be integrated.

Action Recognition

UnrealPerson: An Adaptive Pipeline towards Costless Person Re-identification

1 code implementation CVPR 2021 Tianyu Zhang, Lingxi Xie, Longhui Wei, Zijie Zhuang, Yongfei Zhang, Bo Li, Qi Tian

The main difficulty of person re-identification (ReID) lies in collecting annotated data and transferring the model across different domains.

Domain Adaptation Image Generation +1

Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning

no code implementations 4 Dec 2020 Haohang Xu, Xiaopeng Zhang, Hao Li, Lingxi Xie, Hongkai Xiong, Qi Tian

In this paper, we propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-sample and multi-level representations, and models the invariance to semantically similar images in a hierarchical way.

Contrastive Learning Linear evaluation +3

Batch Normalization with Enhanced Linear Transformation

1 code implementation 28 Nov 2020 Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module was designed for improving BN's flexibility of fitting complex data distributions.

Omni-GAN: On the Secrets of cGANs and Beyond

3 code implementations ICCV 2021 Peng Zhou, Lingxi Xie, Bingbing Ni, Cong Geng, Qi Tian

The conditional generative adversarial network (cGAN) is a powerful tool for generating high-quality images, but existing approaches mostly suffer from unsatisfying performance or the risk of mode collapse.

Conditional Image Generation Generative Adversarial Network

Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations

no code implementations 19 Nov 2020 Xinyue Huo, Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Hao Li, Zijie Yang, Wengang Zhou, Houqiang Li, Qi Tian

Contrastive learning has achieved great success in self-supervised visual representation learning, but existing approaches mostly ignored spatial information which is often crucial for visual representation.

Contrastive Learning Data Augmentation +1

Privileged Knowledge Distillation for Online Action Detection

no code implementations 18 Nov 2020 Peisen Zhao, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Knowledge Distillation Online Action Detection

One-bit Supervision for Image Classification

1 code implementation NeurIPS 2020 Hengtong Hu, Lingxi Xie, Zewei Du, Richang Hong, Qi Tian

Instead of training a model upon the accurate label of each sample, our setting requires the model to query with a predicted label of each sample and learn from the answer whether the guess is correct.

Classification General Classification +1

Polar Relative Positional Encoding for Video-Language Segmentation

no code implementations 20 Jul 2020 Ke Ning, Lingxi Xie, Fei Wu, Qi Tian

In this paper, we propose a novel Polar Relative Positional Encoding (PRPE) mechanism that represents spatial relations in a "linguistic" way, i.e., in terms of direction and range.

Referring Expression Segmentation Sentence

Social Adaptive Module for Weakly-supervised Group Activity Recognition

no code implementations ECCV 2020 Rui Yan, Lingxi Xie, Jinhui Tang, Xiangbo Shu, Qi Tian

This paper presents a new task named weakly-supervised group activity recognition (GAR) which differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data.

Group Activity Recognition

Universal-to-Specific Framework for Complex Action Recognition

no code implementations 13 Jul 2020 Peisen Zhao, Lingxi Xie, Ya Zhang, Qi Tian

The U2S framework is composed of three subnetworks: a universal network, a category-specific network, and a mask network.

Action Recognition Decision Making

Discretization-Aware Architecture Search

1 code implementation 7 Jul 2020 Yunjie Tian, Chang Liu, Lingxi Xie, Jianbin Jiao, Qixiang Ye

The search cost of neural architecture search (NAS) has been largely reduced by weight-sharing methods.

Image Classification Neural Architecture Search

GOLD-NAS: Gradual, One-Level, Differentiable

1 code implementation 7 Jul 2020 Kaifeng Bi, Lingxi Xie, Xin Chen, Longhui Wei, Qi Tian

There has been a large literature of neural architecture search, but most existing work made use of heuristic rules that largely constrained the search flexibility.

Image Classification Neural Architecture Search

Searching towards Class-Aware Generators for Conditional Generative Adversarial Networks

1 code implementation 25 Jun 2020 Peng Zhou, Lingxi Xie, Xiaopeng Zhang, Bingbing Ni, Qi Tian

To learn the sampling policy, a Markov decision process is embedded into the search algorithm and a moving average is applied for better stability.

Image Generation

ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation

no code implementations 24 Jun 2020 Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Qi Tian

This paper focuses on a popular pipeline known as self learning, and points out a weakness named lazy learning that refers to the difficulty for a model to learn from the pseudo labels generated by itself.

Autonomous Driving Image Segmentation +4

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

no code implementations 17 Apr 2020 Xin Chen, Lingxi Xie, Jun Wu, Longhui Wei, Yuhui Xu, Qi Tian

We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal.

Neural Architecture Search

Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

1 code implementation CVPR 2020 Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian

In recent years, cross-modal hashing (CMH) has attracted increasing attention, mainly because of its potential ability to map contents from different modalities, especially vision and language, into the same space, so that cross-modal data retrieval becomes efficient.

Knowledge Distillation Retrieval

Circumventing Outliers of AutoAugment with Knowledge Distillation

1 code implementation ECCV 2020 Longhui Wei, An Xiao, Lingxi Xie, Xin Chen, Xiaopeng Zhang, Qi Tian

AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks, yet it is sensitive to the operator space as well as hyper-parameters, and an improper setting may degenerate network optimization.

Data Augmentation General Classification +2

Bottom-Up Temporal Action Localization with Mutual Regularization

1 code implementation ECCV 2020 Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yan-Feng Wang, Qi Tian

To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase; and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases.

Temporal Action Localization

Latency-Aware Differentiable Neural Architecture Search

1 code implementation 17 Jan 2020 Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Bowen Shi, Qi Tian, Hongkai Xiong

However, these methods suffer from difficulty in optimizing the network, so that the searched network is often unfriendly to hardware.

Neural Architecture Search

Wasserstein-Bounded Generative Adversarial Networks

no code implementations ICLR 2020 Peng Zhou, Bingbing Ni, Lingxi Xie, Xiaopeng Zhang, Hang Wang, Cong Geng, Qi Tian

In the field of Generative Adversarial Networks (GANs), how to design a stable training strategy remains an open problem.

Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild

4 code implementations 23 Dec 2019 Xin Chen, Lingxi Xie, Jun Wu, Qi Tian

With the rapid development of neural architecture search (NAS), researchers found powerful network architectures for a wide range of vision tasks.

Neural Architecture Search

Appending Adversarial Frames for Universal Video Attack

no code implementations 10 Dec 2019 Zhikai Chen, Lingxi Xie, Shanmin Pang, Yong He, Qi Tian

There have been many efforts in attacking image classification models with adversarial perturbations, but the same topic on video classification has not yet been thoroughly studied.

Classification General Classification +2

Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters

1 code implementation 25 Oct 2019 Kaifeng Bi, Changping Hu, Lingxi Xie, Xin Chen, Longhui Wei, Qi Tian

Our approach bridges the gap from two aspects, namely, amending the estimation on the architectural gradients, and unifying the hyper-parameter settings in the search and re-training stages.

Neural Architecture Search

Fast Non-Local Neural Networks with Spectral Residual Learning

1 code implementation MM '19: Proceedings of the 27th ACM International Conference on Multimedia 2019 Lu Chi, Guiyu Tian, Yadong Mu, Lingxi Xie, Qi Tian

We show its equivalence to conducting residual learning in some spectral domain and carefully re-formulate a variety of neural layers into their spectral forms, such as ReLU or convolutions.

Pose Estimation Video Classification

Pruning from Scratch

1 code implementation 27 Sep 2019 Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu

Network pruning is an important research field aiming at reducing computational costs of neural networks.

Network Pruning

Single Camera Training for Person Re-identification

1 code implementation 24 Sep 2019 Tianyu Zhang, Lingxi Xie, Longhui Wei, Yongfei Zhang, Bo Li, Qi Tian

Differently, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera.

Metric Learning Person Re-Identification

Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data

no code implementations 19 Sep 2019 Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yan-Feng Wang, Qi Tian

Data augmentation has been widely applied as an effective methodology to improve generalization in particular when training deep neural networks.

Data Augmentation Image Classification +2

PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

8 code implementations ICLR 2020 Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

Differentiable architecture search (DARTS) provided a fast solution in finding effective network architectures, but suffered from large memory and computing overheads in jointly training a super-network and searching for an optimal architecture.

Neural Architecture Search

Defending Adversarial Attacks by Correcting logits

no code implementations 26 Jun 2019 Yifeng Li, Lingxi Xie, Ya Zhang, Rui Zhang, Yanfeng Wang, Qi Tian

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning.

Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation

4 code implementations ICCV 2019 Xin Chen, Lingxi Xie, Jun Wu, Qi Tian

Recently, differentiable search methods have made major progress in reducing the computational costs of neural architecture search.

Neural Architecture Search

CenterNet: Keypoint Triplets for Object Detection

20 code implementations ICCV 2019 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions.

Object object-detection +1

Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval

1 code implementation ICCV 2019 Qing Liu, Lingxi Xie, Huiyu Wang, Alan Yuille

Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem which implies a wide range of real-world applications.

Domain Adaptation Retrieval +2

Thickened 2D Networks for Efficient 3D Medical Image Segmentation

no code implementations 2 Apr 2019 Qihang Yu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

With this design, we achieve a higher performance while maintaining a lower inference latency on a few abdominal organs from CT scans, in particular when the organ has a peculiar 3D shape and thus strongly requires contextual information, demonstrating our method's effectiveness and ability in capturing 3D information.

Image Segmentation Medical Image Segmentation +2

SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images

1 code implementation 2 Jan 2019 Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye

In particular, the advantage of CHR is more significant in the scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection.

Object Localization

Identity-Enhanced Network for Facial Expression Recognition

no code implementations 11 Dec 2018 Yanwei Li, Xingang Wang, Shilei Zhang, Lingxi Xie, Wenqi Wu, Hongyuan Yu, Zheng Zhu

Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Attention-guided Unified Network for Panoptic Segmentation

no code implementations CVPR 2019 Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Panoptic Segmentation Segmentation

CRAVES: Controlling Robotic Arm with a Vision-based Economic System

1 code implementation CVPR 2019 Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille

We also construct a vision-based control system for task accomplishment, for which we train a reinforcement learning agent in a virtual environment and apply it to the real world.

3D Pose Estimation Domain Adaptation

Elastic Boundary Projection for 3D Medical Image Segmentation

2 code implementations CVPR 2019 Tianwei Ni, Lingxi Xie, Huangjie Zheng, Elliot K. Fishman, Alan L. Yuille

The key observation is that, although the object is a 3D volume, what we really need in segmentation is to find its boundary which is a 2D surface.

3D Medical Imaging Segmentation Image Segmentation +3

Iterative Reorganization with Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning

1 code implementation CVPR 2019 Chen Wei, Lingxi Xie, Xutong Ren, Yingda Xia, Chi Su, Jiaying Liu, Qi Tian, Alan L. Yuille

We consider spatial contexts, for which we solve so-called jigsaw puzzles, i.e., each image is cut into grids and then disordered, and the goal is to recover the correct configuration.

General Classification Image Classification +4

Snapshot Distillation: Teacher-Student Optimization in One Generation

no code implementations CVPR 2019 Chenglin Yang, Lingxi Xie, Chi Su, Alan L. Yuille

Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting.

Image Classification object-detection +2

Generalized Coarse-to-Fine Visual Recognition with Progressive Training

no code implementations 29 Nov 2018 Xutong Ren, Lingxi Xie, Chen Wei, Siyuan Qiao, Chi Su, Jiaying Liu, Qi Tian, Elliot K. Fishman, Alan L. Yuille

Computer vision is difficult, partly because the desired mathematical function connecting input and output data is often complex, fuzzy and thus hard to learn.

Image Classification Object Localization +1

Phase Collaborative Network for Two-Phase Medical Image Segmentation

no code implementations 28 Nov 2018 Huangjie Zheng, Lingxi Xie, Tianwei Ni, Ya Zhang, Yan-Feng Wang, Qi Tian, Elliot K. Fishman, Alan L. Yuille

However, in medical image analysis, fusing prediction from two phases is often difficult, because (i) there is a domain gap between two phases, and (ii) the semantic labels are not pixel-wise corresponded even for images scanned from the same patient.

Image Segmentation Medical Image Segmentation +3

Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints from Limited Training Data

1 code implementation ICCV 2019 Yutong Bai, Qing Liu, Lingxi Xie, Weichao Qiu, Yan Zheng, Alan Yuille

In particular, this enables images in the training dataset to be matched to a virtual 3D model of the object (for simplicity, we assume that the object viewpoint can be estimated by standard techniques).

Clustering Object +1

Accelerating Deep Neural Networks with Spatial Bottleneck Modules

no code implementations 7 Sep 2018 Junran Peng, Lingxi Xie, Zhao-Xiang Zhang, Tieniu Tan, Jingdong Wang

This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks.

Infinite Curriculum Learning for Efficiently Detecting Gastric Ulcers in WCE Images

no code implementations 7 Sep 2018 Xiaolu Zhang, Shiwan Zhao, Lingxi Xie

This paper considers WCE-based gastric ulcer detection, in which the major challenge is to detect the lesions in a local region.

Binary Classification

Attention-based Pyramid Aggregation Network for Visual Place Recognition

no code implementations 1 Aug 2018 Yingying Zhu, Jiong Wang, Lingxi Xie, Liang Zheng

Visual place recognition is challenging in the urban environment and is usually viewed as a large scale image retrieval task.

Image Retrieval Retrieval +1

Multi-Scale Coarse-to-Fine Segmentation for Screening Pancreatic Ductal Adenocarcinoma

no code implementations 9 Jul 2018 Zhuotun Zhu, Yingda Xia, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

We propose an intuitive approach of detecting pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, by checking abdominal CT scans.

General Classification Segmentation +1

G2C: A Generator-to-Classifier Framework Integrating Multi-Stained Visual Cues for Pathological Glomerulus Classification

no code implementations 30 Jun 2018 Bingzhe Wu, Xiaolu Zhang, Shiwan Zhao, Lingxi Xie, Caihong Zeng, Zhihong Liu, Guangyu Sun

Given an input image from a specified stain, several generators are first applied to estimate its appearances in other staining methods, and a classifier follows to combine visual cues from different stains for prediction (whether it is pathological, or which type of pathology it has).

Classification Decision Making +2

Joint Shape Representation and Classification for Detecting PDAC

no code implementations 27 Apr 2018 Fengze Liu, Lingxi Xie, Yingda Xia, Elliot K. Fishman, Alan L. Yuille

Shape representation and classification are performed in a joint manner, both to exploit the knowledge that PDAC often changes the shape of the pancreas and to prevent over-fitting.

Classification General Classification +1

Multi-Scale Spatially-Asymmetric Recalibration for Image Classification

no code implementations ECCV 2018 Yan Wang, Lingxi Xie, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Alan L. Yuille

Convolution is spatially-symmetric, i.e., the visual features are independent of their position in the image, which limits its ability to utilize contextual cues for visual recognition.

Classification General Classification +2

SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized Data

no code implementations 1 Apr 2018 Qi Chen, Weichao Qiu, Yi Zhang, Lingxi Xie, Alan Yuille

But this raises an important problem in active vision: given an infinite data space, how to effectively sample a finite subset to train a visual classifier?

Classification General Classification

Adversarial Attacks Beyond the Image Space

no code implementations CVPR 2019 Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, Alan L. Yuille

Though image-space adversaries can be interpreted as per-pixel albedo change, we verify that they cannot be well explained along these physically meaningful dimensions, which often have a non-local effect.

Question Answering Visual Question Answering

Visual Concepts and Compositional Voting

no code implementations 13 Nov 2017 Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille

We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles.

Clustering Semantic Part Detection

DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion

no code implementations CVPR 2018 Zhishuai Zhang, Cihang Xie, Jian-Yu Wang, Lingxi Xie, Alan L. Yuille

The first layer extracts the evidence of local visual cues, and the second layer performs a voting mechanism by utilizing the spatial relationship between visual cues and semantic parts.

Semantic Part Detection

Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation

2 code implementations CVPR 2018 Qihang Yu, Lingxi Xie, Yan Wang, Yuyin Zhou, Elliot K. Fishman, Alan L. Yuille

The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration as spatial weights and applies these weights to the current iteration.

Organ Segmentation Pancreas Segmentation +1

Deep Supervision for Pancreatic Cyst Segmentation in Abdominal CT Scans

no code implementations 22 Jun 2017 Yuyin Zhou, Lingxi Xie, Elliot K. Fishman, Alan L. Yuille

Inspired by the high relevance between the location of a pancreas and its cystic region, we introduce extra deep supervision into the segmentation network, so that cyst segmentation can be improved with the help of relatively easier pancreas segmentation.

Pancreas Segmentation Segmentation

Adversarial Examples for Semantic Segmentation and Object Detection

2 code implementations ICCV 2017 Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille

Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations.

Adversarial Attack Object +4

SORT: Second-Order Response Transform for Visual Recognition

no code implementations ICCV 2017 Yan Wang, Lingxi Xie, Chenxi Liu, Ya Zhang, Wenjun Zhang, Alan Yuille

In this paper, we reveal the importance and benefits of introducing second-order operations into deep neural networks.

Genetic CNN

1 code implementation ICCV 2017 Lingxi Xie, Alan Yuille

The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition.

Object Recognition

Deep Collaborative Learning for Visual Recognition

no code implementations 3 Mar 2017 Yan Wang, Lingxi Xie, Ya Zhang, Wenjun Zhang, Alan Yuille

We formulate the function of a convolutional layer as learning a large visual vocabulary, and propose an alternative way, namely Deep Collaborative Learning (DCL), to reduce the computational complexity.

General Classification Image Classification

Object Recognition with and without Objects

1 code implementation 20 Nov 2016 Zhuotun Zhu, Lingxi Xie, Alan L. Yuille

While recent deep neural networks have achieved a promising performance on object recognition, they rely implicitly on the visual contents of the whole image.

Object Object Recognition

Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons

no code implementations 21 Jul 2016 Lingxi Xie, Qi Tian, John Flynn, Jingdong Wang, Alan Yuille

For this, we consider the neurons in the hidden layer as neural words, and construct a set of geometric neural phrases on top of them.

Image Classification

InterActive: Inter-Layer Activeness Propagation

no code implementations CVPR 2016 Lingxi Xie, Liang Zheng, Jingdong Wang, Alan Yuille, Qi Tian

An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network.

Descriptive General Classification

DisturbLabel: Regularizing CNN on the Loss Layer

2 code implementations CVPR 2016 Lingxi Xie, Jingdong Wang, Zhen Wei, Meng Wang, Qi Tian

During a long period of time we are combating over-fitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc.

Data Augmentation

RIDE: Reversal Invariant Descriptor Enhancement

no code implementations ICCV 2015 Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian

In many fine-grained object recognition datasets, image orientation (left/right) might vary from sample to sample.

Object Recognition

Fidelity-Naturalness Evaluation of Single Image Super Resolution

no code implementations 21 Nov 2015 Xuan Dong, Yu Zhu, Weixin Li, Lingxi Xie, Alex Wong, Alan Yuille

In this paper, we proposed to use both fidelity (the difference with original images) and naturalness (human visual perception of super resolved images) for evaluation.

Image Quality Assessment Image Super-Resolution

Orientational Pyramid Matching for Recognizing Indoor Scenes

no code implementations CVPR 2014 Lingxi Xie, Jingdong Wang, Baining Guo, Bo Zhang, Qi Tian

The novelty lies in that OPM uses the 3D orientations to form the pyramid and produce the pooling regions, which is unlike SPM that uses the spatial positions to form the pyramid.

General Classification Scene Classification +1
