Search Results for author: Xiangyu Zhang

Found 188 papers, 106 papers with code

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

no code implementations 9 Mar 2024 Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Performance evaluation using large language models reveals the advantage of the linguistic hint by achieving 14.1% and 5.5% relative improvement on test sets of the ASRU and SEAME datasets, respectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

no code implementations 17 Feb 2024 Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

In addition, this approach is not only valuable for the detection of depression but also represents a new perspective in enhancing the ability of LLMs to comprehend and process speech signals.

Depression Detection

When Dataflow Analysis Meets Large Language Models

no code implementations 16 Feb 2024 Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiaoheng Xie, Xiangyu Zhang

Dataflow analysis is a powerful code analysis technique that reasons about dependencies between program values, offering support for code optimization, program comprehension, and bug detection.

Hallucination

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia

1 code implementation 8 Feb 2024 Guangyu Shen, Siyuan Cheng, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang

Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities.

MULTIVERSE: Exposing Large Language Model Alignment Problems in Diverse Worlds

no code implementations 25 Jan 2024 Xiaolong Jin, Zhuo Zhang, Xiangyu Zhang

Given the low cost of our method, we are able to conduct a large scale study regarding LLM alignment issues in different worlds.

Language Modelling Large Language Model

Small Language Model Meets with Reinforced Vision Vocabulary

no code implementations 23 Jan 2024 Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

In Vary-toy, we introduce an improved vision vocabulary, allowing the model to not only possess all features of Vary but also gather more generality.

Language Modelling Large Language Model +3

Stream Query Denoising for Vectorized HD Map Construction

no code implementations 17 Jan 2024 Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.

Autonomous Driving Denoising

Slot-guided Volumetric Object Radiance Fields

no code implementations NeurIPS 2023 Di Qi, Tong Yang, Xiangyu Zhang

We hope our approach can provide preliminary understanding of the physical world and help ease future research in 3D object-centric representation learning.

Object Representation Learning

Bootstrap Masked Visual Modeling via Hard Patches Mining

1 code implementation 21 Dec 2023 Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang

To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

1 code implementation 11 Dec 2023 Hao Tan, Jun Li, Yizhuang Zhou, Jun Wan, Zhen Lei, Xiangyu Zhang

We introduce text supervision to the optimization of prompts, which enables two benefits: 1) releasing the model reliance on the pre-defined category names during inference, thereby enabling more flexible prompt generation; 2) reducing the number of inputs to the text encoder, which decreases GPU memory consumption significantly.

Domain Generalization

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

no code implementations 8 Dec 2023 Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits.

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations 30 Nov 2023 En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

no code implementations 28 Nov 2023 Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving Video Generation

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

1 code implementation 27 Nov 2023 Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, QiuLing Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang

Diffusion models (DM) have become state-of-the-art generative models because of their capability to generate high-quality images from noises without adversarial training.

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

1 code implementation 27 Nov 2023 Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, WenHan Chao, Leibny Paola Garcia

There is a positive correlation between PSR scores and ASR performance, suggesting that phonetic information extracted by monolingual SSL models can be used for downstream tasks in cross-lingual settings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Nova$^+$: Generative Language Models for Binaries

no code implementations 22 Nov 2023 Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang

We build Nova$^+$ to further boost Nova using two new pre-training tasks, i.e., optimization generation and optimization level prediction, which are designed to learn binary optimization and align equivalent binaries.

Code Translation Compiler Optimization +2

ADriver-I: A General World Model for Autonomous Driving

no code implementations 22 Nov 2023 Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang

Based on the vision-action pairs, we construct a general world model based on MLLM and diffusion model for autonomous driving, termed ADriver-I.

Autonomous Driving

Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration

1 code implementation NeurIPS 2023 Longlin Yu, Tianyu Xie, Yu Zhu, Tong Yang, Xiangyu Zhang, Cheng Zhang

Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner.

Bayesian Inference Variational Inference

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

1 code implementation 16 Oct 2023 Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang

Specifically, we design a first-frame-conditioned pipeline that uses an off-the-shelf text-to-image model for content generation so that our tuned video diffusion model mainly focuses on motion learning.

Image Animation Text-to-Image Generation +2

Secondary frequency control of islanded microgrid considering wind and solar stochastics

no code implementations 8 Oct 2023 Cheng Zhong, Zhifu Jiang, Xiangyu Zhang, Jikai Chen, Yang Li

Finally, a microgrid simulation model including multiple PV and wind DGs is built and performed in various scenarios compared to the traditional secondary frequency control method.

Model Predictive Control

Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

no code implementations 27 Sep 2023 Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan

Finally, we evaluate our Cold & Warm Net on public datasets in comparison to models commonly applied in the matching stage and it outperforms other models on all user types.

Knowledge Distillation Meta-Learning +1

Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex

1 code implementation 26 Sep 2023 Ruixing Liang, Xiangyu Zhang, Qiong Li, Lai Wei, Hexin Liu, Avisha Kumar, Kelley M. Kempski Leadingham, Joshua Punnoose, Leibny Paola Garcia, Amir Manbachi

While significant advancements in artificial intelligence (AI) have catalyzed progress across various domains, its full potential in understanding visual perception remains underexplored.

Brain Computer Interface

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation 20 Sep 2023 Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.

Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

Language Prompt for Autonomous Driving

2 code implementations 8 Sep 2023 Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen

A new trend in the computer vision community is to capture objects of interest following flexible human command represented by a natural language prompt.

Autonomous Driving Object

RevColV2: Exploring Disentangled Representations in Masked Image Modeling

1 code implementation NeurIPS 2023 Qi Han, Yuxuan Cai, Xiangyu Zhang

Such design enables our architecture with the nice property: maintaining disentangled low-level and semantic information at the end of the network in MIM pre-training.

Image Classification object-detection +3

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers

no code implementations 14 Aug 2023 Xijun Wang, Xiaojie Chu, Chunrui Han, Xiangyu Zhang

This paper presents a module, Spatial Cross-scale Convolution (SCSC), which is verified to be effective in improving both CNNs and Transformers.

Face Recognition

POSIT: Promotion of Semantic Item Tail via Adversarial Learning

no code implementations 7 Aug 2023 QiuLing Xu, Pannaga Shivaswamy, Xiangyu Zhang

We subsequently use that metric in an adversarial learning framework to systematically promote disadvantaged items.

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations 18 Jul 2023 Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while the inference speed is nearly 7x faster and the FLOPs is only 13.3% of it.

3D Lane Detection

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations 18 Jul 2023 Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following Language Modelling +1

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

no code implementations 17 Jul 2023 Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam

In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy.

energy management Inductive Bias +3

MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

no code implementations 30 May 2023 Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, there is a need for speech processing models to be trained on datasets that feature diverse language registers and speech patterns.

Language Identification

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking

no code implementations 23 May 2023 En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao

Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer from the conflict between detection and association seriously, resulting in unsatisfactory convergence dynamics.

Denoising Multi-Object Tracking +1

Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection

no code implementations 28 Apr 2023 Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang

We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.

3D Object Detection Autonomous Driving +2

Self-supervised Learning by View Synthesis

no code implementations 22 Apr 2023 Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.

3D Classification Self-Supervised Learning

Align-DETR: Improving DETR with Simple IoU-aware BCE loss

1 code implementation 15 Apr 2023 Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang

We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.

object-detection Object Detection

Detecting Backdoors in Pre-trained Encoders

1 code implementation CVPR 2023 Shiwei Feng, Guanhong Tao, Siyuan Cheng, Guangyu Shen, Xiangzhe Xu, Yingqi Liu, Kaiyuan Zhang, Shiqing Ma, Xiangyu Zhang

We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs.

Self-Supervised Learning

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

1 code implementation ICCV 2023 Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, Xiangyu Zhang

On the standard nuScenes benchmark, it is the first online multi-view method that achieves comparable performance (67.6% NDS & 65.3% AMOTA) with lidar-based methods.

3D Multi-Object Tracking 3D Object Detection +2

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations 10 Mar 2023 Chunrui Han, Jianjian Sun, Zheng Ge, Jinrong Yang, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i.e., rich long-term information and efficient fusion pipeline.

motion prediction object-detection +1

Referring Multi-Object Tracking

1 code implementation CVPR 2023 Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen

In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT).

Multi-Object Tracking Object

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3 code implementations 5 Feb 2023 Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi

This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification +2

KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair

1 code implementation 3 Feb 2023 Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, Xiangyu Zhang

KNOD has two major novelties, including (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to the inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases.

Program Repair

BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense

1 code implementation 16 Jan 2023 Siyuan Cheng, Guanhong Tao, Yingqi Liu, Shengwei An, Xiangzhe Xu, Shiwei Feng, Guangyu Shen, Kaiyuan Zhang, QiuLing Xu, Shiqing Ma, Xiangyu Zhang

Attack forensics, a critical counter-measure for traditional cyber attacks, is hence of importance for defending model backdoor attacks.

Backdoor Attack

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

2 code implementations CVPR 2023 Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia

Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers.

3D Semantic Segmentation Segmentation

MEDIC: Remove Model Backdoors via Importance Driven Cloning

no code implementations CVPR 2023 QiuLing Xu, Guanhong Tao, Jean Honorio, Yingqi Liu, Shengwei An, Guangyu Shen, Siyuan Cheng, Xiangyu Zhang

It trains the clone model from scratch on a very small subset of samples and aims to minimize a cloning loss that denotes the differences between the activations of important neurons across the two models.

Knowledge Distillation

Reversible Column Networks

1 code implementation 22 Dec 2022 Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang

This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled when passing through each column, whose total information is maintained rather than compressed or discarded as in other networks.

Ranked #8 on Semantic Segmentation on ADE20K (using extra training data)

Image Classification object-detection +3

Backdoor Vulnerabilities in Normally Trained Deep Learning Models

no code implementations 29 Nov 2022 Guanhong Tao, Zhenting Wang, Siyuan Cheng, Shiqing Ma, Shengwei An, Yingqi Liu, Guangyu Shen, Zhuo Zhang, Yunshu Mao, Xiangyu Zhang

We leverage 20 different types of injected backdoor attacks in the literature as the guidance and study their correspondences in normally trained models, which we call natural backdoor vulnerabilities.

Data Poisoning

Near-Field Channel Estimation for Extremely Large-Scale Array Communications: A model-based deep learning approach

no code implementations 28 Nov 2022 Xiangyu Zhang, Zening Wang, Haiyang Zhang, Luxi Yang

In particular, we first formulate the XL-MIMO near-field channel estimation task as a compressed sensing problem using the spatial gridding-based sparsifying dictionary, and then solve the resulting problem by applying the Learning Iterative Shrinkage and Thresholding Algorithm (LISTA).

Dictionary Learning

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

2 code implementations ICCV 2023 HongYu Zhou, Zheng Ge, Zeming Li, Xiangyu Zhang

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT.

Ranked #2 on Bird's-Eye View Semantic Segmentation on nuScenes (IoU lane - 224x480 - 100x100 at 0.5 metric)

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

4 code implementations CVPR 2023 Yuang Zhang, Tiancai Wang, Xiangyu Zhang

In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector.

Ranked #1 on Multi-Object Tracking on DanceTrack (using extra training data)

Multi-Object Tracking Multiple Object Tracking +2

Towards 3D Object Detection with 2D Supervision

no code implementations 15 Nov 2022 Jinrong Yang, Tiancai Wang, Zheng Ge, Weixin Mao, Xiaoping Li, Xiangyu Zhang

We propose a temporal 2D transformation to bridge the 3D predictions with temporal 2D labels.

3D Object Detection Object +1

From Model-Based to Model-Free: Learning Building Control for Demand Response

1 code implementation 18 Oct 2022 David Biagioni, Xiangyu Zhang, Christiane Adcock, Michael Sinner, Peter Graf, Jennifer King

We demonstrate, in this context, that hybrid methods offer many benefits over both purely model-free and model-based methods as long as certain requirements are met.

PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection

no code implementations 6 Oct 2022 Shuyue Stella Li, Xiangyu Zhang, Shu Zhou, Hongchao Shu, Ruixing Liang, Hexin Liu, Leibny Paola Garcia

In this work, we propose a highly Portable Quantum Language Model (PQLM) that can easily transmit information to downstream tasks on classical machines.

Language Modelling Sentence Embedding +3

Differentiable Architecture Search with Random Features

no code implementations CVPR 2023 Xuanyang Zhang, Yonggang Li, Xiangyu Zhang, Yongtao Wang, Jian Sun

Differentiable architecture search (DARTS) has significantly promoted the development of NAS techniques because of its high search efficiency and effectiveness but suffers from performance collapse.

Neural Architecture Search

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning

1 code implementation 30 Jul 2022 Junqiang Huang, Xiangwen Kong, Xiangyu Zhang

We focus on better understanding the critical factors of augmentation-invariant representation learning.

Representation Learning

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

no code implementations 11 Jul 2022 Zhiyuan Cheng, James Liang, Hongjun Choi, Guanhong Tao, Zhiwen Cao, Dongfang Liu, Xiangyu Zhang

Experimental results show that our method can generate stealthy, effective, and robust adversarial patches for different target objects and models and achieves more than 6 meters mean depth estimation error and 93% attack success rate (ASR) in object detection with a patch of 1/9 of the vehicle's rear area.

3D Object Detection Autonomous Driving +3

DECK: Model Hardening for Defending Pervasive Backdoors

no code implementations 18 Jun 2022 Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang, QiuLing Xu, Guangyu Shen, Xiangyu Zhang

As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities.

Re-parameterizing Your Optimizers rather than Architectures

1 code implementation 30 May 2022 Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models.

Quantization

Self-Supervised Visual Representation Learning with Semantic Grouping

1 code implementation 30 May 2022 Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi

The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots.

Contrastive Learning Instance Segmentation +6

GL-RG: Global-Local Representation Granularity for Video Captioning

1 code implementation 22 May 2022 Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description.

Descriptive Video Captioning

Focal Sparse Convolutional Networks for 3D Object Detection

2 code implementations CVPR 2022 Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both of which are based on making feature sparsity learnable with position-wise importance prediction.

3D Object Detection Object +1

Simple Baselines for Image Restoration

9 code implementations 10 Apr 2022 Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, Jian Sun

Although there have been significant advances in the field of image restoration recently, the system complexity of the state-of-the-art (SOTA) methods is increasing as well, which may hinder the convenient analysis and comparison of methods.

Deblurring Image Deblurring +2

Near-optimality for infinite-horizon restless bandits with many arms

no code implementations 29 Mar 2022 Xiangyu Zhang, Peter I. Frazier

Although an average-case-optimal policy can be computed via stochastic dynamic programming, the computation required grows exponentially with the number of arms $N$.

Active Learning Management +1

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

1 code implementation CVPR 2022 Zhiyuan Liang, Tiancai Wang, Xiangyu Zhang, Jian Sun, Jianbing Shen

The tree energy loss is effective and easy to be incorporated into existing frameworks by combining it with a traditional segmentation loss.

Segmentation Semantic Segmentation

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

2 code implementations CVPR 2022 Yin-Yin He, Peizhen Zhang, Xiu-Shen Wei, Xiangyu Zhang, Jian Sun

In this paper, we explore to excavate the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one.

Instance Segmentation Semantic Segmentation

Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising

no code implementations 5 Jan 2022 Weijie Zhao, Xuewu Jiao, Mingqing Hu, Xiaoyun Li, Xiangyu Zhang, Ping Li

In this paper, we propose a hardware-aware training workflow that couples the hardware topology into the algorithm design.

Click-Through Rate Prediction

Complex Backdoor Detection by Symmetric Feature Differencing

1 code implementation CVPR 2022 Yingqi Liu, Guangyu Shen, Guanhong Tao, Zhenting Wang, Shiqing Ma, Xiangyu Zhang

Our results on the TrojAI competition rounds 2-4, which have patch backdoors and filter backdoors, show that existing scanners may produce hundreds of false positives (i.e., clean models recognized as trojaned), while our technique removes 78-100% of them with a small increase of false negatives by 0-30%, leading to 17-41% overall accuracy improvement.

Bounded Adversarial Attack on Deep Content Features

1 code implementation CVPR 2022 QiuLing Xu, Guanhong Tao, Xiangyu Zhang

We propose a novel adversarial attack targeting content features in some deep layer, that is, individual neurons in the layer.

Adversarial Attack

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

4 code implementations CVPR 2022 Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Jungong Han, Guiguang Ding

Our results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet has a favorable accuracy-efficiency trade-off compared to the other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.

Image Classification Semantic Segmentation

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

1 code implementation 19 Dec 2021 Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

Pre-training has marked numerous state of the arts in high-level computer vision, while few attempts have ever been made to investigate how pre-training acts in image processing systems.

Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)

Denoising Image Super-Resolution

Implicit Feature Refinement for Instance Segmentation

1 code implementation 9 Dec 2021 Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang

Our IFR enjoys several advantages: 1) simulates an infinite-depth refinement network while only requiring parameters of single residual block; 2) produces high-level equilibrium instance features of global receptive field; 3) serves as a plug-and-play general module easily extended to most object recognition frameworks.

Instance Segmentation Object Recognition +3

Two-step Lookahead Bayesian Optimization with Inequality Constraints

no code implementations 6 Dec 2021 Yunxiang Zhang, Xiangyu Zhang, Peter I. Frazier

Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost.

Bayesian Optimization Vocal Bursts Valence Prediction

Constrained Two-step Look-Ahead Bayesian Optimization

no code implementations NeurIPS 2021 Yunxiang Zhang, Xiangyu Zhang, Peter Frazier

Recent advances in computationally efficient non-myopic Bayesian optimization offer improved query efficiency over traditional myopic methods like expected improvement, with only a modest increase in computational cost.

Bayesian Optimization Vocal Bursts Valence Prediction

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

no code implementations NeurIPS 2021 Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

Specifically, 1) we introduce the assumptions that can lead to equilibrium state in SMD, and prove equilibrium can be reached in a linear rate regime under given assumptions; 2) we propose "angular update" as a substitute for effective learning rate to depict the state of SMD, and derive the theoretical value of angular update in equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks including ImageNet and MSCOCO with standard settings.

PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

1 code implementation 10 Nov 2021 David Biagioni, Xiangyu Zhang, Dylan Wald, Deepthi Vaidhynathan, Rohit Chintala, Jennifer King, Ahmed S. Zamzam

We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL).

Multi-agent Reinforcement Learning reinforcement-learning +1

A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters

no code implementations 8 Nov 2021 David J. Biagioni, Xiangyu Zhang, Peter Graf, Devon Sigler, Wesley Jones

We demonstrate that optimal control for this problem is challenging, requiring more than 8-hour lookahead for MPC with perfect forecasting to attain the minimum cost.

Model Predictive Control Time Series +1

Raw Bayer Pattern Image Synthesis for Computer Vision-oriented Image Signal Processing Pipeline Design

no code implementations 25 Oct 2021 Wei Zhou, Xiangyu Zhang, Hongyu Wang, Shenghua Gao, Xin Lou

It is shown that by adding another transformation, the proposed method is able to synthesize high-quality RAW Bayer images with arbitrary size.

Demosaicking Image Generation +3

RWN: Robust Watermarking Network for Image Cropping Localization

no code implementations 12 Oct 2021 Qichao Ying, Xiaoxiao Hu, Xiangyu Zhang, Zhenxing Qian, Xinpeng Zhang

At the recipient's side, ACP extracts the watermark from the attacked image, and we conduct feature matching on the original and extracted watermark to locate the position of the crop in the original image plane.

Image Cropping Image Forensics

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better

no code implementations 26 Sep 2021 Xuanyang Zhang, Xiangyu Zhang, Jian Sun

The knowledge distillation field delicately designs various types of knowledge to shrink the performance gap between a compact student and a large-scale teacher.

Knowledge Distillation

LGD: Label-guided Self-distillation for Object Detection

1 code implementation 23 Sep 2021 Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun

Instead, we generate instructive knowledge based only on student representations and regular labels.

Instance Segmentation Object +4

Image Synthesis via Semantic Composition

no code implementations ICCV 2021 Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia

In this paper, we present a novel approach to synthesize realistic images based on their semantic layouts.

Image Generation Semantic Composition

Anchor DETR: Query Design for Transformer-Based Object Detection

2 code implementations 15 Sep 2021 Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun

Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, achieves better performance and runs faster than DETR with 10$\times$ fewer training epochs.

Object object-detection +1

Accelerating Markov Random Field Inference with Uncertainty Quantification

no code implementations 2 Aug 2021 Ramin Bashizade, Xiangyu Zhang, Sayan Mukherjee, Alvin R. Lebeck

In this paper, we propose a high-throughput accelerator for Markov Random Field (MRF) inference, a powerful model for representing a wide range of applications, using MCMC with Gibbs sampling.

Motion Estimation Playing the Game of 2048 +1

Restless Bandits with Many Arms: Beating the Central Limit Theorem

no code implementations 25 Jul 2021 Xiangyu Zhang, Peter I. Frazier

Thus, there is substantial value in understanding the performance of index policies and other policies that can be computed efficiently for large $N$.

Active Learning Management +1

The Threat of Offensive AI to Organizations

no code implementations 30 Jun 2021 Yisroel Mirsky, Ambra Demontis, Jaidip Kotak, Ram Shankar, Deng Gelei, Liu Yang, Xiangyu Zhang, Wenke Lee, Yuval Elovici, Battista Biggio

Although offensive AI has been discussed in the past, there is a need to analyze and understand the threat in the context of organizations.

SOLQ: Segmenting Objects by Learning Queries

1 code implementation NeurIPS 2021 Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei

Moreover, the joint learning of unified query representation can greatly improve the detection performance of DETR.

Ranked #4 on Object Detection on COCO minival (AP75 metric)

Instance Segmentation Object Detection +2

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

9 code implementations 5 May 2021 Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, Guiguang Ding

We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers.

Face Recognition Image Classification +1

Points as Queries: Weakly Semi-supervised Object Detection by Points

1 code implementation CVPR 2021 Liangyu Chen, Tong Yang, Xiangyu Zhang, Wei Zhang, Jian Sun

We propose a novel point annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises a small set of fully annotated images and a large set of images weakly annotated by points.

object-detection Object Detection +1

Joint User Association and Power Allocation in Heterogeneous Ultra Dense Network via Semi-Supervised Representation Learning

no code implementations 29 Mar 2021 Xiangyu Zhang, Zhengming Zhang, Luxi Yang

We model the HUDNs as a heterogeneous graph and train a Graph Neural Network (GNN) to approach this representation function by using semi-supervised learning, in which the loss function is composed of the unsupervised part that helps the GNN approach the optimal representation function and the supervised part that utilizes the previous experience to reduce useless exploration.

Computational Efficiency Representation Learning

Diverse Branch Block: Building a Convolution as an Inception-like Unit

2 code implementations CVPR 2021 Xiaohan Ding, Xiangyu Zhang, Jungong Han, Guiguang Ding

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs.

Image Classification object-detection +2

You Only Look One-level Feature

6 code implementations CVPR 2021 Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun

From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids: utilizing only one-level feature for detection.

object-detection Object Detection

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

1 code implementation 9 Feb 2021 Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, QiuLing Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang

By iteratively and stochastically selecting the most promising labels for optimization with the guidance of an objective function, we substantially reduce the complexity, making it possible to handle models with many classes.

Neural Architecture Search with Random Labels

1 code implementation CVPR 2021 Xuanyang Zhang, Pengfei Hou, Xiangyu Zhang, Jian Sun

In this paper, we investigate a new variant of neural architecture search (NAS) paradigm -- searching with random labels (RLNAS).

Neural Architecture Search

RepVGG: Making VGG-style ConvNets Great Again

22 code implementations CVPR 2021 Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology.

Image Classification Semantic Segmentation

Implicit Feature Pyramid Network for Object Detection

no code implementations 25 Dec 2020 Tiancai Wang, Xiangyu Zhang, Jian Sun

In this paper, we present an implicit feature pyramid network (i-FPN) for object detection.

Object object-detection +1

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

2 code implementations 21 Dec 2020 Siyuan Cheng, Yingqi Liu, Shiqing Ma, Xiangyu Zhang

Trojan (backdoor) attack is a form of adversarial attack on deep neural networks where the attacker provides victims with a model trained/retrained on malicious data.

Backdoor Attack

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

1 code implementation 3 Dec 2020 Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang

Object detectors usually achieve promising results with the supervision of complete instance annotations.

MULTI-VIEW LEARNING Object +4

Microlensing Predictions: Impact of Galactic Disc Dynamical Models

no code implementations 30 Oct 2020 Hongjing Yang, Shude Mao, Weicheng Zang, Xiangyu Zhang

Additionally, we find the asymptotic power-law behaviors in both $\theta_{\rm E}$ and $\pi_{\rm E}$ distributions, and we provide a simple model to understand them.

Astrophysics of Galaxies Earth and Planetary Astrophysics Solar and Stellar Astrophysics

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track

no code implementations 6 Oct 2020 Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun

In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion, first to detect instances then to obtain segmentation.

Instance Segmentation object-detection +3

EqCo: Equivalent Rules for Self-supervised Contrastive Learning

1 code implementation 5 Oct 2020 Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun

In this paper, we propose EqCo (Equivalent Rules for Contrastive Learning) to make self-supervised learning irrelevant to the number of negative samples in the contrastive learning framework.

Contrastive Learning Self-Supervised Learning

MPG-Net: Multi-Prediction Guided Network for Segmentation of Retinal Layers in OCT Images

no code implementations 28 Sep 2020 Zeyu Fu, Yang Sun, Xiangyu Zhang, Scott Stainton, Shaun Barney, Jeffry Hogg, William Innes, Satnam Dlay

In this paper, we propose a novel multiprediction guided attention network (MPG-Net) for automated retinal layer segmentation in OCT images.

Segmentation

Deep Learning & Software Engineering: State of Research and Future Directions

1 code implementation 17 Sep 2020 Prem Devanbu, Matthew Dwyer, Sebastian Elbaum, Michael Lowry, Kevin Moran, Denys Poshyvanyk, Baishakhi Ray, Rishabh Singh, Xiangyu Zhang

The intent of this report is to serve as a potential roadmap to guide future work that sits at the intersection of SE & DL.

Activate or Not: Learning Customized Activation

4 code implementations CVPR 2021 Ningning Ma, Xiangyu Zhang, Ming Liu, Jian Sun

We present a simple, effective, and general activation function we term ACON which learns to activate the neurons or not.

object-detection Object Detection +1

Funnel Activation for Visual Recognition

6 code implementations ECCV 2020 Ningning Ma, Xiangyu Zhang, Jian Sun

We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition.

Scene Generation Semantic Segmentation

WeightNet: Revisiting the Design Space of Weight Networks

2 code implementations ECCV 2020 Ningning Ma, Xiangyu Zhang, Jiawei Huang, Jian Sun

WeightNet is easy and memory-conserving to train, on the kernel space instead of the feature space.

LabelEnc: A New Intermediate Supervision Method for Object Detection

1 code implementation ECCV 2020 Miao Hao, Yitao Liu, Xiangyu Zhang, Jian Sun

In this paper we propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.

Object object-detection +1

Weight-dependent Gates for Network Pruning

no code implementations 4 Jul 2020 Yun Li, Zechun Liu, Weiqun Wu, Haotian Yao, Xiangyu Zhang, Chi Zhang, Baoqun Yin

In this paper, a simple yet effective network pruning framework is proposed to simultaneously address the problems of pruning indicator, pruning ratio, and efficiency constraint.

Network Pruning

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

no code implementations 15 Jun 2020 Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD).

D-square-B: Deep Distribution Bound for Natural-looking Adversarial Attack

no code implementations 12 Jun 2020 Qiu-Ling Xu, Guanhong Tao, Xiangyu Zhang

We propose a novel technique that can generate natural-looking adversarial examples by bounding the variations induced for internal activation values in some deep layer(s), through a distribution quantile bound and a polynomial barrier loss function.

Adversarial Attack

Exhaustive goodness-of-fit via smoothed inference and graphics

1 code implementation 26 May 2020 Sara Algeri, Xiangyu Zhang

Classical tests of goodness-of-fit aim to validate the conformity of a postulated model to the data under study.

Methodology Statistics Theory Applications Statistics Theory

Joint Multi-Dimension Pruning via Numerical Gradient Update

no code implementations 18 May 2020 Zechun Liu, Xiangyu Zhang, Zhiqiang Shen, Zhe Li, Yichen Wei, Kwang-Ting Cheng, Jian Sun

To tackle these three naturally different dimensions, we propose a general framework by defining pruning as seeking the best pruning vector (i.e., the numerical value of layer-wise channel number, spatial size, depth) and constructing a unique mapping from the pruning vector to the pruned network structures.

Angle-based Search Space Shrinking for Neural Architecture Search

1 code implementation ECCV 2020 Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun

Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.

Neural Architecture Search

Dynamic Scale Training for Object Detection

4 code implementations 26 Apr 2020 Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Personalized Re-ranking for Improving Diversity in Live Recommender Systems

no code implementations 14 Apr 2020 Yichao Wang, Xiangyu Zhang, Zhirong Liu, Zhenhua Dong, Xinhua Feng, Ruiming Tang, Xiuqiang He

To overcome such limitation, our re-ranking model proposes a personalized DPP to model the trade-off between accuracy and diversity for each individual user.

Recommendation Systems Re-Ranking

Attentive Normalization for Conditional Image Generation

1 code implementation CVPR 2020 Yi Wang, Ying-Cong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

Traditional convolution-based generative adversarial networks synthesize images based on hierarchical local operations, where long-range dependency relation is implicitly modeled with a Markov chain.

Conditional Image Generation Semantic correspondence +2

Learning Human-Object Interaction Detection using Interaction Points

1 code implementation CVPR 2020 Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun

Human-object interaction (HOI) detection strives to localize both the human and an object, as well as to identify the complex interactions between them.

Human-Object Interaction Detection Keypoint Detection +2

Dynamic Region-Aware Convolution

no code implementations CVPR 2021 Jin Chen, Xijun Wang, Zichao Guo, Xiangyu Zhang, Jian Sun

More gracefully, our DRConv transfers the increasing channel-wise filters to the spatial dimension with a learnable instructor, which not only improves the representation ability of convolution, but also maintains computational cost and translation-invariance as standard convolution does.

Face Recognition General Classification +2

Learning Dynamic Routing for Semantic Segmentation

1 code implementation CVPR 2020 Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation Semantic Segmentation

Detection in Crowded Scenes: One Proposal, Multiple Predictions

3 code implementations CVPR 2020 Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, Jian Sun

We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.

Object Detection Pedestrian Detection

PointINS: Point-based Instance Segmentation

no code implementations 13 Mar 2020 Lu Qi, Yi Wang, Yukang Chen, Yingcong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features.

Instance Segmentation Object Detection +3

Learning Delicate Local Representations for Multi-Person Pose Estimation

4 code implementations ECCV 2020 Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Angang Du, Haoqian Wang, Xiangyu Zhang, Xinyu Zhou, Erjin Zhou, Jian Sun

To tackle this problem, we propose an efficient attention mechanism - Pose Refine Machine (PRM) to make a trade-off between local and global representations in output features and further refine the keypoint locations.

Keypoint Detection Multi-Person Pose Estimation

Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

no code implementations 5 Mar 2020 Xiangyu Zhang, Ramin Bashizade, Yicheng Wang, Cheng Lyu, Sayan Mukherjee, Alvin R. Lebeck

Applying the framework to guide design space exploration shows that statistical robustness comparable to floating-point software can be achieved by slightly increasing the bit representation, without floating-point hardware requirements.

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

1 code implementation ICLR 2020 Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

Therefore, many modified normalization techniques have been proposed, which either fail to restore the performance of BN completely, or have to introduce additional nonlinear operations in the inference procedure and greatly increase resource consumption.

Learning-Accelerated ADMM for Distributed Optimal Power Flow

no code implementations 8 Nov 2019 David Biagioni, Peter Graf, Xiangyu Zhang, Ahmed Zamzam, Kyri Baker, Jennifer King

We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions.

Distributed Optimization

A Case for Quantifying Statistical Robustness of Specialized Probabilistic AI Accelerators

no code implementations 27 Oct 2019 Xiangyu Zhang, Sayan Mukherjee, Alvin R. Lebeck

Although a common approach is to compare the end-point result quality using community-standard benchmarks and metrics, we claim a probabilistic architecture should provide some measure (or guarantee) of statistical robustness.

Resizable Neural Networks

no code implementations 25 Sep 2019 Yichen Zhu, Xiangyu Zhang, Tong Yang, Jian Sun

We introduce the adaptive resizable networks as dynamic networks, which further improve the performance with less computational cost via data-dependent inference.

Data Augmentation Neural Architecture Search

VAENAS: Sampling Matters in Neural Architecture Search

no code implementations 25 Sep 2019 Shizheng Qin, Yichen Zhu, Pengfei Hou, Xiangyu Zhang, Wenqiang Zhang, Jian Sun

In this paper, we propose a learnable sampling module based on variational auto-encoder (VAE) for neural architecture search (NAS), named as VAENAS, which can be easily embedded into existing weight sharing NAS framework, e.g., one-shot approach and gradient-based approach, and significantly improve the performance of searching results.

Neural Architecture Search

Arbitrage of Energy Storage in Electricity Markets with Deep Reinforcement Learning

no code implementations 28 Apr 2019 Hanchen Xu, Xiao Li, Xiangyu Zhang, Junbo Zhang

In this letter, we address the problem of controlling energy storage systems (ESSs) for arbitrage in real-time electricity markets under price uncertainty.

reinforcement-learning Reinforcement Learning (RL)

DetNAS: Backbone Search for Object Detection

2 code implementations NeurIPS 2019 Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun

In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection.

General Classification Image Classification +4

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution

2 code implementations CVPR 2019 Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, Jian Sun

In this work, we propose a novel method called Meta-SR, the first to solve super-resolution of arbitrary scale factor (including non-integer scale factors) with a single model.

Image Super-Resolution

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

1 code implementation NeurIPS 2018 Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs.

Attribute Face Recognition +1

DetNet: Design Backbone for Object Detection

no code implementations ECCV 2018 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

(1) Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales.

Classification General Classification +7

CrowdHuman: A Benchmark for Detecting Human in a Crowd

1 code implementation 30 Apr 2018 Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

There are a total of $470K$ human instances from the train and validation subsets, and $\sim 22.6$ persons per image, with various kinds of occlusions in the dataset.

Ranked #7 on Pedestrian Detection on Caltech (using extra training data)

Human Detection Object Detection +1

DetNet: A Backbone network for Object Detection

2 code implementations 17 Apr 2018 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection.

Classification General Classification +7

ExFuse: Enhancing Feature Fusion for Semantic Segmentation

no code implementations ECCV 2018 Zhenli Zhang, Xiangyu Zhang, Chao Peng, Dazhi Cheng, Jian Sun

Modern semantic segmentation frameworks usually combine low-level and high-level features from pre-trained backbone convolutional models to boost performance.

Ranked #4 on Semantic Segmentation on PASCAL VOC 2012 val (using extra training data)

Segmentation Semantic Segmentation

Light-Head R-CNN: In Defense of Two-Stage Object Detector

5 code implementations 20 Nov 2017 Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming the single-stage, fast detectors like YOLO and SSD on both speed and accuracy.

Vocal Bursts Valence Prediction

MegDet: A Large Mini-Batch Object Detector

6 code implementations CVPR 2018 Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design.

Object object-detection +1

Channel Pruning for Accelerating Very Deep Neural Networks

1 code implementation ICCV 2017 Yihui He, Xiangyu Zhang, Jian Sun

In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, by a LASSO regression based channel selection and least square reconstruction.

regression

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

37 code implementations CVPR 2018 Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs).

General Classification Image Classification +2

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

2 code implementations CVPR 2017 Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun

One of the recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters are more efficient than a large kernel, given the same computational complexity.

Semantic Segmentation

Identity Mappings in Deep Residual Networks

55 code implementations 16 Mar 2016 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors.

Image Classification

Deep Residual Learning for Image Recognition

467 code implementations CVPR 2016 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Domain Generalization +11

Accelerating Very Deep Convolutional Networks for Classification and Detection

no code implementations 26 May 2015 Xiangyu Zhang, Jianhua Zou, Kaiming He, Jian Sun

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community.

Classification General Classification +3

Object Detection Networks on Convolutional Feature Maps

no code implementations 23 Apr 2015 Shaoqing Ren, Kaiming He, Ross Girshick, Xiangyu Zhang, Jian Sun

We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without using such a per-region classifier.

General Classification Image Classification +3

Efficient and Accurate Approximations of Nonlinear Convolutional Networks

no code implementations CVPR 2015 Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs).

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

14 code implementations 18 Jun 2014 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale.

General Classification Image Classification +3
