AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

no code implementations • 19 Sep 2023 • Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji

We propose to search for the optimal time-step sequence and compressed model architecture in a unified framework, achieving effective image generation for diffusion models without any further training.

Image Generation single-image-generation

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

no code implementations • 17 Sep 2023 • Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Recent years have witnessed rapid progress of Generative Adversarial Networks (GANs) in image-to-image translation.

Image-to-Image Translation Translation

DLIP: Distilling Language-Image Pre-training

no code implementations • 24 Aug 2023 • Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji

DLIP retains more than 95% of the teacher model's performance with 22.4% of its parameters and 24.8% of its FLOPs, and accelerates inference by 2.7x.

Image Captioning Knowledge Distillation
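DLIP compresses a teacher into a smaller student through distillation, but the snippet does not spell out its objective. For reference, here is a minimal sketch of the generic temperature-scaled distillation loss (Hinton et al. style). It is not DLIP's loss, and `softmax` and `kd_loss` are hypothetical helper names. The loss is the KL divergence between the softened teacher and student distributions, multiplied by T² so gradient magnitudes stay comparable across temperatures.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """Generic distillation loss: T^2 * KL(teacher_soft || student_soft)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the softened distributions diverge. In practice this term is usually mixed with an ordinary cross-entropy loss on ground-truth labels.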

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

1 code implementation • 20 Jul 2023 • Ming Li, Jie Wu, Xionghui Wang, Chen Chen, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan

We propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies between pre-training and fine-tuning.

object-detection Object Detection

When Decentralized Optimization Meets Federated Learning

no code implementations • 5 Jun 2023 • Hongchang Gao, My T. Thai, Jie Wu

Federated learning is a new learning paradigm for extracting knowledge from distributed data.

Federated Learning

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

no code implementations • 23 May 2023 • Weifeng Chen, Jie Wu, Pan Xie, Hefeng Wu, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin

A first-frame conditioning strategy is proposed to enable the model to generate videos transferred from the image domain, as well as arbitrary-length videos in an auto-regressive manner.

Style Transfer Text-to-Video Generation

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

no code implementations • CVPR 2023 • Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Ren Yuxi, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang

Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories given text-based descriptions, extending segmentation systems to more general-purpose application scenarios.

Image Segmentation Instance Segmentation

Masked Vision-Language Transformers for Scene Text Recognition

1 code implementation • 9 Nov 2022 • Jie Wu, Ying Peng, Shengming Zhang, Weigang Qi, Jian Zhang

MVLT is trained in two stages: in the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve performance.

Scene Text Recognition

FedVeca: Federated Vectorized Averaging on Non-IID Data with Adaptive Bi-directional Global Objective

no code implementations • 28 Sep 2022 • Ping Luo, Jieren Cheng, Zhenhao Liu, N. Xiong, Jie Wu

However, clients' non-independent and identically distributed (non-IID) data negatively affect the trained model, and clients with different numbers of local updates may cause significant gaps in the local gradients in each communication round.

Federated Learning
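The snippet's second point is that clients running different numbers of local updates skew aggregation. This is easiest to see against plain FedAvg. The following is a minimal sketch of standard sample-weighted FedAvg aggregation, not FedVeca's vectorized averaging. The function name `fedavg` and its flattened-parameter-list inputs are hypothetical. The function weights each client only by its data size. A client that ran many more local steps can therefore pull the global model further, and nothing here corrects for that.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Plain FedAvg: average flattened client parameter vectors,
    weighting each client by its share of the total training samples."""
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, num_samples):
        share = n / total  # weight depends on data size only, not on local steps
        for i in range(dim):
            global_w[i] += share * w[i]
    return global_w


# Two clients, the second holding three times as much data.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```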

Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation

1 code implementation • 22 Aug 2022 • Jie Qin, Jie Wu, Ming Li, Xuefeng Xiao, Min Zheng, Xingang Wang

Consequently, we make the first attempt to provide lightweight semi-supervised semantic segmentation (SSSS) models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) a complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) a hierarchical, multi-level loss setting.

Knowledge Distillation Semi-Supervised Semantic Segmentation

Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation

no code implementations • 22 Jun 2022 • Ming Li, Jie Wu, Jinhang Cai, Jie Qin, Yuxi Ren, Xuefeng Xiao, Min Zheng, Rui Wang, Xin Pan

Recently, synthetic data-based instance segmentation has become an exceedingly favorable optimization paradigm, since it leverages simulation rendering and physics to generate high-quality image-annotation pairs.

Instance Segmentation Semantic Segmentation

TRT-ViT: TensorRT-oriented Vision Transformer

no code implementations • 19 May 2022 • Xin Xia, Jiashi Li, Jie Wu, Xing Wang, Xuefeng Xiao, Min Zheng, Rui Wang

We revisit existing high-performing Transformers from the perspective of practical application.

Image Classification object-detection

ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer

2 code implementations • 21 Mar 2022 • Rui Yang, Hailong Ma, Jie Wu, Yansong Tang, Xuefeng Xiao, Min Zheng, Xiu Li

The vanilla self-attention mechanism inherently relies on fixed, pre-defined computational dimensions.

Activation Modulation and Recalibration Scheme for Weakly Supervised Semantic Segmentation

1 code implementation • 16 Dec 2021 • Jie Qin, Jie Wu, Xuefeng Xiao, Lujun Li, Xingang Wang

Extensive experiments show that AMR establishes new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with image-level supervision but also some methods relying on stronger supervision, such as saliency labels.

Feature Importance Scene Understanding

Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

1 code implementation • NeurIPS 2021 • Shaojie Li, Jie Wu, Xuefeng Xiao, Fei Chao, Xudong Mao, Rongrong Ji

In this work, we revisit the role of the discriminator in GAN compression and design a novel generator-discriminator cooperative compression scheme, termed GCC.

Online Multi-Granularity Distillation for GAN Compression

1 code implementation • ICCV 2021 • Yuxi Ren, Jie Wu, Xuefeng Xiao, Jianchao Yang

The results reveal that OMGD provides a feasible solution for deploying real-time image translation on resource-constrained devices.


Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

no code implementations • 9 Aug 2021 • Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.

Anomaly Detection

Spoken Language Understanding for Task-oriented Dialogue Systems with Augmented Memory Networks

no code implementations • NAACL 2021 • Jie Wu, Ian Harris, Hongzhi Zhao

We adopt a key-value memory network to model slot context dynamically and to track the more important slot tags decoded earlier, which are then fed into our decoder for slot tagging.

Intent Detection slot-filling
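The snippet names a key-value memory network as the mechanism for tracking slot context. The core operation of such a network is a soft read: score the query against every stored key, softmax-normalize the scores, and return the attention-weighted sum of the stored values. Below is a minimal sketch of that generic read step. The keys and values here are arbitrary vectors; how the paper builds them from previously decoded slot tags is not described in the snippet and is not modeled. `kv_memory_read` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def kv_memory_read(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Generic soft key-value memory read.

    Each slot i holds a key k_i and a value v_i. Scores are dot products
    q . k_i, normalised with a softmax into attention weights; the output
    is the attention-weighted sum of the values.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("memory must have one value per key")
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    out = [0.0] * len(values[0])
    for a, v in zip(attn, values):
        for i, x in enumerate(v):
            out[i] += a * x
    return out, attn


# A query aligned with the first key attends mostly to the first value.
out, attn = kv_memory_read([4.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(out, attn)
```

Separating keys (used for addressing) from values (returned as content) lets the memory match on one representation while returning another.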

Robust Sequence Submodular Maximization

no code implementations • NeurIPS 2020 • Gamal Sallam, Zizhan Zheng, Jie Wu, Bo Ji

Compared to robust submodular maximization for set functions, new challenges arise when sequence functions are involved.

Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos

no code implementations • 18 Sep 2020 • Jie Wu, Guanbin Li, Xiaoguang Han, Liang Lin

Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task facilitating cross-media visual content retrieval.

reinforcement-learning Reinforcement Learning (RL)

Fine-Grained Image Captioning with Global-Local Discriminative Objective

1 code implementation • 21 Jul 2020 • Jie Wu, Tianshui Chen, Hefeng Wu, Zhi Yang, Guangchun Luo, Liang Lin

This is primarily due to (i) the conservative characteristic of traditional training objectives that drives the model to generate correct but hardly discriminative captions for similar images and (ii) the uneven word distribution of the ground-truth captions, which encourages generating highly frequent words/phrases while suppressing the less frequent but more concrete ones.

Descriptive Image Captioning

Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

no code implementations • 18 Jun 2020 • Jie Wu, Jian Luan

This paper presents a high-quality singing synthesizer that is able to model a voice from limited available recordings.

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

no code implementations • 11 Jun 2020 • Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou

This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling.

Singing Voice Synthesis Vocal Bursts Intensity Prediction

PMC-GANs: Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs

no code implementations • 30 Dec 2019 • Jie Wu, Ying Peng, Chenghao Zheng, Zongbo Hao, Jian Zhang

Recently, generative adversarial networks (GANs) have shown great advantages in synthesizing images, spurring exploration of using synthetic images to augment data.

Data Augmentation Pedestrian Detection

Data-driven prediction of vortex-induced vibration response of marine risers subjected to three-dimensional current

no code implementations • 24 Jun 2019 • Signe Riemer-Sørensen, Jie Wu, Halvor Lie, Svein Sævik, Sang-Woo Kim

The load model and hydrodynamic parameters in present vortex-induced vibration (VIV) prediction tools are developed based on two-dimensional (2D) flow conditions, as it is challenging to account for the effect of three-dimensional (3D) flow along the risers.


Hand Gesture Recognition with Leap Motion

no code implementations • 12 Nov 2017 • Youchen Du, Shenglan Liu, Lin Feng, Menghui Chen, Jie Wu

The recent introduction of depth cameras such as the Leap Motion Controller allows researchers to exploit depth information to recognize hand gestures more robustly.

Dimensionality Reduction Hand Gesture Recognition

Polarimetric Hierarchical Semantic Model and Scattering Mechanism Based PolSAR Image Classification

no code implementations • 1 Jul 2015 • Fang Liu, Junfei Shi, Licheng Jiao, Hongying Liu, Shuyuan Yang, Jie Wu, Hongxia Hao, Jialing Yuan

For polarimetric SAR (PolSAR) image classification, it is challenging to classify aggregated terrain types, such as urban areas, into semantically homogeneous regions due to sharp bright-dark variations in intensity.

General Classification Image Classification

Room-temperature implementation of the Deutsch-Jozsa algorithm with a single electronic spin in diamond

no code implementations • 12 Feb 2010 • Fazhan Shi, Xing Rong, Nanyang Xu, Ya Wang, Jie Wu, Bo Chong, Xinhua Peng, Juliane Kniepert, Rolf-Simon Schoenfeld, Wolfgang Harneit, Mang Feng, Jiangfeng Du

The nitrogen-vacancy defect center (NV center) is a promising candidate for quantum information processing because individual spins can be coherently manipulated without the need for cryogenic temperatures.

Quantum Physics
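For readers unfamiliar with the algorithm this experiment implements, here is a minimal, idealized (noise-free) classical simulation of Deutsch-Jozsa. It has nothing to do with the NV-center setup in the paper, and `deutsch_jozsa` is a hypothetical name. It computes the final amplitude of |0…0⟩ analytically rather than applying gates one by one. After H^⊗n, a phase oracle |x⟩ → (-1)^f(x)|x⟩, and H^⊗n again, that amplitude is (1/2^n) Σ_x (-1)^f(x). The probability of measuring all zeros is therefore 1 for a constant f and 0 for a balanced f.

```python
import math
from itertools import product
from typing import Callable, Tuple


def deutsch_jozsa(f: Callable[[Tuple[int, ...]], int], n: int) -> str:
    """Idealized Deutsch-Jozsa on an n-bit oracle f promised to be
    either constant or balanced.

    Final amplitude of |0...0> = (1 / 2^n) * sum_x (-1)^f(x).
    """
    amp_zero = sum((-1) ** f(x) for x in product((0, 1), repeat=n)) / 2 ** n
    prob_all_zeros = amp_zero ** 2
    # Under the promise, this probability is exactly 1 (constant) or 0 (balanced).
    return "constant" if math.isclose(prob_all_zeros, 1.0) else "balanced"


print(deutsch_jozsa(lambda x: 0, 3))             # constant oracle
print(deutsch_jozsa(lambda x: sum(x) % 2, 3))    # parity is balanced
```

The simulation evaluates f on all 2^n inputs to compute the amplitude. The quantum circuit obtains the same answer with a single oracle query, which is the point of the algorithm.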

