Search Results for author: Daquan Zhou

Found 37 papers, 24 papers with code

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

1 code implementation • 2 May 2024 • Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

This module converts the generated sequence of images into a video with smooth transitions and consistent subjects, and is significantly more stable than modules based only on latent spaces, especially in the context of long video generation.

motion prediction Story Generation +1

Chain of Thought Explanation for Dialogue State Tracking

no code implementations • 7 Mar 2024 • Lin Xu, Ningxin Peng, Daquan Zhou, See-Kiong Ng, Jinlan Fu

Dialogue state tracking (DST) aims to record user queries and goals during a conversational interaction, achieved by maintaining a predefined set of slots and their corresponding values.

Dialogue State Tracking

Sora Generates Videos with Stunning Geometrical Consistency

no code implementations • 27 Feb 2024 • XuanYi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng

We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.

3D Reconstruction Video Generation

Magic-Me: Identity-Specific Video Customized Diffusion

1 code implementation • 14 Feb 2024 • Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng

To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with a 3D Gaussian Noise Prior for better inter-frame stability; 2) an ID module based on extended Textual Inversion, trained with the cropped identity to disentangle the ID information from the background; and 3) Face VCD and Tiled VCD modules to reinforce faces and upscale the video to higher resolution while preserving the identity's features.

Text-to-Image Generation Video Generation

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

no code implementations • 9 Jan 2024 • Weimin WANG, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.

MORPH Video Generation

Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost

1 code implementation • 14 Dec 2023 • Haolin Qin, Daquan Zhou, Tingfa Xu, Ziyang Bian, Jianan Li

Accordingly, we propose a novel factorization self-attention mechanism (FaSA) that enjoys both the advantages of local window cost and long-range dependency modeling capability.

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

1 code implementation • 14 Nov 2023 • Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng

Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory.

Benchmarking Language Modelling +1

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

no code implementations • 12 Nov 2023 • Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer

On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost.

Model Compression Neural Architecture Search +1

ChatAnything: Facetime Chat with LLM-Enhanced Personas

no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou

For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones and automatically select the one that best matches the user-provided text description.

In-Context Learning Novel Concepts +2

Low-Resolution Self-Attention for Semantic Segmentation

no code implementations • 8 Oct 2023 • Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction.

Decoder Segmentation +1

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

no code implementations • 8 Sep 2023 • Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng

In this work, we identify that a crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning between the prompt and the output image.

Dataset Quantization

1 code implementation • ICCV 2023 • Daquan Zhou, Kai Wang, Jianyang Gu, Xiangyu Peng, Dongze Lian, Yifan Zhang, Yang You, Jiashi Feng

Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training.

object-detection Object Detection +2

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interactions with humans.

Instruction Following Sentence +1

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li

This paper proposes DiffFit, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models that enables fast adaptation to new domains.

Efficient Diffusion Personalization

DiM: Distilling Dataset into Generative Model

2 code implementations • 8 Mar 2023 • Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang, Yang You

To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than on simple ones, such as 75.1% with ResNet-18 and 72.6% with ConvNet-3 on ten images per class of CIFAR-10.

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

1 code implementation • 8 Mar 2023 • Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning.

Semantic Segmentation
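The unbiasedness in the excerpt above comes from rescaling: samples with small loss are pruned with some probability, and the surviving ones are up-weighted so that the expected gradient matches full-data training. A minimal sketch of that idea (illustrative only; the function name and mean-loss threshold are assumptions, not the authors' code):

```python
import random

def infobatch_select(losses, prune_ratio=0.5, seed=0):
    """Toy sketch of unbiased dynamic data pruning.

    Low-loss ("well-learned") samples are pruned with probability
    `prune_ratio`; survivors of that pool are up-weighted by
    1 / (1 - prune_ratio), so each low-loss sample contributes an
    expected weight of 1 and the expected gradient stays unbiased.
    """
    rng = random.Random(seed)
    mean_loss = sum(losses) / len(losses)
    kept = []  # (sample_index, loss_weight)
    for i, loss in enumerate(losses):
        if loss < mean_loss:
            if rng.random() < prune_ratio:
                continue  # pruned for this epoch
            kept.append((i, 1.0 / (1.0 - prune_ratio)))
        else:
            kept.append((i, 1.0))  # high-loss samples are always kept
    return kept
```

High-loss samples are never dropped, so hard examples are still seen every epoch; only the redundant, already-fitted portion of the data is thinned out.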

Diffusion Probabilistic Model Made Slim

no code implementations • CVPR 2023 • Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms.

Image Generation Unconditional Image Generation

Expanding Small-Scale Datasets with Guided Imagination

1 code implementation • NeurIPS 2023 • Yifan Zhang, Daquan Zhou, Bryan Hooi, Kai Wang, Jiashi Feng

Specifically, GIF conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model, resulting in the creation of photo-realistic images with new content.

MagicVideo: Efficient Video Generation With Latent Diffusion Models

no code implementations • 20 Nov 2022 • Daquan Zhou, Weimin WANG, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng

Specifically, unlike existing works that directly train video models in the RGB space, we use a pre-trained VAE to map video clips into a low-dimensional latent space and learn the distribution of videos' latent codes via a diffusion model.

Text-to-Video Generation Video Generation
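The latent-space recipe described above can be sketched as encode-then-diffuse. In this toy sketch the "VAE encoder" is a hypothetical fixed projection (a real VAE is a learned network), and `diffuse` is the standard DDPM forward process applied to latent codes instead of RGB frames:

```python
import numpy as np

def encode_to_latent(frames, proj):
    # Stand-in for a pre-trained VAE encoder: a fixed linear map
    # from pixel space to a low-dimensional latent (hypothetical).
    return frames @ proj

def diffuse(z0, alpha_bar_t, noise):
    # DDPM forward process applied to latent codes rather than RGB:
    # z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * noise
```

A denoising network is then trained to recover the clean latent from `z_t`, which is far cheaper than denoising full-resolution frames directly.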

MagicMix: Semantic Mixing with Diffusion Models

2 code implementations • 28 Oct 2022 • Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

Unlike style transfer, where an image is stylized according to the reference style without changing the image content, semantic blending mixes two different concepts in a semantic manner to synthesize a novel concept while preserving the spatial layout and geometry.

Denoising Style Transfer

Deep Model Reassembly

1 code implementation • 24 Oct 2022 • Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang

Given a collection of heterogeneous models pre-trained from distinct sources and with diverse architectures, the goal of DeRy, as its name implies, is to first dissect each model into distinctive building blocks, and then selectively reassemble the derived blocks to produce customized networks under both the hardware resource and performance constraints.

Transfer Learning

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

1 code implementation • 17 Oct 2022 • Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang

With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) performance improvements on FGVC and VTAB-1k in terms of Top-1 accuracy compared to full fine-tuning, while fine-tuning only about 0.3M parameters.

Image Classification
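The idea behind scaling and shifting is that only per-channel scale and shift parameters are tuned on top of the frozen backbone's features; because the operation is linear, it can be folded into the preceding linear layer at inference. A minimal NumPy sketch (function names are illustrative, not the paper's API):

```python
import numpy as np

def ssf(x, gamma, beta):
    # Scale & shift: the only trainable parameters are the
    # per-channel gamma and beta; the backbone stays frozen.
    return x * gamma + beta

def merge_into_linear(W, b, gamma, beta):
    # Because SSF is linear, it folds into the preceding layer:
    # (x @ W + b) * gamma + beta == x @ (W * gamma) + (b * gamma + beta)
    return W * gamma, b * gamma + beta
```

After merging, inference incurs no extra cost over the original backbone.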

Sharpness-Aware Training for Free

1 code implementation • 27 May 2022 • Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou

Intuitively, SAF achieves this by avoiding sudden drops in the loss in the sharp local minima throughout the trajectory of the updates of the weights.

Understanding The Robustness in Vision Transformers

2 code implementations • 26 Apr 2022 • Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations.

Ranked #4 on Domain Generalization on ImageNet-R (using extra training data)

Domain Generalization Image Classification +3

M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

no code implementations • 11 Apr 2022 • Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Birds-Eye View (BEV) space with multi-camera image inputs.

3D Object Detection object-detection +1

Shunted Self-Attention via Multi-Scale Token Aggregation

1 code implementation • CVPR 2022 • Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.

Refiner: Refining Self-attention for Vision Transformers

1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng

Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.

Image Classification

DeepViT: Towards Deeper Vision Transformer

5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng

In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled deeper.

Image Classification Representation Learning

AutoSpace: Neural Architecture Search with Less Human Interference

1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng

Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.

Neural Architecture Search

Coordinate Attention for Efficient Mobile Network Design

2 code implementations • CVPR 2021 • Qibin Hou, Daquan Zhou, Jiashi Feng

Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect positional information, which is important for generating spatially selective attention maps.

object-detection Object Detection +1
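The contrast drawn in the excerpt above can be made concrete at the pooling step: SE-style channel attention squeezes all spatial positions into a single value per channel, while a coordinate-attention-style factorization pools along each axis separately, retaining position along the other. A toy sketch over a `(C, H, W)` feature map (illustrative, not the paper's implementation):

```python
import numpy as np

def se_pool(x):
    # SE-style squeeze: global average pool over H and W -> (C,).
    # All positional information is collapsed.
    return x.mean(axis=(1, 2))

def coord_pool(x):
    # Factorized 1D pooling: each branch keeps position along one axis.
    h_branch = x.mean(axis=2)  # (C, H): responses per height position
    w_branch = x.mean(axis=1)  # (C, W): responses per width position
    return h_branch, w_branch
```

In the full module the two pooled branches are further transformed and turned into attention maps applied along height and width, so the attention can localize objects rather than only re-weight channels.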

ConvBERT: Improving BERT with Span-based Dynamic Convolution

7 code implementations • NeurIPS 2020 • Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.

Natural Language Understanding

Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks

no code implementations • 2 Jul 2020 • Jibin Wu, Cheng-Lin Xu, Daquan Zhou, Haizhou Li, Kay Chen Tan

In this paper, we propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition, which is referred to as progressive tandem learning of deep SNNs.

Computational Efficiency Image Reconstruction +2

PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment

5 code implementations • ICCV 2019 • Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng

In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set.

Few-Shot Semantic Segmentation Metric Learning +2

Neural Epitome Search for Architecture-Agnostic Network Compression

no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng

The recent WSNet [1] is a new model compression method that samples filter weights from a compact set, and it has been demonstrated to be effective for 1D convolutional neural networks (CNNs).

Model Compression Neural Architecture Search
