Search Results for author: Zhijie Deng

Found 45 papers, 25 papers with code

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads

no code implementations • 28 Nov 2024 • Siqi Kou, Jiachun Jin, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng

We introduce Orthus, an autoregressive (AR) transformer that excels in generating images given textual prompts, answering questions based on visual inputs, and even crafting lengthy interleaved image-text content.

Language Modelling, Quantization, +1

Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models

no code implementations • 26 Nov 2024 • Peng Cui, Guande He, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu

Datasets collected from the open world unavoidably suffer from various forms of randomness or noisiness, leading to the ubiquity of aleatoric (data) uncertainty.

Object, object-detection, +1

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

no code implementations • 16 Oct 2024 • Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng

We empirically witness the high data efficiency of our training procedure and find that our method can sustain over 90% performance with an average KV cache compression rate of 60% (and up to 75% in certain extreme scenarios) for popular LLMs like LLaMA2-7B-base and Mistral-7B-v0.3-base.

Data Compression

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts

1 code implementation • 16 Oct 2024 • Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin

With the rapid progress of diffusion-based content generation, significant efforts are being made to unlearn harmful or copyrighted concepts from pretrained diffusion models (DMs) to prevent potential model misuse.

In-context KV-Cache Eviction for LLMs via Attention-Gate

no code implementations • 15 Oct 2024 • Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang, Zhijie Deng

In supervised fine-tuning, it not only evicts many tokens but also outperforms LoRA-finetuned LLMs on some datasets, such as RTE, where it improves accuracy by 13.9% while evicting 62.8% of tokens, showing that effective eviction of redundant tokens can even enhance performance.

RTE

Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

1 code implementation • 1 Aug 2024 • Juntu Zhao, Junyu Deng, Yixin Ye, Chongxuan Li, Zhijie Deng, Dequan Wang

The root of such misalignment is attributed to the confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis).

FaceScore: Benchmarking and Enhancing Face Quality in Human Generation

1 code implementation • 24 Jun 2024 • Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng

To address this issue, we first assess the face quality of generations from popular pre-trained DMs with the aid of human annotators and then evaluate the alignment between existing metrics and human judgments.

Benchmarking, Denoising, +2

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

no code implementations • 20 Jun 2024 • Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

SmartSpec dynamically determines the best speculation length for each request (from 0, i.e., no speculation, to many tokens), and hence the associated speculative execution costs, based on a new metric called goodput, which characterizes the current observed load of the entire system and the speculation accuracy.

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

1 code implementation • 19 Jun 2024 • Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

In this sense, we introduce AdaMoE to realize token-adaptive routing for MoE, where different tokens are permitted to select a varying number of experts.

ARC

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

no code implementations • 11 Jun 2024 • Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan

Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples.

Instruction Following, Mathematical Problem-Solving

TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps

1 code implementation • 9 Jun 2024 • Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Haonan Lu

Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.

Image Generation, Style Transfer

SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

2 code implementations • 5 Jun 2024 • Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He

Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy.

SST-2

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu

In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.

Text-to-Image Generation

CLLMs: Consistency Large Language Models

1 code implementation • 28 Feb 2024 • Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

no code implementations • CVPR 2024 • Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

Low-shot image classification is a fundamental task in computer vision and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field.

Gaussian Processes, Image Classification, +1

Improved Operator Learning by Orthogonal Attention

1 code implementation • 19 Oct 2023 • Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su

Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning.

Operator learning

BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

1 code implementation • 17 Oct 2023 • Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng

In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference.

Bayesian Inference, Image Generation

LOVECon: Text-driven Training-Free Long Video Editing with ControlNet

1 code implementation • 15 Oct 2023 • Zhenyi Liao, Zhijie Deng

Leveraging pre-trained conditional diffusion models for video editing without further tuning has gained increasing attention due to its promise in film production, advertising, etc.

Style Transfer, Video Editing, +1

Online Speculative Decoding

1 code implementation • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

Adapting to query distribution mitigates the shifts between the training distribution of the draft model and the query distribution, enabling the draft model to more accurately predict the target model's outputs.

Knowledge Distillation

Heterogeneous Multi-Task Gaussian Cox Processes

1 code implementation • 29 Aug 2023 • Feng Zhou, Quyu Kong, Zhijie Deng, Fengxiang He, Peng Cui, Jun Zhu

This paper presents a novel extension of multi-task Gaussian Cox processes for modeling multiple heterogeneous correlated tasks jointly, e.g., classification and regression, via multi-output Gaussian processes (MOGP).

Bayesian Inference, Data Augmentation, +2

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

no code implementations • 16 Jun 2023 • Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng

Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

no code implementations • 26 May 2023 • Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse.

On Calibrating Diffusion Probabilistic Models

1 code implementation • NeurIPS 2023 • Tianyu Pang, Cheng Lu, Chao Du, Min Lin, Shuicheng Yan, Zhijie Deng

In this work, we observe that the stochastic reverse process of data scores is a martingale, from which concentration bounds and the optional stopping theorem for data scores can be derived.

Confidence-based Reliable Learning under Dual Noises

no code implementations • 10 Feb 2023 • Peng Cui, Yang Yue, Zhijie Deng, Jun Zhu

Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks, where massive labeled images are routinely required for model optimization.

Model Optimization

Neural Eigenfunctions Are Structured Representation Learners

1 code implementation • 23 Oct 2022 • Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu

Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF to parametrically model eigenfunctions using a neural network.

Contrastive Learning, Data Augmentation, +7

Accelerated Linearized Laplace Approximation for Bayesian Deep Learning

1 code implementation • 23 Oct 2022 • Zhijie Deng, Feng Zhou, Jun Zhu

Laplace approximation (LA) and its linearized variant (LLA) enable effortless adaptation of pretrained deep neural networks to Bayesian neural networks.

Deep Learning

NeuralEF: Deconstructing Kernels by Deep Neural Networks

2 code implementations • 30 Apr 2022 • Zhijie Deng, Jiaxin Shi, Jun Zhu

Learning the principal eigenfunctions of an integral operator defined by a kernel and a data distribution is at the core of many machine learning problems.

Image Classification

Deep Ensemble as a Gaussian Process Posterior

no code implementations • 29 Sep 2021 • Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu

Deep Ensemble (DE) is a flexible, feasible, and effective alternative to Bayesian neural networks (BNNs) for uncertainty estimation in deep learning.

Variational Inference

Exploring Memorization in Adversarial Training

1 code implementation • ICLR 2022 • Yinpeng Dong, Ke Xu, Xiao Yang, Tianyu Pang, Zhijie Deng, Hang Su, Jun Zhu

In this paper, we explore the memorization effect in adversarial training (AT) for promoting a deeper understanding of model capacity, convergence, generalization, and especially robust overfitting of the adversarially trained models.

Memorization

Accurate and Reliable Forecasting using Stochastic Differential Equations

no code implementations • 28 Mar 2021 • Peng Cui, Zhijie Deng, WenBo Hu, Jun Zhu

It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments.

Prediction Intervals, Uncertainty Quantification

Black-box Detection of Backdoor Attacks with Limited Information and Data

no code implementations • ICCV 2021 • Yinpeng Dong, Xiao Yang, Zhijie Deng, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments.

Understanding and Exploring the Network with Stochastic Architectures

1 code implementation • NeurIPS 2020 • Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu

In this work, we decouple the training of a network with stochastic architectures (NSA) from NAS and provide a first systematic investigation of it as a stand-alone problem.

Neural Architecture Search

BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning

1 code implementation • 5 Oct 2020 • Zhijie Deng, Jun Zhu

Despite their theoretical appeal, Bayesian neural networks (BNNs) are left behind in real-world adoption, mainly due to persistent concerns about their scalability, accessibility, and reliability.

Variational Inference

BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning

no code implementations • 28 Sep 2020 • Zhijie Deng, Xiao Yang, Hao Zhang, Yinpeng Dong, Jun Zhu

Despite their theoretical appeal, Bayesian neural networks (BNNs) are falling far behind normal NNs in real-world adoption, mainly due to their limited scalability in training and the low fidelity of their uncertainty estimates.

Uncertainty Quantification, Variational Inference

Adversarial Distributional Training for Robust Deep Learning

1 code implementation • NeurIPS 2020 • Yinpeng Dong, Zhijie Deng, Tianyu Pang, Hang Su, Jun Zhu

Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.

Deep Learning

Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structure

1 code implementation • 22 Nov 2019 • Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang

Bayesian neural networks (BNNs) augment deep networks with uncertainty quantification by Bayesian treatment of the network weights.

Bayesian Inference, Neural Architecture Search, +2

Deep Bayesian Structure Networks

1 code implementation • 25 Sep 2019 • Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang

Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights.

Bayesian Inference, Neural Architecture Search, +1

Cluster Alignment with a Teacher for Unsupervised Domain Adaptation

1 code implementation • ICCV 2019 • Zhijie Deng, Yucen Luo, Jun Zhu

Deep learning methods have shown promise in unsupervised domain adaptation, which aims to leverage a labeled source domain to learn a classifier for the unlabeled target domain with a different distribution.

Clustering, Unsupervised Domain Adaptation

Batch Virtual Adversarial Training for Graph Convolutional Networks

no code implementations • 25 Feb 2019 • Zhijie Deng, Yinpeng Dong, Jun Zhu

We present batch virtual adversarial training (BVAT), a novel regularization method for graph convolutional networks (GCNs).

General Classification, Node Classification
