no code implementations • 28 Nov 2024 • Siqi Kou, Jiachun Jin, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng
We introduce Orthus, an autoregressive (AR) transformer that excels in generating images given textual prompts, answering questions based on visual inputs, and even crafting lengthy interleaved image-text content.
no code implementations • 26 Nov 2024 • Peng Cui, Guande He, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu
Datasets collected from the open world unavoidably suffer from various forms of randomness or noise, leading to the ubiquity of aleatoric (data) uncertainty.
no code implementations • 24 Oct 2024 • Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng
We also contribute a pipeline for collecting preference pairs for DPO on CodeLLMs.
no code implementations • 16 Oct 2024 • Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi Kou, Tianqi Hou, Xiaofeng Gao, Hao Zhang, Zhijie Deng
We empirically witness the high data efficiency of our training procedure and find that our method can sustain over 90% performance with an average KV cache compression rate of 60% (and up to 75% in certain extreme scenarios) for popular LLMs like LLaMA2-7B-base and Mistral-7B-v0.3-base.
1 code implementation • 16 Oct 2024 • Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin
With the rapid progress of diffusion-based content generation, significant efforts are being made to unlearn harmful or copyrighted concepts from pretrained diffusion models (DMs) to prevent potential model misuse.
no code implementations • 15 Oct 2024 • Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang, Zhijie Deng
In supervised fine-tuning, it not only evicts many tokens but also outperforms LoRA-finetuned LLMs on some datasets, such as RTE, where it improves accuracy by 13.9% while evicting 62.8% of tokens, showing that effective eviction of redundant tokens can even enhance performance.
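To make the eviction idea concrete, below is a minimal sketch of score-based KV-cache token eviction: cached tokens that receive the least attention are dropped. The keep ratio, tensor shapes, and scoring rule are illustrative assumptions, not the paper's exact criterion.

```python
import torch

def evict_kv_tokens(keys, values, attn_weights, keep_ratio=0.4):
    """Drop the least-attended tokens from a KV cache.

    A minimal illustration of score-based eviction; the paper's actual
    criterion may differ. attn_weights: [heads, q_len, kv_len].
    """
    # Aggregate the attention each cached token receives across heads/queries.
    scores = attn_weights.sum(dim=(0, 1))          # [kv_len]
    keep = max(1, int(scores.numel() * keep_ratio))
    idx = scores.topk(keep).indices.sort().values  # preserve positional order
    return keys[:, idx], values[:, idx]

# Example: 8 heads, 1 query step, 100 cached tokens, head_dim 64.
k = torch.randn(8, 100, 64); v = torch.randn(8, 100, 64)
w = torch.softmax(torch.randn(8, 1, 100), dim=-1)
k_small, v_small = evict_kv_tokens(k, v, w, keep_ratio=0.372)  # ~62.8% evicted
```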
1 code implementation • 1 Aug 2024 • Juntu Zhao, Junyu Deng, Yixin Ye, Chongxuan Li, Zhijie Deng, Dequan Wang
The root of such misalignment is attributed to the confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis).
1 code implementation • 24 Jun 2024 • Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng
To address this issue, we first assess the face quality of generations from popular pre-trained DMs with the aid of human annotators, and then evaluate the alignment between existing metrics and human judgments.
no code implementations • 20 Jun 2024 • Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
SmartSpec dynamically determines the best speculation length for each request (from 0, i.e., no speculation, to many tokens) -- hence the associated speculative execution costs -- based on a new metric called goodput, which characterizes the current observed load of the entire system and the speculation accuracy.
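As a rough illustration of goodput-guided speculation, the sketch below picks the speculation length that maximizes expected accepted tokens per unit time under a simple independence model of acceptance. The formula and timing model are assumptions for illustration, not SmartSpec's actual estimator.

```python
def best_speculation_length(acceptance_rate, t_draft, t_verify, k_max=8):
    """Pick the speculation length k maximizing a simple goodput proxy.

    Hypothetical model: with per-token acceptance rate a < 1, drafting k
    tokens yields an expected (1 - a**(k+1)) / (1 - a) accepted tokens per
    verification round, at a cost of k draft steps plus one parallel
    verification pass.
    """
    best_k, best_goodput = 0, 1.0 / t_verify  # k = 0 means no speculation
    for k in range(1, k_max + 1):
        expected_tokens = (1 - acceptance_rate ** (k + 1)) / (1 - acceptance_rate)
        goodput = expected_tokens / (k * t_draft + t_verify)
        if goodput > best_goodput:
            best_k, best_goodput = k, goodput
    return best_k

print(best_speculation_length(acceptance_rate=0.8, t_draft=1.0, t_verify=5.0))
```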
1 code implementation • 19 Jun 2024 • Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng
In this sense, we introduce AdaMoE to realize token-adaptive routing for MoE, where different tokens are permitted to select varying numbers of experts.
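A minimal sketch of what token-adaptive routing can look like: each token keeps the smallest set of experts whose router probability mass crosses a threshold, so expert counts vary per token. This threshold rule is an illustrative stand-in, not necessarily AdaMoE's own mechanism; the `mass` and `k_max` parameters are assumptions.

```python
import torch

def adaptive_topp_routing(logits, mass=0.7, k_max=4):
    """Select, per token, the smallest expert set covering `mass` router
    probability (illustrative; not AdaMoE's exact routing rule).

    logits: [num_tokens, num_experts].
    """
    probs = torch.softmax(logits, dim=-1)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    # Keep experts until cumulative mass reaches the threshold (at most k_max).
    keep = (cum - sorted_p) < mass          # first expert is always kept
    keep[:, k_max:] = False
    return sorted_idx, sorted_p * keep      # zeroed weights for dropped experts

logits = torch.randn(5, 8)                  # 5 tokens, 8 experts
idx, w = adaptive_topp_routing(logits)
print((w > 0).sum(dim=-1))                  # experts chosen per token varies
```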
no code implementations • 11 Jun 2024 • Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simpler Direct Preference Optimization (DPO) as two representative examples.
1 code implementation • 9 Jun 2024 • Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Haonan Lu
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.
2 code implementations • 5 Jun 2024 • Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He
Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy.
no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu
In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation (ODE) solver-based sampling process of a pretrained teacher model into a student, SCott explores and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.
1 code implementation • 28 Feb 2024 • Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang
Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.
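The core loop of Jacobi decoding is simple to sketch: all unknown positions are guessed, refined in parallel, and iterated to a fixed point, which for greedy decoding coincides with the autoregressive output. The sketch below assumes a HuggingFace-style causal LM interface and omits the KV-cache reuse that makes the method fast in practice.

```python
import torch

@torch.no_grad()
def jacobi_decode(model, prompt_ids, n_new, max_iters=32):
    """Greedy Jacobi decoding: iterate a parallel fixed point instead of
    decoding token by token. `model` is assumed to be an HF-style causal LM
    returning logits of shape [1, seq_len, vocab]."""
    # Initialize the unknown tokens arbitrarily (copies of the last prompt token).
    guess = prompt_ids[:, -1:].repeat(1, n_new)
    for _ in range(max_iters):
        seq = torch.cat([prompt_ids, guess], dim=1)
        logits = model(seq).logits
        # Each position's update depends only on tokens to its left,
        # so all positions can be refreshed in one parallel pass.
        new_guess = logits[:, prompt_ids.size(1) - 1 : -1].argmax(dim=-1)
        if torch.equal(new_guess, guess):   # fixed point == greedy AR output
            break
        guess = new_guess
    return guess
```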
no code implementations • CVPR 2024 • Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng
Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field.
1 code implementation • 19 Oct 2023 • Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su
Neural operators, as efficient surrogate models for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning.
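For reference, the workhorse layer behind many neural operators is a convolution performed in Fourier space; below is a minimal 1D version in the spirit of FNO. This is a generic sketch, not the specific architecture proposed in this paper.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Fourier-space convolution, the building block of many neural operators
    (generic FNO-style sketch; channel counts and mode truncation assumed)."""

    def __init__(self, channels, n_modes):
        super().__init__()
        self.n_modes = n_modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, n_modes, dtype=torch.cfloat))

    def forward(self, x):                       # x: [batch, channels, grid]
        x_ft = torch.fft.rfft(x)                # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        # Mix channels on the lowest n_modes frequencies only.
        out_ft[..., : self.n_modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., : self.n_modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))  # back to the grid
```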
1 code implementation • 17 Oct 2023 • Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng
In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference.
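As background for the last-layer Laplace step, here is a generic diagonal last-layer Laplace approximation for a linear head under a Gaussian likelihood with unit noise. The paper applies the idea to the diffusion model's noise predictor, and its exact Hessian treatment may differ.

```python
import torch

def last_layer_laplace(feats, targets, w_map, prior_prec=1.0):
    """Diagonal last-layer Laplace approximation (generic sketch).

    feats: [N, D] penultimate features; targets: [N, K]; w_map: [D, K].
    Returns a predictor giving the mean and predictive variance.
    """
    # For a Gaussian likelihood, the loss Hessian w.r.t. last-layer weights
    # is built from feature outer products; keep only its diagonal.
    h_diag = (feats ** 2).sum(dim=0)                     # [D]
    post_var = 1.0 / (h_diag + prior_prec)               # diagonal posterior cov

    def predict(feats_new):                              # [M, D]
        mean = feats_new @ w_map                         # [M, K]
        var = (feats_new ** 2) @ post_var.unsqueeze(-1)  # [M, 1]
        return mean, var

    return predict
```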
1 code implementation • 15 Oct 2023 • Zhenyi Liao, Zhijie Deng
Leveraging pre-trained conditional diffusion models for video editing without further tuning has gained increasing attention due to its promise in film production, advertising, etc.
1 code implementation • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
Adapting to query distribution mitigates the shifts between the training distribution of the draft model and the query distribution, enabling the draft model to more accurately predict the target model's outputs.
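Schematically, adapting the draft model online amounts to distilling the target model's logits on live queries into the draft between serving steps. The update below is a generic knowledge-distillation step under that assumption, not the paper's exact training recipe; the recorded `target_logits` and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def online_draft_update(draft_model, optimizer, query_ids, target_logits, temp=2.0):
    """One online distillation step nudging the draft model toward the target
    model's outputs on real queries (schematic sketch).

    query_ids: [B, T] token ids seen in serving; target_logits: [B, T, V]
    logits recorded from the target model's verification passes.
    """
    draft_logits = draft_model(query_ids).logits             # [B, T, V]
    loss = F.kl_div(
        F.log_softmax(draft_logits / temp, dim=-1),
        F.softmax(target_logits / temp, dim=-1),
        reduction="batchmean",
    ) * temp ** 2                                            # standard KD scaling
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```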
1 code implementation • 29 Aug 2023 • Feng Zhou, Quyu Kong, Zhijie Deng, Fengxiang He, Peng Cui, Jun Zhu
This paper presents a novel extension of multi-task Gaussian Cox processes for modeling multiple heterogeneous correlated tasks jointly, e.g., classification and regression, via multi-output Gaussian processes (MOGP).
no code implementations • 16 Jun 2023 • Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
no code implementations • 26 May 2023 • Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng
The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse.
no code implementations • ICCV 2023 • Zhijie Deng, Yucen Luo
Unsupervised semantic segmentation is a long-standing and significant challenge in computer vision.
1 code implementation • NeurIPS 2023 • Tianyu Pang, Cheng Lu, Chao Du, Min Lin, Shuicheng Yan, Zhijie Deng
In this work, we observe that the stochastic reverse process of data scores is a martingale, from which concentration bounds and the optional stopping theorem for data scores can be derived.
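In generic notation, the abstract's claim can be restated as follows; the precise filtration and increment bounds are the paper's, so this is only a schematic rendering:

```latex
% Martingale property of the data-score sequence along the reverse process:
\mathbb{E}\!\left[ s_{t+1} \mid \mathcal{F}_t \right] = s_t ,
% optional stopping, for any bounded stopping time \tau:
\qquad \mathbb{E}\!\left[ s_{\tau} \right] = \mathbb{E}\!\left[ s_{0} \right],
% and an Azuma--Hoeffding-type concentration bound when increments are bounded:
\qquad
\Pr\!\left( \left| s_T - s_0 \right| \ge \epsilon \right)
\le 2 \exp\!\left( - \frac{\epsilon^2}{2 \sum_{t=1}^{T} c_t^2} \right)
\quad \text{if } \left| s_{t+1} - s_t \right| \le c_t .
```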
no code implementations • 10 Feb 2023 • Peng Cui, Yang Yue, Zhijie Deng, Jun Zhu
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks, where massive labeled images are routinely required for model optimization.
1 code implementation • 23 Oct 2022 • Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu
Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF to parametrically model eigenfunctions using a neural network.
1 code implementation • 23 Oct 2022 • Zhijie Deng, Feng Zhou, Jun Zhu
Laplace approximation (LA) and its linearized variant (LLA) enable effortless adaptation of pretrained deep neural networks to Bayesian neural networks.
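For context, LLA replaces the network with its first-order expansion around the MAP weights, which yields a Gaussian predictive in closed form; this is the textbook formulation rather than anything specific to this paper:

```latex
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_{\mathrm{MAP}})
  + J(x)\,(\theta - \theta_{\mathrm{MAP}}),
\qquad J(x) := \nabla_\theta f(x;\theta)\big|_{\theta_{\mathrm{MAP}}},
% which gives the Gaussian predictive
p\big(f(x) \mid \mathcal{D}\big) \approx
\mathcal{N}\!\Big( f(x;\theta_{\mathrm{MAP}}),\; J(x)\,\Sigma\,J(x)^{\top} \Big),
\qquad
\Sigma = \Big( \textstyle\sum_{n} J(x_n)^{\top} \Lambda_n J(x_n) + \gamma I \Big)^{-1},
% with \Lambda_n the per-datum likelihood Hessian (GGN) and \gamma the prior precision.
```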
no code implementations • 30 Apr 2022 • Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu
In this way, we relate DE to Bayesian inference so that it enjoys reliable Bayesian uncertainty estimates.
2 code implementations • 30 Apr 2022 • Zhijie Deng, Jiaxin Shi, Jun Zhu
Learning the principal eigenfunctions of an integral operator defined by a kernel and a data distribution is at the core of many machine learning problems.
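A simplified batch objective in the spirit of NeuralEF: maximize each function's Rayleigh quotient against the kernel while penalizing alignment with earlier, stop-gradiented functions, so that successive outputs solve deflated eigenproblems. The normalization and weighting details here are my assumptions, not the paper's exact loss.

```python
import torch

def neuralef_style_loss(psi, kernel_mat):
    """Batch objective for learning top eigenfunctions of a kernel integral
    operator (simplified sketch in the spirit of NeuralEF).

    psi: [N, J] network outputs (J candidate eigenfunctions on N samples);
    kernel_mat: [N, N] Gram matrix of the kernel on the same batch.
    """
    n = psi.size(0)
    # Normalize each candidate eigenfunction to unit norm on the batch.
    psi = psi / psi.norm(dim=0, keepdim=True).clamp_min(1e-8) * n ** 0.5
    R = psi.T @ kernel_mat @ psi / n ** 2             # [J, J] pairwise terms
    diag = R.diagonal()                               # Rayleigh quotients
    # Penalize alignment of function j with *earlier* functions i < j;
    # earlier functions are frozen via stop-gradient (deflation).
    off = torch.tril(psi.T @ kernel_mat @ psi.detach() / n ** 2, diagonal=-1)
    penalty = (off ** 2 / diag.detach().clamp_min(1e-8)).sum()
    return -diag.sum() + penalty
```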
no code implementations • 29 Sep 2021 • Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu
Deep Ensemble (DE) is a flexible, feasible, and effective alternative to Bayesian neural networks (BNNs) for uncertainty estimation in deep learning.
1 code implementation • ICLR 2022 • Yinpeng Dong, Ke Xu, Xiao Yang, Tianyu Pang, Zhijie Deng, Hang Su, Jun Zhu
In this paper, we explore the memorization effect in adversarial training (AT) for promoting a deeper understanding of model capacity, convergence, generalization, and especially robust overfitting of the adversarially trained models.
no code implementations • 28 Mar 2021 • Peng Cui, Zhijie Deng, WenBo Hu, Jun Zhu
It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments.
1 code implementation • CVPR 2021 • Zhijie Deng, Xiao Yang, Shizhen Xu, Hang Su, Jun Zhu
Despite their appealing flexibility, deep neural networks (DNNs) are vulnerable to adversarial examples.
no code implementations • ICCV 2021 • Yinpeng Dong, Xiao Yang, Zhijie Deng, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu
Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments.
no code implementations • NeurIPS 2020 • Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, Eric Xing
Synchronization is a key step in data-parallel distributed machine learning (ML).
1 code implementation • NeurIPS 2020 • Zhijie Deng, Yinpeng Dong, Shifeng Zhang, Jun Zhu
In this work, we decouple the training of a network with stochastic architectures (NSA) from NAS and provide a first systematic investigation of it as a stand-alone problem.
1 code implementation • 5 Oct 2020 • Zhijie Deng, Jun Zhu
Despite their theoretical appeal, Bayesian neural networks (BNNs) are left behind in real-world adoption, mainly due to persistent concerns about their scalability, accessibility, and reliability.
no code implementations • 28 Sep 2020 • Zhijie Deng, Xiao Yang, Hao Zhang, Yinpeng Dong, Jun Zhu
Despite their theoretical appeal, Bayesian neural networks (BNNs) fall far behind normal NNs in real-world adoption, mainly due to their limited scalability in training and the low fidelity of their uncertainty estimates.
1 code implementation • NeurIPS 2020 • Yinpeng Dong, Zhijie Deng, Tianyu Pang, Hang Su, Jun Zhu
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
1 code implementation • 22 Nov 2019 • Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang
Bayesian neural networks (BNNs) augment deep networks with uncertainty quantification by Bayesian treatment of the network weights.
1 code implementation • 25 Sep 2019 • Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang
Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights.
1 code implementation • ICCV 2019 • Zhijie Deng, Yucen Luo, Jun Zhu
Deep learning methods have shown promise in unsupervised domain adaptation, which aims to leverage a labeled source domain to learn a classifier for the unlabeled target domain with a different distribution.
Ranked #3 on Domain Adaptation on SVHN-to-MNIST
no code implementations • 25 Feb 2019 • Zhijie Deng, Yinpeng Dong, Jun Zhu
We present batch virtual adversarial training (BVAT), a novel regularization method for graph convolutional networks (GCNs).
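BVAT builds on the virtual adversarial training (VAT) regularizer; the generic form is sketched below for a plain classifier on feature vectors, whereas the paper adapts the perturbation construction to batches of graph nodes. The `xi` and `eps` hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_loss(model, x, xi=1e-6, eps=1.0):
    """Generic VAT smoothness penalty (sketch; BVAT's GCN-specific batching
    of node perturbations is not reproduced here)."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)
    # One power-iteration step: find the input direction that most changes
    # the model's predictions.
    d = torch.randn_like(x)
    d = xi * d / d.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    d.requires_grad_(True)
    p_hat = F.log_softmax(model(x + d), dim=-1)
    adv_dist = F.kl_div(p_hat, p, reduction="batchmean")
    grad = torch.autograd.grad(adv_dist, d)[0]
    r_adv = eps * grad / grad.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    # Penalize prediction change under the virtual adversarial perturbation.
    p_adv = F.log_softmax(model(x + r_adv.detach()), dim=-1)
    return F.kl_div(p_adv, p, reduction="batchmean")
```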
1 code implementation • NeurIPS 2017 • Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing
We study the problem of conditional generative modeling based on designated semantics or structures.