Search Results for author: Zhenyu Zhang

Found 115 papers, 56 papers with code

Enhancing Chinese Pre-trained Language Model via Heterogeneous Linguistics Graph

3 code implementations ACL 2022 Yanzeng Li, Jiangxia Cao, Xin Cong, Zhenyu Zhang, Bowen Yu, Hongsong Zhu, Tingwen Liu

Chinese pre-trained language models usually exploit contextual character information to learn representations, while ignoring linguistic knowledge, e.g., word and sentence information.

Language Modeling Language Modelling +1

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

1 code implementation 7 Apr 2025 Runjin Chen, Zhenyu Zhang, Junyuan Hong, Souvik Kundu, Zhangyang Wang

To address this issue, we investigate the internal reasoning structures of LLMs and categorize them into three primary thought types: execution, reflection, and transition thoughts.

GSM8K

Cross-Frame OTFS Parameter Estimation Based On Chinese Remainder Theorem

no code implementations 7 Apr 2025 Zhenyu Zhang, Qianli Wang, Gang Liu, Feifei Gao, Pingzhi Fan

By designing co-prime numbers of subcarriers and time slots in different subframes, the difference in the responses of the subframes for a target can be used to estimate the distance and velocity of an out-of-range target.
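
For intuition, here is a minimal, self-contained sketch (not the paper's estimator) of how co-prime ambiguity ranges can be combined via the Chinese Remainder Theorem to resolve a value that exceeds either range on its own; the moduli and delay bin below are illustrative stand-ins for the co-prime subcarrier and time-slot counts.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise co-prime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) = modular inverse (Python 3.8+)
    return x % M

# A delay bin of 733 exceeds either single ambiguity range (37 or 41 bins),
# but its two remainders identify it uniquely up to 37 * 41 = 1517 bins.
moduli = [37, 41]
true_bin = 733
residues = [true_bin % m for m in moduli]
print(crt(residues, moduli))  # -> 733
```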

ISAC parameter estimation

Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance

no code implementations 28 Mar 2025 Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang

However, they often face challenges with temporal consistency, particularly in the talking-head domain, where continuous changes in facial expressions increase the difficulty.

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

1 code implementation 23 Mar 2025 Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu

Multimodal Large Language Models excel in various tasks, yet they often struggle with modality bias, where the model tends to rely heavily on a single modality and overlook critical information in other modalities, leading to incorrect focus and irrelevant responses.

Time-EAPCR: A Deep Learning-Based Novel Approach for Anomaly Detection Applied to the Environmental Field

no code implementations 12 Mar 2025 Lei Liu, Yuchao Lu, Ling An, Huajie Liang, ChiChun Zhou, Zhenyu Zhang

As human activities intensify, environmental systems such as aquatic ecosystems and water treatment systems face increasingly complex pressures, impacting ecological balance, public health, and sustainable development, making intelligent anomaly monitoring essential.

Anomaly Detection

Inorganic Catalyst Efficiency Prediction Based on EAPCR Model: A Deep Learning Solution for Multi-Source Heterogeneous Data

no code implementations 10 Mar 2025 Zhangdi Liu, Ling An, Mengke Song, Zhuohang Yu, Shan Wang, Kezhen Qi, Zhenyu Zhang, ChiChun Zhou

The design of inorganic catalysts and the prediction of their catalytic efficiency are fundamental challenges in chemistry and materials science.

Deep Learning

Unsupervised Waste Classification By Dual-Encoder Contrastive Learning and Multi-Clustering Voting (DECMCV)

no code implementations 4 Mar 2025 Kui Huang, Mengke Song, Shuo Ba, Ling An, Huajie Liang, Huanxi Deng, Yang Liu, Zhenyu Zhang, ChiChun Zhou

On a real-world dataset of 4,169 waste images, only 50 labeled samples were needed to accurately label thousands, improving classification accuracy by 29.85% compared to supervised models.

Classification Contrastive Learning +1

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

1 code implementation 24 Feb 2025 Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu

This paper comprehensively evaluates several recently proposed optimizers for 4-bit training, revealing that low-bit precision amplifies sensitivity to learning rates and often causes unstable gradient norms, leading to divergence at higher learning rates.

Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking

no code implementations 19 Feb 2025 Yilong Chen, Junyuan Shang, Zhenyu Zhang, Yanxi Xie, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang

Large language models (LLMs) face inherent performance bottlenecks under parameter constraints, particularly in processing critical tokens that demand complex reasoning.

BeamLoRA: Beam-Constraint Low-Rank Adaptation

no code implementations 19 Feb 2025 Naibin Gu, Zhenyu Zhang, Xiyu Liu, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang

Due to the demand for efficient fine-tuning of large language models, Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.
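
For readers unfamiliar with LoRA itself, a minimal adapter layer might look like the following sketch: a generic illustration of the frozen-weight-plus-low-rank-update idea (W + BA), not BeamLoRA; the layer sizes, rank, and scaling below are arbitrary choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # frozen pre-trained weight
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # trainable low-rank factors; B is zero-initialised so training starts from W alone
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(128, 64)
print(layer(torch.randn(2, 128)).shape)  # torch.Size([2, 64])
```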

Code Generation Math +1

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

1 code implementation 11 Feb 2025 Xialie Zhuang, Zhikai Jia, Jianjin Li, Zhenyu Zhang, Li Shen, Zheng Cao, Shiwei Liu

To address this, we propose Mask-Enhanced Autoregressive Prediction (MEAP), a simple yet effective training paradigm that seamlessly integrates Masked Language Modeling (MLM) into Next-Token Prediction (NTP) to enhance the latter's in-context retrieval capabilities.
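
A rough sketch of that training-data recipe, under the assumption that masking corrupts only the input tokens while the next-token targets stay untouched; the token ids, mask id, and mask ratio below are illustrative, not the paper's settings.

```python
import random

def meap_batch(token_ids, mask_id, mask_ratio=0.15, seed=0):
    rng = random.Random(seed)
    # corrupt the *inputs* only; next-token-prediction targets are left as-is
    inputs = [mask_id if rng.random() < mask_ratio else t for t in token_ids[:-1]]
    targets = token_ids[1:]
    return inputs, targets

tokens = [5, 17, 42, 9, 23, 7, 31]
print(meap_batch(tokens, mask_id=0))
```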

Decoder Information Retrieval +4

EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy

no code implementations 2 Jan 2025 Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang

In this way, the proposed method tackles the limitations of initialization and optimization, leading to efficient and accurate 3DGS modeling.

3DGS Novel View Synthesis

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

1 code implementation 20 Dec 2024 Xiantao Hu, Ying Tai, Xu Zhao, Chen Zhao, Zhenyu Zhang, Jun Li, Bineng Zhong, Jian Yang

These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.

Mamba Rgb-T Tracking +1

StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors

1 code implementation 16 Dec 2024 Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang

We propose StrandHead, a novel text-to-3D head avatar generation method capable of generating disentangled 3D hair with strand representation.

Diversity Text to 3D

OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation

no code implementations 12 Dec 2024 Weiqi Li, Shijie Zhao, Chong Mou, Xuhan Sheng, Zhenyu Zhang, Qian Wang, Junlin Li, Li Zhang, Jian Zhang

As virtual reality gains popularity, the demand for controllable creation of immersive and dynamic omnidirectional videos (ODVs) is increasing.

Image to Video Generation

Learning to Decouple the Lights for 3D Face Texture Modeling

no code implementations 11 Dec 2024 Tianxin Huang, Zhenyu Zhang, Ying Tai, Gim Hee Lee

According to experiments on both single images and video sequences, we demonstrate the effectiveness of our approach in modeling facial textures under challenging illumination affected by occlusions.

APOLLO: SGD-like Memory, AdamW-level Performance

1 code implementation 6 Dec 2024 Hanqing Zhu, Zhenyu Zhang, Wenyan Cong, Xi Liu, Sem Park, Vikas Chandra, Bo Long, David Z. Pan, Zhangyang Wang, Jinwon Lee

This memory burden necessitates using more or higher-end GPUs or reducing batch sizes, limiting training scalability and throughput.

Quantization

Data Uncertainty-Aware Learning for Multimodal Aspect-based Sentiment Analysis

no code implementations 2 Dec 2024 Hao Yang, Zhenyu Zhang, Yanyan Zhao, Bing Qin

In the real world, the quality of data usually varies across samples; such noise is called data uncertainty.

Aspect-Based Sentiment Analysis Sentiment Analysis

EAPCR: A Universal Feature Extractor for Scientific Data without Explicit Feature Relation Patterns

no code implementations 12 Nov 2024 Zhuohang Yu, Ling An, Yansong Li, Yu Wu, Zeyu Dong, Zhangdi Liu, Le Gao, Zhenyu Zhang, ChiChun Zhou

The absence of explicit Feature Relation Patterns (FRPs) presents a significant challenge for deep learning techniques in scientific applications that are not image-, text-, or graph-based.

Anomaly Detection Deep Learning +1

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

no code implementations 17 Oct 2024 Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

Low-Rank Adaptation (LoRA) has driven research aimed at aligning its performance with that of full fine-tuning.

Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots

1 code implementation 2 Oct 2024 Renkai Wu, Xianjin Wang, Pengchen Liang, Zhenyu Zhang, Qing Chang, Hao Tang

In addition, we organize and release a dehazing dataset for robotic vision in urological surgery (the USRobot-Dehaze dataset).

Zero-Shot Learning

Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging

no code implementations 2 Oct 2024 Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Hua Wu, Sen Su

To ensure that each specialized expert in the MoE model works as expected, we select a small amount of seed data at which each expert excels in order to pre-optimize the router.

Diversity Mixture-of-Experts

Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-Shot Open-Set Recognition

1 code implementation 23 Aug 2024 Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

Few-shot open-set recognition (FSOR) is a challenging task that requires a model to recognize known classes and identify unknown classes with limited labeled data.

Meta-Learning Open Set Learning

MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

1 code implementation 23 Aug 2024 Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li

Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems.

Contrastive Learning Unsupervised Few-Shot Learning

Barbie: Text to Barbie-Style 3D Avatars

1 code implementation 17 Aug 2024 Xiaokun Sun, Zhenyu Zhang, Ying Tai, Qian Wang, Hao Tang, Zili Yi, Jian Yang

In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories.

Disentanglement Diversity

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

2 code implementations 7 Aug 2024 Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, dianhai yu, Hua Wu

Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows.

LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

1 code implementation 29 Jul 2024 Taoyu Su, Xinghua Zhang, Jiawei Sheng, Zhenyu Zhang, Tingwen Liu

Other studies refine each uni-modal information with graph structures, but may introduce unnecessary relations in specific modalities.

Graph Attention Knowledge Graphs +1

Predicting T-Cell Receptor Specificity

no code implementations 27 Jul 2024 Tengyao Tu, Wei Zeng, Kun Zhao, Zhenyu Zhang

The results show that adding a random-forest-based classifier to the model is very effective, and our model generally outperforms ordinary deep learning methods.

Deep Learning Specificity

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

1 code implementation 15 Jul 2024 Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

Modern Large Language Models (LLMs) are composed of matrices with billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

2 code implementations 11 Jul 2024 Zhenyu Zhang, Ajay Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

To address these limitations, we introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.
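
As a toy illustration of what "keeping a low-rank projection in low precision" can mean (not the released Q-GaLore code), the snippet below symmetrically quantizes an SVD-derived projection matrix to 4-bit integers and measures the round-trip error; all shapes and the rank are made-up values.

```python
import numpy as np

def quantize_int4(x):
    scale = np.abs(x).max() / 7.0                      # symmetric int4 range [-7, 7]
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 64)).astype(np.float32)
U, _, _ = np.linalg.svd(grad, full_matrices=False)
proj = U[:, :8]                                        # rank-8 projection basis
q, s = quantize_int4(proj)
print(np.abs(dequantize(q, s) - proj).max())           # error from storing it in 4 bits
```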

Quantization

A Survey on Failure Analysis and Fault Injection in AI Systems

no code implementations 28 Jun 2024 Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state-of-the-art in FI, and identifying areas for improvement in FI techniques to enhance the resilience of AI systems.

Fault Diagnosis Survey

Healing Powers of BERT: How Task-Specific Fine-Tuning Recovers Corrupted Language Models

no code implementations 20 Jun 2024 Shijie Han, Zhenyu Zhang, Andrei Arsene Simion

Language models like BERT excel at sentence classification tasks due to extensive pre-training on general data, but their robustness to parameter corruption is unexplored.

Language Modeling Language Modelling +2

E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models

no code implementations 16 Jun 2024 Zhenyu Zhang, Bingguang Hao, Jinpeng Li, Zekai Zhang, Dongyan Zhao

Most large language models (LLMs) are sensitive to prompts: a synonymous rephrasing or a typo may lead to unexpected results from the model.

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

no code implementations 29 Apr 2024 Tianyidan Xie, Rui Ma, Qian Wang, Xiaoqian Ye, Feixuan Liu, Ying Tai, Zhenyu Zhang, Lanjun Wang, Zili Yi

In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency.

Diversity Image Inpainting +3

HFT: Half Fine-Tuning for Large Language Models

no code implementations 29 Apr 2024 Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu

Large language models (LLMs) with one or more fine-tuning phases have become a necessary step to unlock various capabilities, enabling LLMs to follow natural language instructions or align with human preferences.

Continual Learning

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

1 code implementation 2 Apr 2024 Rui Xie, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Jian Yang, Ying Tai

Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs.

Blind Super-Resolution Super-Resolution

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation 26 Mar 2024 Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, and digital human creation.

Attribute Face Reenactment +3

Invertible Diffusion Models for Compressed Sensing

1 code implementation 25 Mar 2024 Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, Jian Zhang

To enable such memory-intensive end-to-end fine-tuning, we propose a novel two-level invertible design that transforms both (1) the multi-step sampling process and (2) the noise-estimation U-Net in each step into invertible networks.

compressed sensing Image Compressed Sensing +2

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

no code implementations CVPR 2024 Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, YuFei Wang, Zhenyu Zhang, Jun Li, Jian Yang

Depth completion is a vital task for autonomous driving, as it involves reconstructing the precise 3D geometry of a scene from sparse and noisy depth measurements.

3D geometry Autonomous Driving +1

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

3 code implementations 6 Mar 2024 Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian

Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with the C4 dataset (up to 19.7B tokens), and for fine-tuning RoBERTa on GLUE tasks.
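
A simplified sketch of the gradient low-rank projection idea (an illustration, not the released GaLore optimizer): the optimizer state lives in a rank-r subspace of the gradient, and the update is projected back to full size. The rank, learning rate, and momentum coefficient are arbitrary example values.

```python
import numpy as np

def galore_like_step(weight, grad, momentum, rank=4, lr=1e-3, beta=0.9):
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                                   # projection basis (refreshed periodically in practice)
    g_low = P.T @ grad                                # (rank, n): gradient in the low-rank subspace
    momentum = beta * momentum + (1 - beta) * g_low   # optimizer state stays low-rank
    return weight - lr * (P @ momentum), momentum     # project the update back to full size

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
G = rng.standard_normal((64, 32))
m = np.zeros((4, 32))
W, m = galore_like_step(W, G, m)
print(W.shape, m.shape)   # (64, 32) (4, 32): full-size weights, low-rank state
```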

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

1 code implementation 5 Mar 2024 Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead.
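
A toy sketch of the "multi-scale" notion, under the assumption that each attention head rescales its position indices by a different ratio so that some heads compress distant positions; the head count and ratio range are made-up values, not the paper's configuration.

```python
import numpy as np

def multi_scale_positions(seq_len, num_heads, min_ratio=1.2, max_ratio=1.8):
    ratios = np.linspace(min_ratio, max_ratio, num_heads)   # one re-scaling ratio per head
    base = np.arange(seq_len)
    return np.stack([base / r for r in ratios])             # (num_heads, seq_len) scaled positions

print(multi_scale_positions(seq_len=8, num_heads=4).round(2))
```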

Language Modeling Language Modelling

Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

no code implementations 19 Feb 2024 Xuelin Qian, Yu Wang, Simian Luo, yinda zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, xiangyang xue, Bo Zhao, Tiejun Huang, Yunsheng Wu, Yanwei Fu

In this paper, we extend auto-regressive models to 3D domains and seek stronger 3D shape generation by improving the capacity and scalability of auto-regressive models simultaneously.

3D Generation 3D Shape Generation +1

Demystifying Chains, Trees, and Graphs of Thoughts

no code implementations 25 Jan 2024 Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph.

Mathematical Reasoning Prompt Engineering

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

1 code implementation 10 Jan 2024 Tianlong Chen, Zhenyu Zhang, Hanrui Wang, Jiaqi Gu, Zirui Li, David Z. Pan, Frederic T. Chong, Song Han, Zhangyang Wang

To address these two pain points, we propose QuantumSEA, an in-time sparse exploration for noise-adaptive quantum circuits, aiming to achieve two key objectives: (1) implicit circuits capacity during training - by dynamically exploring the circuit's sparse connectivity and sticking a fixed small number of quantum gates throughout the training which satisfies the coherence time and enjoy light noises, enabling feasible executions on real quantum devices; (2) noise robustness - by jointly optimizing the topology and parameters of quantum circuits under real device noise models.

Quantum Machine Learning

Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention

2 code implementations 22 Dec 2023 Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu

Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.

FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity

1 code implementation 30 Nov 2023 Shiyao Cui, Zhenyu Zhang, Yilong Chen, Wenyuan Zhang, Tianyun Liu, Siqi Wang, Tingwen Liu

The widespread use of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts, primarily stemming from factually incorrect, unfair, and toxic content.

Fairness Instruction Following +1

Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images

no code implementations 15 Nov 2023 Zhaocong liu, Fa Zhang, Lin Cheng, Huanxi Deng, Xiaoyan Yang, Zhenyu Zhang, ChiChun Zhou

Addressing this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism from multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation.
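
A small end-to-end sketch of that pipeline on synthetic features, with PCA plus t-SNE standing in for the pre-trained-model features and manifold learning, three off-the-shelf clusterers, and a Hungarian-matched majority vote; every choice here is illustrative rather than the paper's configuration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import mode
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix

# synthetic 64-d "features" standing in for pre-trained-model embeddings
X, _ = make_blobs(n_samples=300, n_features=64, centers=4, random_state=0)
Z = TSNE(n_components=2, random_state=0).fit_transform(PCA(n_components=20).fit_transform(X))

k = 4
runs = [KMeans(k, n_init=10, random_state=0).fit_predict(Z),
        AgglomerativeClustering(k).fit_predict(Z),
        SpectralClustering(k, random_state=0).fit_predict(Z)]

def align(labels, reference, k):
    """Permute cluster ids in `labels` to best match `reference` (Hungarian matching)."""
    cost = -confusion_matrix(reference, labels, labels=list(range(k)))
    _, cols = linear_sum_assignment(cost)
    remap = {old: new for new, old in enumerate(cols)}
    return np.array([remap[l] for l in labels])

aligned = np.stack([runs[0]] + [align(r, runs[0], k) for r in runs[1:]])
consensus = mode(aligned, axis=0, keepdims=False).mode   # majority vote per sample
print(consensus[:10])
```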

Classification Dimensionality Reduction +1

Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning

1 code implementation 2 Nov 2023 Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li, Gaoang Wang

With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations.

Diagnostic Image Captioning

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

1 code implementation 8 Oct 2023 Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.

Network Pruning

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

1 code implementation 2 Oct 2023 Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks; however, it has issues such as (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts, and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse.

Mixture-of-Experts

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

1 code implementation 1 Oct 2023 Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du

We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures.

A study on the impact of pre-trained model on Just-In-Time defect prediction

1 code implementation 5 Sep 2023 Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W. K. Chan, Bo Jiang

These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.

Defect Detection

RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion

no code implementations 1 Sep 2023 Zhiqiang Yan, Xiang Li, Le Hui, Zhenyu Zhang, Jun Li, Jian Yang

To tackle these challenges, we explore a repetitive design in our image guided network to gradually and sufficiently recover depth values.

Depth Completion Depth Estimation +1

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

2 code implementations 24 Jun 2023 Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Based on these insights, we propose Heavy Hitter Oracle (H$_2$O), a KV cache eviction policy that dynamically retains a balance of recent and H$_2$ tokens.
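
A compact sketch of the stated policy, keeping the most recent tokens plus the cached tokens with the largest accumulated attention; the budgets and scores below are toy values, and this is not the released H$_2$O implementation.

```python
import numpy as np

def h2o_like_keep(acc_attention, recent_budget=4, heavy_budget=4):
    n = len(acc_attention)
    recent = set(range(max(0, n - recent_budget), n))                 # always keep the newest tokens
    older = [i for i in np.argsort(acc_attention)[::-1] if i not in recent]
    heavy = set(older[:heavy_budget])                                 # plus the heaviest hitters
    return sorted(recent | heavy)                                     # KV entries retained after eviction

rng = np.random.default_rng(0)
acc = rng.random(16)              # toy accumulated attention received by each cached token
print(h2o_like_keep(acc))
```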

Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image

no code implementations 8 Jun 2023 Kun Wang, Zhiqiang Yan, Zhenyu Zhang, Xiang Li, Jun Li, Jian Yang

Our key contributions are: (1) We parameterize the geometry and appearance of the object using a multi-scale global feature extractor, which avoids frequent point-wise feature retrieval and camera dependency.

Contrastive Learning Object +1

Are Large Kernels Better Teachers than Transformers for ConvNets?

1 code implementation 30 May 2023 Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu

We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures.

Knowledge Distillation

OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

no code implementations 26 Apr 2023 Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images.

Image Super-Resolution Position

Learning Versatile 3D Shape Generation with Improved AR Models

no code implementations 26 Mar 2023 Simian Luo, Xuelin Qian, Yanwei Fu, yinda zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, xiangyang xue

Auto-Regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.

3D Shape Generation Image Generation +1

Graph Transformer GANs for Graph-Constrained House Generation

no code implementations CVPR 2023 Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc van Gool

We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.

Generative Adversarial Network House Generation +1

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation 3 Mar 2023 Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a collection of 4 carefully curated and diverse tasks with 10 datasets, capturing a wide range of domain-specific and sophisticated knowledge.

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

1 code implementation 2 Mar 2023 Tianlong Chen, Zhenyu Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy.

Mixture-of-Experts

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

1 code implementation 24 Feb 2023 Ruisi Cai, Zhenyu Zhang, Zhangyang Wang

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models?

Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild

no code implementations CVPR 2023 Zhenyu Zhang, Renwang Chen, Weijian Cao, Ying Tai, Chengjie Wang

To address this problem, this paper presents a novel Neural Proto-face Field (NPF) for unsupervised robust 3D face modeling.

NeRF

Learning To Measure the Point Cloud Reconstruction Loss in a Representation Space

no code implementations CVPR 2023 Tianxin Huang, Zhonggan Ding, Jiangning Zhang, Ying Tai, Zhenyu Zhang, Mingang Chen, Chengjie Wang, Yong liu

Specifically, we use the contrastive constraint to help CALoss learn a representation space with shape similarity, while we introduce the adversarial strategy to help CALoss mine differences between reconstructed results and ground truths.

Point cloud reconstruction

Towards Generalized Open Information Extraction

no code implementations 29 Nov 2022 Bowen Yu, Zhenyu Zhang, Jingyang Li, Haiyang Yu, Tingwen Liu, Jian Sun, Yongbin Li, Bin Wang

Open Information Extraction (OpenIE) facilitates the open-domain discovery of textual facts.

Open Information Extraction

DesNet: Decomposed Scale-Consistent Network for Unsupervised Depth Completion

no code implementations 20 Nov 2022 Zhiqiang Yan, Kun Wang, Xiang Li, Zhenyu Zhang, Jun Li, Jian Yang

Unsupervised depth completion aims to recover dense depth from sparse measurements without using ground-truth annotations.

Depth Completion Depth Estimation +2

QuanGCN: Noise-Adaptive Training for Robust Quantum Graph Convolutional Networks

no code implementations 9 Nov 2022 Kaixiong Zhou, Zhenyu Zhang, Shengyuan Chen, Tianlong Chen, Xiao Huang, Zhangyang Wang, Xia Hu

Quantum neural networks (QNNs), an interdisciplinary field of quantum computing and machine learning, have attracted tremendous research interests due to the specific quantum advantages.

An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition

no code implementations 20 Sep 2022 Yang Wu, Pai Peng, Zhenyu Zhang, Yanyan Zhao, Bing Qin

At the low-level, we propose the progressive tri-modal attention, which can model the tri-modal feature interactions by adopting a two-pass strategy and can further leverage such interactions to significantly reduce the computation and memory complexity through reducing the input token length.

Emotion Recognition

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

no code implementations 14 Jul 2022 Zhenyu Zhang, Bowen Yu, Haiyang Yu, Tingwen Liu, Cheng Fu, Jingyang Li, Chengguang Tang, Jian Sun, Yongbin Li

In this paper, we propose a Layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems.

Language Modeling Language Modelling

Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness

1 code implementation 15 Jun 2022 Tianlong Chen, huan zhang, Zhenyu Zhang, Shiyu Chang, Sijia Liu, Pin-Yu Chen, Zhangyang Wang

Certifiable robustness is a highly desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios, but often demands tedious computations to establish.

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

1 code implementation 9 Jun 2022 Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang, Zhangyang Wang

For example, on downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with the standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively.

Transfer Learning

Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

1 code implementation CVPR 2022 Tianlong Chen, Zhenyu Zhang, Yihua Zhang, Shiyu Chang, Sijia Liu, Zhangyang Wang

Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger.

Network Pruning

Label Anchored Contrastive Learning for Language Understanding

no code implementations NAACL 2022 Zhenyu Zhang, Yuming Zhao, Meng Chen, Xiaodong He

Motivated by this, we propose a novel label anchored contrastive learning approach (denoted as LaCon) for language understanding.

Benchmarking Contrastive Learning +3

Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion

1 code implementation 18 Mar 2022 Zhiqiang Yan, Xiang Li, Kun Wang, Zhenyu Zhang, Jun Li, Jian Yang

To deal with the PDC task, we train a deep network that takes both depth and image as inputs for the dense panoramic depth recovery.

Depth Completion Transfer Learning

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

1 code implementation CVPR 2022 Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

However, a "head-to-toe assessment" regarding the extent of redundancy in ViTs, and how much we could gain by thoroughly mitigating it, has been absent for this field.

All Diversity

Sparsity Winning Twice: Better Robust Generalization from More Efficient Training

1 code implementation ICLR 2022 Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu Ma, Zehao Wang, Zhangyang Wang

We introduce two alternatives for sparse adversarial training: (i) static sparsity, by leveraging recent results from the lottery ticket hypothesis to identify critical sparse subnetworks arising from the early training; (ii) dynamic sparsity, by allowing the sparse subnetwork to adaptively adjust its connectivity pattern (while sticking to the same sparsity ratio) throughout training.
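
A toy sketch of the dynamic-sparsity alternative described above: at a fixed sparsity ratio, the smallest-magnitude surviving weights are periodically dropped and the same number of connections are regrown elsewhere. This is purely illustrative; the drop fraction and sizes are made-up values, not the paper's schedule.

```python
import numpy as np

def update_mask(weights, mask, drop_fraction=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    alive = np.flatnonzero(mask)
    n_drop = max(1, int(drop_fraction * alive.size))
    order = np.argsort(np.abs(weights.flat[alive]))
    drop = alive[order[:n_drop]]                           # prune the smallest-magnitude survivors
    mask.flat[drop] = 0
    dead = np.setdiff1d(np.flatnonzero(mask == 0), drop)   # candidate positions for regrowth
    grow = rng.choice(dead, size=n_drop, replace=False)    # regrow the same number elsewhere
    mask.flat[grow] = 1
    return mask

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
M = (rng.random((8, 8)) > 0.5).astype(np.int8)
before = int(M.sum())
after = int(update_mask(W, M, rng=rng).sum())
print(before, after)   # the number of active weights (the sparsity ratio) is unchanged
```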

ASFD: Automatic and Scalable Face Detector

no code implementations 26 Jan 2022 Jian Li, Bin Zhang, Yabiao Wang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Jilin Li, Xiaoming Huang, Yili Xia

Along with current multi-scale based detectors, Feature Aggregation and Enhancement (FAE) modules have shown superior performance gains for cutting-edge object detection.

Face Detection object-detection +1

Learning To Restore 3D Face From In-the-Wild Degraded Images

no code implementations CVPR 2022 Zhenyu Zhang, Yanhao Ge, Ying Tai, Xiaoming Huang, Chengjie Wang, Hao Tang, Dongjin Huang, Zhifeng Xie

In-the-wild 3D face modelling is a challenging problem as the predicted facial geometry and texture suffer from a lack of reliable clues or priors, when the input images are degraded.

3D Face Modelling Face Reconstruction

You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership

1 code implementation NeurIPS 2021 Xuxi Chen, Tianlong Chen, Zhenyu Zhang, Zhangyang Wang

The lottery ticket hypothesis (LTH) emerges as a promising framework to leverage a special sparse subnetwork (i.e., a winning ticket) instead of a full model for both training and inference, which can lower both costs without sacrificing performance.

FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

1 code implementation 18 Oct 2021 Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao

With the rapid development of text-to-speech (TTS) and voice conversion (VC) technologies, the detection of synthetic speech has become dramatically more difficult.

Speech Synthesis Synthetic Speech Detection +2

Improving Distantly-Supervised Named Entity Recognition with Self-Collaborative Denoising Learning

1 code implementation EMNLP 2021 Xinghua Zhang, Bowen Yu, Tingwen Liu, Zhenyu Zhang, Jiawei Sheng, Mengge Xue, Hongbo Xu

Distantly supervised named entity recognition (DS-NER) efficiently reduces labor costs but meanwhile intrinsically suffers from the label noise due to the strong assumption of distant supervision.

Denoising named-entity-recognition +2

MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

2 code implementations 6 Oct 2021 Yewei Gu, Zhenyu Zhang, Xiaowei Yi, Xianfeng Zhao

To realize any-to-any (A2A) voice conversion (VC), most methods perform symmetric self-supervised reconstruction tasks (Xi to Xi), which usually results in degraded performance due to inadequate feature decoupling, especially for unseen speakers.

Voice Conversion

DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder

no code implementations 22 Sep 2021 Zhenyu Zhang, Tao Guo, Meng Chen

DialogueBERT was pre-trained with 70 million dialogues from real scenarios and then fine-tuned on three different downstream dialogue understanding tasks.

Dialogue Understanding Emotion Recognition +7

RigNet: Repetitive Image Guided Network for Depth Completion

no code implementations 29 Jul 2021 Zhiqiang Yan, Kun Wang, Xiang Li, Zhenyu Zhang, Jun Li, Jian Yang

However, blurry guidance in the image and unclear structure in the depth still impede the performance of the image guided frameworks.

Depth Completion Depth Estimation +1

Efficient Lottery Ticket Finding: Less Data is More

1 code implementation 6 Jun 2021 Zhenyu Zhang, Xuxi Chen, Tianlong Chen, Zhangyang Wang

We observe that a high-quality winning ticket can be found with training and pruning the dense network on the very compact PrAC set, which can substantially save training iterations for the ticket finding process.
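
A bare-bones sketch of the train-prune-rewind loop that winning-ticket finding relies on, with a tiny quadratic objective standing in for a real network and no PrAC-set construction shown; the pruning rate, rounds, and target values are all illustrative.

```python
import numpy as np

def train(weights, mask, steps=100, lr=0.1):
    target = np.linspace(-1.0, 1.0, weights.size).reshape(weights.shape)
    for _ in range(steps):
        grad = 2.0 * (weights - target) * mask        # toy loss: ||mask * (weights - target)||^2
        weights = (weights - lr * grad) * mask
    return weights

rng = np.random.default_rng(0)
init = rng.standard_normal((4, 4))
mask = np.ones_like(init)
for _ in range(3):                                     # three rounds of iterative magnitude pruning
    trained = train(init * mask, mask)
    threshold = np.quantile(np.abs(trained[mask == 1]), 0.3)
    mask = mask * (np.abs(trained) >= threshold)       # drop the lowest 30% of surviving weights
ticket = init * mask                                   # "rewind": survivors keep their initial values
print(f"ticket sparsity: {1 - mask.mean():.2f}, nonzero weights: {int((ticket != 0).sum())}")
```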

GANs Can Play Lottery Tickets Too

1 code implementation ICLR 2021 Xuxi Chen, Zhenyu Zhang, Yongduo Sui, Tianlong Chen

In this work, we for the first time study the existence of such trainable matching subnetworks in deep GANs.

Image-to-Image Translation

Decentralized Baseband Processing with Gaussian Message Passing Detection for Uplink Massive MU-MIMO Systems

no code implementations 22 May 2021 Zhenyu Zhang, Yuanyuan Dong, Keping Long, Xiyuan Wang, Xiaoming Dai

Decentralized baseband processing (DBP) architecture, which partitions the base station antennas into multiple antenna clusters, has been recently proposed to alleviate the excessively high interconnect bandwidth, chip input/output data rates, and detection complexity for massive multi-user multiple-input multiple-output (MU-MIMO) systems.

"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization

1 code implementation 16 Apr 2021 Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

However, the BN layer is costly to calculate and is typically implemented with non-binary parameters, leaving a hurdle for the efficient implementation of BNN training.

Image Classification

Hydrogen-assisted layer-by-layer growth and robust nontrivial topology of stanene films on Bi(111)

no code implementations 11 Mar 2021 Liying Zhang, Leiqiang Li, Chenxiao Zhao, Shunfang Li, Jinfeng Jia, Zhenyu Zhang, Yu Jia, Ping Cui

The atomistic growth mechanisms and nontrivial topology of stanene as presented here are also discussed in connection with recent experimental findings.

Materials Science

Robust Overfitting may be mitigated by properly learned smoothening

no code implementations ICLR 2021 Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

A recent study (Rice et al., 2020) revealed overfitting to be a dominant phenomenon in adversarially robust training of deep networks, and that appropriate early-stopping of adversarial training (AT) could match the performance gains of most recent algorithmic improvements.

Knowledge Distillation

Document-level Relation Extraction with Dual-tier Heterogeneous Graph

no code implementations COLING 2020 Zhenyu Zhang, Bowen Yu, Xiaobo Shu, Tingwen Liu, Hengzhu Tang, Wang Yubin, Li Guo

Document-level relation extraction (RE) poses new challenges over its sentence-level counterpart since it requires an adequate comprehension of the whole document and the multi-hop reasoning ability across multiple sentences to reach the final result.

Decision Making Document-level Relation Extraction +2
