Search Results for author: Zhangyang Wang

Found 383 papers, 236 papers with code

HALO: Hardware-Aware Learning to Optimize

1 code implementation ECCV 2020 Chaojian Li, Tianlong Chen, Haoran You, Zhangyang Wang, Yingyan Lin

There has been an explosive demand for bringing machine learning (ML) powered intelligence into numerous Internet-of-Things (IoT) devices.

Eliminating the Invariance on the Loss Landscape of Linear Autoencoders

no code implementations ICML 2020 Reza Oftadeh, Jiayi Shen, Zhangyang Wang, Dylan Shell

For this new loss, we characterize the full structure of the loss landscape in the following sense: we establish an analytical expression for the set of all critical points, show that it is a subset of the critical points of MSE, and show that all local minima are still global.

Decoder

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

1 code implementation 4 Dec 2024 Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You

Vision-language models (VLMs) have shown remarkable success across various multi-modal tasks, yet large VLMs encounter significant efficiency challenges due to processing numerous visual tokens.

Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method

no code implementations 17 Nov 2024 Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

We explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model.

Image Enhancement

Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

no code implementations 3 Nov 2024 Neel P. Bhatt, Yunhao Yang, Rohan Siva, Daniel Milan, Ufuk Topcu, Zhangyang Wang

To quantify each type of uncertainty, we propose methods tailored to the unique properties of perception and decision-making: we use conformal prediction to calibrate perception uncertainty and introduce Formal-Methods-Driven Prediction (FMDP) to quantify decision uncertainty, leveraging formal verification techniques for theoretical guarantees.

Conformal Prediction Decision Making
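As a rough illustration of the conformal-prediction step mentioned above, the sketch below implements standard split conformal prediction for a classifier in Python with NumPy. It assumes nonconformity scores of the form 1 - p(label) and a held-out calibration set; the function names are hypothetical, and this is not the paper's FMDP method or its actual perception pipeline.

```python
import numpy as np


def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Split-conformal threshold from calibration nonconformity scores.

    cal_scores: scores on a held-out calibration set, here assumed to be
    1 - p(true label). alpha: target miscoverage rate (e.g. 0.1 for 90%).
    """
    n = len(cal_scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points: include every label
    return float(np.sort(cal_scores)[k - 1])


def prediction_set(probs: np.ndarray, qhat: float) -> list[int]:
    """Labels whose nonconformity score 1 - p(label) does not exceed qhat."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]
```

If calibration and test data are exchangeable, prediction sets built this way contain the true label with probability at least 1 - alpha; a larger set signals higher perception uncertainty.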

Chasing Better Deep Image Priors between Over- and Under-parameterization

1 code implementation 31 Oct 2024 Qiming Wu, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

We also extend LIP to compressive sensing image reconstruction, where a pre-trained GAN generator is used as the prior (in contrast to an untrained DIP or deep decoder), and confirm its validity in this setting too.

Compressive Sensing Decoder

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

no code implementations 24 Oct 2024 Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang

The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved efficiency and performance.

MMLU Scheduling

Large Spatial Model: End-to-end Unposed Images to Semantic 3D

no code implementations 24 Oct 2024 Zhiwen Fan, Jian Zhang, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, Yue Wang

To tackle the scarcity of labeled 3D semantic data and enable natural language-driven scene manipulation, we incorporate a pre-trained 2D language-based segmentation model into a 3D-consistent semantic feature field.

3D Reconstruction Attribute

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

no code implementations 14 Oct 2024 Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang

Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance.

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

no code implementations 14 Oct 2024 Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang

Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene.

Image to Video Generation

Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis

no code implementations 12 Oct 2024 Hongru Yang, Bhavya Kailkhura, Zhangyang Wang, Yingbin Liang

In Phase 2, the attention matrices and the MLP evolve jointly to enlarge the classification margin and reduce the loss to a near-minimum value.

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

1 code implementation 7 Oct 2024 Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen

As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, but this approach faces the challenge of degraded performance when disparate models are combined.

Benchmarking

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

2 code implementations 30 Sep 2024 Kevin Wang, Junbo Li, Neel P. Bhatt, Yihan Xi, Qiang Liu, Ufuk Topcu, Zhangyang Wang

Recent advancements in Large Language Models (LLMs) have showcased their ability to perform complex reasoning tasks, but their effectiveness in planning remains underexplored.

Decision Making Management

LLM-PBE: Assessing Data Privacy in Large Language Models

1 code implementation 23 Aug 2024 Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song

Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.

All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks

no code implementations 20 Jul 2024 Ajay Jaiswal, Nurendra Choudhary, Ravinarayana Adkathimar, Muthu P. Alagappan, Gaurush Hiranandani, Ying Ding, Zhangyang Wang, Edward W Huang, Karthik Subbian

In this paper, we investigate how LLMs can be leveraged in a computationally efficient fashion to benefit rich graph-structured data, a modality relatively unexplored in LLM literature.

Graph Learning

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

1 code implementation 15 Jul 2024 Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

Modern Large Language Models (LLMs) are composed of matrices with billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

2 code implementations 11 Jul 2024 Zhenyu Zhang, Ajay Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

To address these limitations, we introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.

Quantization
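The INT4 projection in Q-GaLore can be loosely illustrated with a generic symmetric per-tensor 4-bit quantizer. This is a minimal sketch under our own assumptions (one scale per tensor, round-to-nearest, 4-bit codes stored in an `int8` container); it is not taken from the Q-GaLore code, whose quantization details may differ.

```python
import numpy as np


def quantize_int4(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    max_abs = float(np.max(np.abs(x)))
    # One scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to nearest, clip to the 4-bit range, store in an int8 container.
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Map 4-bit codes back to approximate real values."""
    return q.astype(np.float32) * scale
```

With round-to-nearest, each entry's reconstruction error stays within half a quantization step (`scale / 2`); finer-grained per-row or per-group scales are a common way to reduce that error further.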

Expressive Gaussian Human Avatars from Monocular RGB Video

no code implementations 3 Jul 2024 Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang

Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations.

4K4DGen: Panoramic 4D Generation at 4K Resolution

no code implementations 19 Jun 2024 Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

Subsequently, we propose Dynamic Panoramic Lifting to elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency.

4k

Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

no code implementations 16 Jun 2024 Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang

Leveraging this rich dataset, we further formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash types, severity and number of injuries, based on contextual and environmental factors.

Ensemble Learning

Flextron: Many-in-One Flexible Large Language Model

no code implementations 11 Jun 2024 Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov

Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical.

Language Modelling Large Language Model

LoCoCo: Dropping In Convolutions for Long Context Compression

1 code implementation 8 Jun 2024 Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo).

4k

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

1 code implementation CVPR 2024 Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts.

Conditional Text-to-Image Synthesis Image Generation

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

no code implementations 4 Jun 2024 Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat

Recently, video diffusion models have emerged as expressive generative tools for high-quality video content creation that are readily available to general users.

Image to Video Generation

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

no code implementations 26 May 2024 Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

Building on this foundation, we propose a strategy to migrate the temporal consistency in video diffusion models to the spatial-temporal consistency required for 4D generation.

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

1 code implementation CVPR 2024 Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, Nicu Sebe

In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models by presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set.

Bias Detection Fairness

Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

no code implementations CVPR 2024 Mukund Varma T, Peihao Wang, Zhiwen Fan, Zhangyang Wang, Hao Su, Ravi Ramamoorthi

In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets.

Colorization Image Colorization

Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study

no code implementations 26 Mar 2024 Jinze Zhao, Peihao Wang, Zhangyang Wang

Specifically, we investigate the impact of the number of data samples, the total number of experts, the sparsity in expert selection, the complexity of the routing mechanism, and the complexity of individual experts.

Learning Theory

Comp4D: LLM-Guided Compositional 4D Scene Generation

no code implementations 25 Mar 2024 Dejia Xu, Hanwen Liang, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, Zhangyang Wang

Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content.

Object Scene Generation

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

1 code implementation 21 Mar 2024 Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation Video Generation

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

no code implementations 18 Mar 2024 Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected.

Ethics Fairness

Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk

1 code implementation 14 Mar 2024 Zhangheng Li, Junyuan Hong, Bo Li, Zhangyang Wang

While diffusion models have recently demonstrated remarkable progress in generating realistic images, privacy risks also arise: published models or APIs could generate training images and thus leak privacy-sensitive training information.

Inference Attack Membership Inference Attack

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

2 code implementations 6 Mar 2024 Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, Yuandong Tian

Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks.
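The core idea of gradient low-rank projection can be sketched in a few lines of NumPy: project each weight gradient onto its top-r left singular vectors, keep Adam's moment estimates in that r-dimensional space, and project the normalized update back. This is a simplified sketch, not the official GaLore implementation; `GaLoreAdamSketch` and its defaults (`rank`, `update_gap`, learning rate) are hypothetical, and details such as choosing which side of the matrix to project and any update scale factor are omitted.

```python
import numpy as np


class GaLoreAdamSketch:
    """Adam whose moment estimates live in a rank-r gradient subspace.

    Simplified illustration: the subspace is refreshed every `update_gap`
    steps from the SVD of the current gradient.
    """

    def __init__(self, shape, rank=4, update_gap=200, lr=1e-3,
                 betas=(0.9, 0.999), eps=1e-8):
        _, n = shape
        self.rank, self.update_gap, self.lr, self.eps = rank, update_gap, lr, eps
        self.b1, self.b2 = betas
        self.P = None                   # m x r projection matrix
        self.m1 = np.zeros((rank, n))   # first moment, r x n instead of m x n
        self.m2 = np.zeros((rank, n))   # second moment, r x n instead of m x n
        self.t = 0

    def step(self, W: np.ndarray, G: np.ndarray) -> np.ndarray:
        """Return the updated weight W given its full gradient G (both m x n)."""
        if self.t % self.update_gap == 0:
            # Top-r left singular vectors of G span the projection subspace.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, : self.rank]
        self.t += 1
        R = self.P.T @ G                # projected gradient, r x n
        self.m1 = self.b1 * self.m1 + (1 - self.b1) * R
        self.m2 = self.b2 * self.m2 + (1 - self.b2) * R**2
        m_hat = self.m1 / (1 - self.b1**self.t)
        v_hat = self.m2 / (1 - self.b2**self.t)
        N = m_hat / (np.sqrt(v_hat) + self.eps)
        return W - self.lr * (self.P @ N)  # project the update back to m x n
```

For an m x n weight, the two Adam moment buffers shrink from 2mn to 2rn entries, at the cost of storing the m x r projection matrix; this is where the optimizer-state savings come from.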

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

1 code implementation 5 Mar 2024 Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle the relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead.

Language Modelling

Principled Architecture-aware Scaling of Hyperparameters

1 code implementation 27 Feb 2024 Wuyang Chen, Junru Wu, Zhangyang Wang, Boris Hanin

However, most designs or optimization methods are agnostic to the choice of network structures, and thus largely ignore the impact of neural architectures on hyperparameters.

AutoML

Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

1 code implementation 22 Feb 2024 Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang

Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets, with a specific focus on selective retention of samples that incur moderately high losses.
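One simple reading of "selective retention of samples that incur moderately high losses" is a percentile band on per-sample losses: skip the easiest samples and also the most extreme ones. The sketch below assumes that interpretation; `select_moderately_hard` and its percentile thresholds are hypothetical, and the paper's actual instance-reweighting scheme is not reproduced here.

```python
import numpy as np


def select_moderately_hard(losses, low_pct: float = 50.0, high_pct: float = 95.0) -> np.ndarray:
    """Indices of samples whose loss lies between two percentiles.

    Samples below `low_pct` are treated as already learned and skipped;
    samples above `high_pct` are treated as likely noisy outliers and skipped.
    """
    losses = np.asarray(losses, dtype=float)
    lo, hi = np.percentile(losses, [low_pct, high_pct])
    return np.flatnonzero((losses >= lo) & (losses <= hi))
```

In a continual-training loop, one would recompute per-sample losses with the current model and train only on the returned indices.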

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

1 code implementation 18 Feb 2024 Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard.

Benchmarking

Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community

1 code implementation 15 Feb 2024 Arman Isajanyan, Artur Shatveryan, David Kocharyan, Zhangyang Wang, Humphrey Shi

These findings highlight the relevance and effectiveness of Social Reward in assessing community appreciation for AI-generated artworks, establishing a closer alignment with users' creative goals: creating popular visual art.

Image Generation

LLaGA: Large Language and Graph Assistant

2 code implementations 13 Feb 2024 Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang

Graph Neural Networks (GNNs) have driven advances in graph-structured data analysis.

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

1 code implementation 10 Jan 2024 Tianlong Chen, Zhenyu Zhang, Hanrui Wang, Jiaqi Gu, Zirui Li, David Z. Pan, Frederic T. Chong, Song Han, Zhangyang Wang

To address these two pain points, we propose QuantumSEA, an in-time sparse exploration method for noise-adaptive quantum circuits, aiming to achieve two key objectives: (1) implicit circuit capacity during training, by dynamically exploring the circuit's sparse connectivity while keeping a fixed, small number of quantum gates throughout training, which satisfies the coherence time and incurs little noise, enabling feasible execution on real quantum devices; and (2) noise robustness, by jointly optimizing the topology and parameters of quantum circuits under real device noise models.

Quantum Machine Learning

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

no code implementations 8 Jan 2024 Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat

To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization.

3D Generation 3D Reconstruction

VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

no code implementations 4 Jan 2024 Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

Recently, several works have tackled video editing, fostered by the success of large-scale text-to-image generative models.

Video Editing

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

1 code implementation CVPR 2024 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

We propose PAIR Diffusion, a generic framework that enables a diffusion model to control the structure and appearance properties of each object in the image.

Object

Taming Mode Collapse in Score Distillation for Text-to-3D Generation

no code implementations CVPR 2024 Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

In this paper, we reveal that existing score distillation-based text-to-3D generation frameworks degenerate to maximum-likelihood seeking on each view independently and thus suffer from the mode collapse problem, which manifests as the Janus artifact in practice.

3D Generation Prompt Engineering

4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency

no code implementations 28 Dec 2023 Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei

Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations, thus offering superior control over content creation.

Prompt Engineering

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

1 code implementation 21 Dec 2023 Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results.

2k Image Inpainting

The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need

no code implementations 9 Dec 2023 Tianjin Huang, Tianlong Chen, Zhangyang Wang, Shiwei Liu

Therefore, it remains unclear whether the self-attention operation is crucial for the recent advances in SSL, or whether CNNs can deliver the same excellence with more advanced designs.

Self-Supervised Learning

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

1 code implementation CVPR 2024 Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi

In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation.

3D scene Editing Natural Language Queries

Meta ControlNet: Enhancing Task Adaptation via Meta Learning

1 code implementation 3 Dec 2023 Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang

However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve a desirable control for a single task.

Edge Detection Image Generation

Rethinking PGD Attack: Is Sign Function Necessary?

1 code implementation 3 Dec 2023 Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang

Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
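To make the role of the sign concrete, here is one L-infinity PGD step in NumPy, with a flag that swaps the sign-gradient direction for the raw gradient. This is only a naive contrast under our own assumptions (pixel values in [0, 1], projection after every step, hypothetical function name `pgd_step`); it is not the paper's RGD algorithm, which may scale and project the raw gradient differently.

```python
import numpy as np


def pgd_step(x_adv, x_orig, grad, step_size, eps, use_sign=True):
    """One L-infinity PGD ascent step on the loss, followed by projection.

    use_sign=True is the standard sign-gradient update; use_sign=False
    uses the raw gradient instead, shown only for contrast.
    """
    direction = np.sign(grad) if use_sign else grad
    x_new = x_adv + step_size * direction
    # Project back into the eps-ball around x_orig, then into the valid pixel range.
    x_new = np.clip(x_new, x_orig - eps, x_orig + eps)
    return np.clip(x_new, 0.0, 1.0)
```

With a tiny gradient component, the sign step still moves that pixel by the full step size, whereas the raw step barely moves it; dropping the sign keeps this per-coordinate magnitude information.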

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

1 code implementation 1 Dec 2023 Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang

Novel view synthesis from limited observations remains an important and persistent task.

Novel View Synthesis

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

1 code implementation 27 Nov 2023 Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng Li, Bo Li, Zhangyang Wang

To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, based on a differentially private (DP) ensemble of in-context learning with private demonstrations.

In-Context Learning Language Modelling

Fine-Tuning Language Models Using Formal Methods Feedback

no code implementations 27 Oct 2023 Yunhao Yang, Neel P. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu

Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks.

Autonomous Driving

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else

no code implementations 11 Oct 2023 Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Zhangyang Wang, Humphrey Shi

To achieve this goal, we identify the limitations in the text embeddings used for the pre-trained text-to-image diffusion models.

Image Manipulation Text-to-Image Generation

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

no code implementations 10 Oct 2023 Xuxi Chen, Yu Yang, Zhangyang Wang, Baharan Mirzasoleiman

Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets, by creating a small set of synthetic images that has a similar generalization performance to that of the full dataset.

Dataset Distillation

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

1 code implementation 8 Oct 2023 Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.

Network Pruning

Pose-Free Generalizable Rendering Transformer

no code implementations 5 Oct 2023 Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang

To address this challenge, we introduce PF-GRT, a new Pose-Free framework for Generalizable Rendering Transformer, eliminating the need for pre-computed camera poses and instead leveraging feature-matching learned directly from data.

Generalizable Novel View Synthesis Novel View Synthesis

Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

no code implementations 4 Oct 2023 Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao

Although fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in notoriously long training times and high computational costs.

Image Generation Novel View Synthesis

Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

no code implementations 2 Oct 2023 Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

We start by proposing two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after LLM compression, so the compressed model must (re)learn it from data with additional parameters; the other presumes that knowledge is internally displaced, so merely "inference re-direction" with input-side augmentation such as prompting is needed to recover the knowledge-related performance.

Compressing LLMs: The Truth is Rarely Pure and Never Simple

1 code implementation 2 Oct 2023 Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang

Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs, achieving 50-60% sparsity and reducing the bit width to 3 or 4 bits per weight with negligible degradation of perplexity over the uncompressed baseline.

Quantization Retrieval

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

1 code implementation 29 Sep 2023 Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang

Contrary to this belief, this paper presents a counter-argument: small-magnitude pre-trained weights encode vital knowledge essential for tackling difficult downstream tasks, manifested as a monotonic relationship between task difficulty and the performance drop on downstream tasks as more pre-trained weights are pruned by magnitude.

Quantization
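The magnitude pruning referred to above can be sketched as unstructured, per-tensor pruning that zeroes the smallest-magnitude weights. This is a generic illustration assuming a NumPy weight array and a target sparsity fraction; `magnitude_prune` is a hypothetical helper and does not reproduce the paper's experimental protocol.

```python
import numpy as np


def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest |w|.

    Unstructured, per-tensor pruning; exactly round(sparsity * w.size)
    entries are removed (ties broken by position).
    """
    flat = np.asarray(w, dtype=float).ravel().copy()
    k = int(round(sparsity * flat.size))
    # Indices of the k smallest-magnitude weights.
    smallest = np.argsort(np.abs(flat), kind="stable")[:k]
    flat[smallest] = 0.0
    return flat.reshape(np.shape(w))
```

Sweeping `sparsity` upward and measuring downstream accuracy at each level, separately for easy and hard tasks, is the kind of experiment the finding above is based on.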

Safe and Robust Watermark Injection with a Single OoD Image

1 code implementation 4 Sep 2023 Shuyang Yu, Junyuan Hong, Haobo Zhang, Haotao Wang, Zhangyang Wang, Jiayu Zhou

Training a high-performance deep neural network requires large amounts of data and computational resources.

Model extraction

Robust Mixture-of-Expert Training for Convolutional Neural Networks

1 code implementation ICCV 2023 Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu

Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How to adversarially robustify a CNN-based MoE model?

Adversarial Robustness

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing

1 code implementation 11 Aug 2023 Stefan Abi-Karam, Rishov Sarkar, Dejia Xu, Zhiwen Fan, Zhangyang Wang, Cong Hao

In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.

High-Level Synthesis Meta-Learning

Doubly Robust Instance-Reweighted Adversarial Training

no code implementations 1 Aug 2023 Daouda Sow, Sen Lin, Zhangyang Wang, Yingbin Liang

Experiments on standard classification datasets demonstrate that our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance, and at the same time improves the robustness against attacks on the weakest data points.

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

1 code implementation ICCV 2023 Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang

Although fast, physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, such models are still trained only on synthetic data and ground-truth pairs.

Image Restoration

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap

no code implementations 20 Jul 2023 Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang

We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks.

Image Inpainting

Polynomial Width is Sufficient for Set Representation with High-dimensional Features

no code implementations 8 Jul 2023 Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li

To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE).

Inductive Bias

Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

1 code implementation 5 Jul 2023 Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu

Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process.

Neural Architecture Search

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

1 code implementation 24 Jun 2023 Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

Based on these insights, we propose Heavy Hitter Oracle (H₂O), a KV cache eviction policy that dynamically retains a balance of recent and H₂ tokens.
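A heavy-hitter eviction policy of this kind can be sketched as selecting, at one decoding step, the most recent cache positions plus the older positions with the largest accumulated attention scores. This is a simplified single-step sketch: the function name `h2o_keep_indices` and the split of the budget into `num_recent` and `num_heavy` are assumptions, and in the paper the policy is applied as generation proceeds rather than once.

```python
import numpy as np


def h2o_keep_indices(acc_scores, num_recent: int, num_heavy: int) -> list[int]:
    """Choose which KV-cache positions survive under a fixed budget.

    acc_scores[i]: attention mass that cached token i has accumulated so far.
    Keeps the `num_recent` newest positions plus the `num_heavy` older
    positions with the largest accumulated scores ("heavy hitters").
    """
    scores = np.asarray(acc_scores, dtype=float)
    n = len(scores)
    split = max(0, n - num_recent)
    recent = list(range(split, n))
    older = np.arange(split)
    # Rank older positions by accumulated attention, highest first.
    order = np.argsort(-scores[older], kind="stable")
    heavy = older[order[:num_heavy]].tolist()
    return sorted(heavy + recent)
```

Every position not returned would have its key and value evicted, so the cache size stays bounded by `num_recent + num_heavy`.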

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

1 code implementation 18 Jun 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

By partitioning giant graph data, we build multiple weaker GNNs (soup ingredients) that are trained independently and in parallel without any intermediate communication, and combine their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance.

graph partitioning Graph Sampling
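The "soup" step can be illustrated with the widely used greedy soup procedure for weight averaging: rank the ingredients by validation score, then add each one to the average only if the averaged model does not score worse. This is the generic greedy soup, not necessarily Graph Ladling's greedy interpolation variant; `greedy_soup`, the flattened parameter vectors, and the `evaluate` callback are assumptions.

```python
from typing import Callable, Sequence

import numpy as np


def greedy_soup(
    ingredients: Sequence[np.ndarray],
    evaluate: Callable[[np.ndarray], float],
) -> np.ndarray:
    """Greedily average the weights of independently trained models.

    ingredients: flattened parameter vectors with an identical layout.
    evaluate: held-out score of a parameter vector (higher is better).
    """
    # Try ingredients from best to worst individual score.
    ranked = sorted(ingredients, key=evaluate, reverse=True)
    soup = [ranked[0]]
    best = evaluate(ranked[0])
    for w in ranked[1:]:
        candidate = np.mean(soup + [w], axis=0)
        score = evaluate(candidate)
        if score >= best:  # keep w only if the averaged model does not get worse
            soup.append(w)
            best = score
    return np.mean(soup, axis=0)
```

Because each ingredient is trained in isolation, the only coordination needed is this final averaging pass, which is what makes the training communication-free.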

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

1 code implementation 18 Jun 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

Motivated by recent observations on model soups, which suggest that the fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate lottery-ticket-quality subnetworks at a fraction of the original IMP cost, by replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine.

Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images

1 code implementation 13 Jun 2023 Panwang Pan, Zhiwen Fan, Brandon Y. Feng, Peihao Wang, Chenxin Li, Zhangyang Wang

The accurate estimation of six degrees-of-freedom (6DoF) object poses is essential for many applications in robotics and augmented reality.

object-detection Object Detection +1

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

1 code implementation NeurIPS 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang

Large pre-trained transformers are the show-stealers of modern deep learning, and as they grow in scale it becomes crucial to understand the parsimonious patterns that exist within them.

Self-Supervised Learning

Are Large Kernels Better Teachers than Transformers for ConvNets?

1 code implementation 30 May 2023 Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu

We carry out a first-of-its-kind study showing that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, owing to their more similar architectures.

Knowledge Distillation

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

1 code implementation 30 May 2023 Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao

Computer vision researchers are embracing two promising paradigms, Vision Transformers (ViTs) and Multi-task Learning (MTL), both of which perform well but are computationally intensive, given the quadratic complexity of self-attention in ViTs and the need to activate an entire large MTL model for a single task.

High-Level Synthesis Multi-Task Learning

Towards Constituting Mathematical Structures for Learning to Optimize

1 code implementation 29 May 2023 Jialin Liu, Xiaohan Chen, Zhangyang Wang, Wotao Yin, HanQin Cai

Learning to Optimize (L2O), a technique that uses machine learning to learn an optimization algorithm automatically from data, has gained increasing attention in recent years.

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

1 code implementation CVPR 2024 Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi

Text-to-image (T2I) research has grown explosively in the past year, owing to the large-scale pre-trained diffusion models and many emerging personalization and editing approaches.

Conditional Text-to-Image Synthesis Image Generation +3

POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference

1 code implementation 25 May 2023 Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Dejia Xu, Hanwen Jiang, Zhangyang Wang

To mitigate this issue, we propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE).

3D geometry Object +1

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

1 code implementation 28 Apr 2023 Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, Kevin Wang, Yihan Xi, Dejia Xu, Zhangyang Wang

For a complicated algorithm, its implementation by a human programmer usually starts with outlining a rough control flow followed by iterative enrichments, eventually yielding carefully generated syntactic structures and variables in a hierarchy.

Code Generation Language Modelling +1

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

1 code implementation NeurIPS 2023 Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch.

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

1 code implementation 30 Mar 2023 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

We propose PAIR Diffusion, a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in the image.

Object

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models

1 code implementation 30 Mar 2023 Eric Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, Humphrey Shi

The unlearning problem of deep learning models, once primarily an academic concern, has become a prevalent issue in the industry.

Disentanglement Memorization +1

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation 3 Mar 2023 Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation that unveils the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a carefully curated collection of 4 diverse tasks with 10 datasets that captures a wide range of domain-specific and sophisticated knowledge.

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

1 code implementation 2 Mar 2023 Tianlong Chen, Zhenyu Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy.

Learning to Grow Pretrained Models for Efficient Transformer Training

no code implementations 2 Mar 2023 Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis.

M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation

1 code implementation 28 Feb 2023 Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang

This data-driven procedure yields L2O that can efficiently solve problems similar to those seen in training, that is, drawn from the same "task distribution".

You Only Transfer What You Share: Intersection-Induced Graph Transfer Learning for Link Prediction

1 code implementation 27 Feb 2023 Wenqing Zheng, Edward W Huang, Nikhil Rao, Zhangyang Wang, Karthik Subbian

We identify this setting as Graph Intersection-induced Transfer Learning (GITL), which is motivated by practical applications in e-commerce or academic co-authorship predictions.

Link Prediction Transfer Learning

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

1 code implementation 24 Feb 2023 Ruisi Cai, Zhenyu Zhang, Zhangyang Wang

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models?

Learning to Generalize Provably in Learning to Optimize

1 code implementation 22 Feb 2023 Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, DaCheng Tao, Yingbin Liang, Zhangyang Wang

While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper.

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers

no code implementations 6 Feb 2023 Shiwei Liu, Zhangyang Wang

In response, we summarize ten Q&As about sparse neural networks (SNNs) covering many key aspects, including dense vs. sparse, unstructured sparse vs. structured sparse, pruning vs. sparse training, dense-to-sparse training vs. sparse-to-sparse training, static sparsity vs. dynamic sparsity, before-training/during-training vs. post-training sparsity, and many more.

General Knowledge

Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style

no code implementations CVPR 2023 Haoming Lu, Hazarapet Tunanyan, Kai Wang, Shant Navasardyan, Zhangyang Wang, Humphrey Shi

Diffusion models have demonstrated an impressive capability for text-conditioned image synthesis, and broader application horizons are emerging as these pretrained diffusion models are personalized toward generating a specialized target object or style.

Disentanglement Image Generation

NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views

no code implementations CVPR 2023 Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang

In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with the given reference image.

Denoising Depth Estimation

AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts

1 code implementation ICCV 2023 Tianlong Chen, Xuxi Chen, Xianzhi Du, Abdullah Rashwan, Fan Yang, Huizhong Chen, Zhangyang Wang, Yeqing Li

Instead of compressing the knowledge of multiple tasks into a single model, MoE separates the parameter space and uses only the model pieces relevant to the given task type and input, which stabilizes MTL training and enables ultra-efficient inference.

Instance Segmentation Multi-Task Learning +3
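The idea in the AdaMV-MoE snippet above, activating only the experts relevant to a given task and input, can be illustrated with a generic task-conditioned top-k mixture-of-experts forward pass. This is a hypothetical sketch, not the AdaMV-MoE architecture itself: the per-task linear gates, the dictionary layout, and the name `task_moe_forward` are all assumptions, and experts are assumed to preserve the input dimension.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def task_moe_forward(
    x: Vector,
    task: str,
    gates: Dict[str, List[Vector]],  # assumed: one gate vector per expert, per task
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route x through the top-k experts chosen by a task-specific gate."""
    task_gate = gates[task]
    # Gate logits: dot product of each expert's gate vector with the input.
    logits = [sum(w * xi for w, xi in zip(g, x)) for g in task_gate]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Numerically stable softmax over the selected logits only.
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]
    z = sum(exps)
    out = [0.0] * len(x)  # assumes every expert preserves the input dimension
    for weight, e in zip(exps, top):
        y = experts[e](x)  # only the selected experts are ever executed
        for i, yi in enumerate(y):
            out[i] += (weight / z) * yi
    return out
```

With the same input, a different `task` key can select a different subset of experts, so the cost of a forward pass scales with `k` rather than with the total number of experts.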

Vision HGNN: An Image is More than a Graph of Nodes

1 code implementation ICCV 2023 Yan Han, Peihao Wang, Souvik Kundu, Ying Ding, Zhangyang Wang

In this paper, we enhance ViG by transcending conventional "pairwise" linkages and harnessing the power of the hypergraph to encapsulate image information.

graph construction Graph Neural Network +3

Pruning Before Training May Improve Generalization, Provably

no code implementations 1 Jan 2023 Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance.

Network Pruning

Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search

1 code implementation 30 Dec 2022 Wenqing Zheng, S P Sharan, Zhiwen Fan, Kevin Wang, Yihan Xi, Zhangyang Wang

Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes.

Reinforcement Learning (RL)

StegaNeRF: Embedding Invisible Information within Neural Radiance Fields

no code implementations ICCV 2023 Chenxin Li, Brandon Y. Feng, Zhiwen Fan, Panwang Pan, Zhangyang Wang

Recent advances in neural rendering point to a future in which visual data is widely distributed by sharing NeRF model weights.

Neural Rendering

NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views

1 code implementation 29 Nov 2022 Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang

In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with the given reference image.

3D Reconstruction Image to 3D +3

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

1 code implementation 28 Nov 2022 Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu

Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks).

Out-of-Distribution Detection

Search Behavior Prediction: A Hypergraph Perspective

1 code implementation 23 Nov 2022 Yan Han, Edward W Huang, Wenqing Zheng, Nikhil Rao, Zhangyang Wang, Karthik Subbian

With these hyperedges, we augment the original bipartite graph into a new hypergraph.

Link Prediction

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation 19 Nov 2022 Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

3 code implementations ICCV 2023 Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, Humphrey Shi

In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model.

Disentanglement Image Captioning +6

StyleNAT: Giving Each Head a New Perspective

2 code implementations 10 Nov 2022 Steven Walton, Ali Hassani, Xingqian Xu, Zhangyang Wang, Humphrey Shi

Image generation has been a long sought-after but challenging task, and performing the generation task in an efficient manner is similarly difficult.

Face Generation

QuanGCN: Noise-Adaptive Training for Robust Quantum Graph Convolutional Networks

no code implementations 9 Nov 2022 Kaixiong Zhou, Zhenyu Zhang, Shengyuan Chen, Tianlong Chen, Xiao Huang, Zhangyang Wang, Xia Hu

Quantum neural networks (QNNs), an interdisciplinary field spanning quantum computing and machine learning, have attracted tremendous research interest due to their specific quantum advantages.

Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization

no code implementations 3 Nov 2022 Junru Wu, Yi Liang, Feng Han, Hassan Akbari, Zhangyang Wang, Cong Yu

For example, even in the commonly adopted instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for the raw videos from the internet.

Contrastive Learning Triplet

M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

1 code implementation 26 Oct 2022 Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

However, when deploying MTL onto those real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks; (ii) at inference, current MTL regimes have to activate nearly the entire model even to just execute a single task.

Multi-Task Learning

Symbolic Distillation for Learned TCP Congestion Control

1 code implementation 24 Oct 2022 S P Sharan, Wenqing Zheng, Kuo-Feng Hsu, Jiarong Xing, Ang Chen, Zhangyang Wang

At the core of our proposal is a novel symbolic branching algorithm that enables the rule to be aware of the context in terms of various network conditions, eventually converting the NN policy into a symbolic tree.

Deep Reinforcement Learning Reinforcement Learning (RL)

Signal Processing for Implicit Neural Representations

no code implementations 17 Oct 2022 Dejia Xu, Peihao Wang, Yifan Jiang, Zhiwen Fan, Zhangyang Wang

We answer this question by proposing an implicit neural signal processing network, dubbed INSP-Net, via differential operators on INR.

Deblurring Denoising +1

Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices

no code implementations 16 Oct 2022 Yimeng Zhang, Akshay Karkal Kamath, Qiucheng Wu, Zhiwen Fan, Wuyang Chen, Zhangyang Wang, Shiyu Chang, Sijia Liu, Cong Hao

In this paper, we propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on High-Definition (HD) video stream.

Model Compression Multi-Object Tracking

RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging

no code implementations 15 Oct 2022 Ajay Jaiswal, Kumar Ashutosh, Justin F Rousseau, Yifan Peng, Zhangyang Wang, Ying Ding

Our extensive experiments on popular medical imaging classification tasks (cardiopulmonary disease and lesion classification) using real-world datasets show the performance benefit of RoS-KD, its ability to distill knowledge from many popular large networks (ResNet-50, DenseNet-121, MobileNet-V2) into a comparatively small network, and its robustness to adversarial attacks (PGD, FGSM).

Classification Knowledge Distillation +1

Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again

1 code implementation 14 Oct 2022 Ajay Jaiswal, Peihao Wang, Tianlong Chen, Justin F. Rousseau, Ying Ding, Zhangyang Wang

In this paper, we first provide a new gradient-flow perspective for understanding the substandard performance of deep GCNs, and hypothesize that by facilitating healthy gradient flow we can significantly improve their trainability and achieve state-of-the-art (SOTA) level performance from vanilla GCNs.

Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork

1 code implementation 12 Oct 2022 Haotao Wang, Junyuan Hong, Aston Zhang, Jiayu Zhou, Zhangyang Wang

As a result, both the stem and the classification head in the final network are hardly affected by backdoor training samples.

backdoor defense Classification +1

Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative

1 code implementation 7 Oct 2022 Tianxin Wei, Yuning You, Tianlong Chen, Yang Shen, Jingrui He, Zhangyang Wang

This paper aims to improve the generalizability of hypergraph neural networks in the low-label regime by applying the contrastive learning approach from images/graphs (we refer to it as HyperGCL).

Contrastive Learning Fairness +2

DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness

no code implementations 26 Sep 2022 Zepeng Huo, Taowei Ji, Yifei Liang, Shuai Huang, Zhangyang Wang, Xiaoning Qian, Bobak Mortazavi

We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of the features from different sensors.

Activity Recognition Denoising +3

NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes

1 code implementation 19 Sep 2022 Zhiwen Fan, Peihao Wang, Yifan Jiang, Xinyu Gong, Dejia Xu, Zhangyang Wang

Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), couples object segmentation and neural radiance fields to segment objects in any view within a scene.

Object Segmentation +2

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

2 code implementations 15 Sep 2022 Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang

Vision Transformers (ViTs) have proven effective both in solving 2D image understanding tasks when trained on large-scale image datasets and, on a somewhat separate track, in modeling the 3D visual world, such as voxels or point clouds.

Point Cloud Segmentation