Search Results for author: Shanghang Zhang

Found 108 papers, 47 papers with code

Topology Adaptive Graph Convolutional Networks

2 code implementations • ICLR 2018 • Jian Du, Shanghang Zhang, Guanhang Wu, Jose M. F. Moura, Soummya Kar

Spectral graph convolutional neural networks (CNNs) require approximation to the convolution to alleviate the computational complexity, resulting in performance loss.

12,987

Paper
Code

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

7 code implementations • 14 Dec 2020 • Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, JianXin Li, Hui Xiong, Wancai Zhang

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning.

Ranked #1 on Time Series Forecasting on ETTh2 (336) Univariate

Multivariate Time Series Forecasting Time Series +1

4,900

Paper
Code

A Review of Single-Source Deep Unsupervised Visual Domain Adaptation

1 code implementation • 1 Sep 2020 • Sicheng Zhao, Xiangyu Yue, Shanghang Zhang, Bo Li, Han Zhao, Bichen Wu, Ravi Krishna, Joseph E. Gonzalez, Alberto L. Sangiovanni-Vincentelli, Sanjit A. Seshia, Kurt Keutzer

To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.

Unsupervised Domain Adaptation

4,885

Paper
Code

COVID-CT-Dataset: A CT Scan Dataset about COVID-19

19 code implementations • 30 Mar 2020 • Xingyi Yang, Xuehai He, Jinyu Zhao, Yichen Zhang, Shanghang Zhang, Pengtao Xie

Using this dataset, we develop diagnosis methods based on multi-task learning and self-supervised learning, that achieve an F1 of 0. 90, an AUC of 0. 98, and an accuracy of 0. 89.

Computed Tomography (CT) COVID-19 Diagnosis +2

1,608

Paper
Code

Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans

1 code implementation • medRxiv 2020 • Xuehai He, Xingyi Yang, Shanghang Zhang, Jinyu Zhao, Yichen Zhang, Eric Xing, Pengtao Xie

Besides, these works require a large number of CTs to train accurate diagnosis models, which are difficult to obtain.

COVID-19 Diagnosis Self-Supervised Learning +1

1,075

Paper
Code

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision

1 code implementation • 18 Sep 2023 • Mingjie Pan, Jiaming Liu, Renrui Zhang, Peixiang Huang, Xiaoqi Li, Bing Wang, Hongwei Xie, Li Liu, Shanghang Zhang

3D occupancy prediction holds significant promise in the fields of robot perception and autonomous driving, which quantifies 3D scenes into grid cells with semantic labels.

Autonomous Driving

385

Paper
Code

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Classification 3D Object Detection +11

290

Paper
Code

Q-Diffusion: Quantizing Diffusion Models

1 code implementation • ICCV 2023 • Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process.

Image Generation Noise Estimation +1

262

Paper
Code

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

1 code implementation • 14 Aug 2023 • Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu

Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost.

Text Generation

173

Paper
Code

Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

4 code implementations • 26 May 2017 • Han Zhao, Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura, Geoffrey J. Gordon

As a step toward bridging the gap, we propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances.

Domain Adaptation Sentiment Analysis

107

Paper
Code

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

1 code implementation • CVPR 2023 • Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.

3D Object Detection object-detection +2

103

Paper
Code

Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation

1 code implementation • CVPR 2021 • Xiangyu Yue, Zangwei Zheng, Shanghang Zhang, Yang Gao, Trevor Darrell, Kurt Keutzer, Alberto Sangiovanni Vincentelli

In this paper, we propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).

Ranked #6 on Semantic Segmentation on DensePASS

Contrastive Learning Self-Supervised Learning +2

Paper
Code

Instance Adaptive Self-Training for Unsupervised Domain Adaptation

2 code implementations • ECCV 2020 • Ke Mei, Chuang Zhu, Jiaqi Zou, Shanghang Zhang

In this paper, we propose an instance adaptive self-training framework for UDA on the task of semantic segmentation.

Ranked #13 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Pseudo Label Semantic Segmentation +2

Paper
Code

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

1 code implementation • CVPR 2022 • Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu

By comprehensively investigating these GE-ViTs and comparing with their corresponding CNN models, we observe: 1) For the enhanced model, larger ViTs still benefit more for the OOD generalization.

Out-of-Distribution Generalization Self-Supervised Learning

Paper
Code

MaCow: Masked Convolutional Generative Flow

2 code implementations • NeurIPS 2019 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

Flow-based generative models, conceptually attractive due to tractability of both the exact log-likelihood computation and latent-variable inference, and efficiency of both training and sampling, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Ranked #3 on Image Generation on CelebA 256x256

Computational Efficiency Density Estimation +1

Paper
Code

Decoupling Global and Local Representations via Invertible Generative Flows

1 code implementation • ICLR 2021 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

In this work, we propose a new generative model that is capable of automatically decoupling global and local representations of images in an entirely unsupervised setting, by embedding a generative flow in the VAE framework to model the decoder.

Density Estimation Image Generation +2

Paper
Code

PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection

1 code implementation • NeurIPS 2023 • Qiang Zhou, Weize Li, Lihan Jiang, Guoliang Wang, Guyue Zhou, Shanghang Zhang, Hao Zhao

Furthermore, we provide an open-source benchmark library, including dataset and baseline methods that cover 8 anomaly detection paradigms, to facilitate future research and application in this domain.

Ranked #2 on Anomaly Detection on PAD Dataset

4k Anomaly Detection

Paper
Code

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

1 code implementation • CVPR 2023 • Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

In this paper, we address open-vocabulary 3D point-cloud detection by a dividing-and-conquering strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting.

3D Object Detection 3D Open-Vocabulary Object Detection +3

Paper
Code

Self-Supervised Pretraining Improves Self-Supervised Pretraining

1 code implementation • 23 Mar 2021 • Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.

Image Augmentation

Paper
Code

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

1 code implementation • CVPR 2023 • Jianyang Gu, Kai Wang, Hao Luo, Chen Chen, Wei Jiang, Yuqiang Fang, Shanghang Zhang, Yang You, Jian Zhao

Neural Architecture Search (NAS) has been increasingly appealing to the society of object Re-Identification (ReID), for that task-specific architectures significantly improve the retrieval performance.

Ranked #8 on Vehicle Re-Identification on VehicleID Large

Image Classification Neural Architecture Search +3

Paper
Code

Multi-source Distilling Domain Adaptation

1 code implementation • 22 Nov 2019 • Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer

Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA).

Ranked #4 on Multi-Source Unsupervised Domain Adaptation on Office-31

Domain Adaptation Multi-Source Unsupervised Domain Adaptation

Paper
Code

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

1 code implementation • 11 Apr 2024 • Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li

The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.

Paper
Code

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation • ICCV 2021 • Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1

Paper
Code

Understanding Traffic Density from Large-Scale Web Camera Data

1 code implementation • CVPR 2017 • Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura

Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion and large perspective.

regression

Paper
Code

Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting

1 code implementation • ICLR 2022 • Shikuang Deng, Yuhang Li, Shanghang Zhang, Shi Gu

Then we introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in the gradient descent with SG so that the training process can converge into flatter minima with better generalizability.

Paper
Code

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

1 code implementation • 27 Sep 2022 • Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, Fengwei Yu, Xianglong Liu

With the trends of large NLP models, the increasing memory and computation costs hinder their efficient deployment on resource-limited devices.

Quantization

Paper
Code

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

1 code implementation • 7 Jun 2023 • Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang

Note that, our method can be regarded as a novel transfer paradigm for large-scale models, delivering promising results in adaptation to continually changing distributions.

Test-time Adaptation

Paper
Code

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

1 code implementation • 29 Mar 2024 • Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

Instruction Following Language Modelling +4

Paper
Code

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.

Image Generation Language Modelling +1

Paper
Code

Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization

1 code implementation • 30 Oct 2020 • Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer

Due to scarcity of labels on the target domain, we introduce mutual information maximization (MIM) apart from CL to exploit the features that best support the final prediction.

Contrastive Learning General Classification +3

Paper
Code

Cross-Domain Sentiment Classification with In-Domain Contrastive Learning

1 code implementation • 5 Dec 2020 • Tian Li, Xiang Chen, Shanghang Zhang, Zhen Dong, Kurt Keutzer

In this paper, we propose a contrastive learning framework for cross-domain sentiment classification.

Classification Contrastive Learning +4

Paper
Code

Contrastive Multimodal Fusion with TupleInfoNCE

1 code implementation • ICCV 2021 • Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi

Another approach is to concatenate all the modalities into a tuple and then contrast positive and negative tuple correspondences.

Ranked #70 on Semantic Segmentation on NYU Depth v2

Contrastive Learning Representation Learning +1

Paper
Code

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

1 code implementation • 17 Mar 2023 • Senqiao Yang, Jiarui Wu, Jiaming Liu, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Yulu Gan, Zehui Chen, Shanghang Zhang

The visual prompts have provided an efficient manner in addressing visual cross-domain problems.

Depth Estimation Semantic Segmentation +1

Paper
Code

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

1 code implementation • ICCV 2017 • Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura

To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short term memory networks (LSTM) in a residual learning fashion.

Paper
Code

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

1 code implementation • 20 Jun 2022 • Tian Li, Xiang Chen, Zhen Dong, Weijiang Yu, Yijun Yan, Kurt Keutzer, Shanghang Zhang

Then during training, DASK injects pivot-related knowledge graph information into source domain texts.

Domain Adaptation Sentiment Analysis +3

Paper
Code

BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection

1 code implementation • 1 Dec 2022 • Jianing Li, Ming Lu, Jiaming Liu, Yandong Guo, Li Du, Shanghang Zhang

In this paper, we propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.

3D Object Detection Autonomous Driving +4

Paper
Code

Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation

1 code implementation • 10 Oct 2022 • Yixiong Zou, Shanghang Zhang, Yuhua Li, Ruixuan Li

Few-shot class-incremental learning (FSCIL) is designed to incrementally recognize novel classes with only few training samples after the (pre-)training on base classes with sufficient samples, which focuses on both base-class performance and novel-class generalization.

Few-Shot Class-Incremental Learning Incremental Learning

Paper
Code

Annealing-Based Label-Transfer Learning for Open World Object Detection

1 code implementation • CVPR 2023 • Yuqing Ma, Hainan Li, Zhange Zhang, Jinyang Guo, Shanghang Zhang, Ruihao Gong, Xianglong Liu

To the best of our knowledge, this is the first OWOD work without manual unknown selection.

object-detection Open World Object Detection +2

Paper
Code

MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer

1 code implementation • 3 May 2022 • Jinze Yu, Jiaming Liu, Xiaobao Wei, Haoyi Zhou, Yohei Nakata, Denis Gudovskiy, Tomoyuki Okuno, JianXin Li, Kurt Keutzer, Shanghang Zhang

To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels.

Domain Adaptation Object +3

Paper
Code

Efficient Meta-Tuning for Content-aware Neural Video Delivery

1 code implementation • 20 Jul 2022 • Xiaoqi Li, Jiaming Liu, Shizun Wang, Cheng Lyu, Ming Lu, Yurong Chen, Anbang Yao, Yandong Guo, Shanghang Zhang

Our method significantly reduces the computational cost and achieves even better performance, paving the way for applying neural video delivery techniques to practical applications.

Super-Resolution

Paper
Code

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

1 code implementation • 22 Sep 2023 • Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang

NTO3D lifts the 2D masks and features of SAM into the 3D neural field for high-quality neural target object 3D reconstruction.

3D Object Reconstruction 3D Reconstruction +1

Paper
Code

MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning

1 code implementation • 5 Dec 2023 • Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang

Motivated by this, we propose Mixture of Sparse Adapters, or MoSA, as a novel Adapter Tuning method to fully unleash the potential of each parameter in the adapter.

Paper
Code

Heterogenous Memory Augmented Neural Networks

1 code implementation • 17 Oct 2023 • Zihan Qiu, Zhen Liu, Shuicheng Yan, Shanghang Zhang, Jie Fu

It has been shown that semi-parametric methods, which combine standard neural networks with non-parametric components such as external memory modules and data retrieval, are particularly helpful in data scarcity and out-of-distribution (OOD) scenarios.

Retrieval

Paper
Code

Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

1 code implementation • 31 Jan 2024 • Jianing Li, Xi Nan, Ming Lu, Li Du, Shanghang Zhang

To overcome this limitation in MLLMs, we introduce Proximity Question Answering (Proximity QA), a novel framework designed to enable MLLMs to infer the proximity relationship between objects in images.

Multi-Task Learning Question Answering +1

Paper
Code

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms

1 code implementation • 19 Jun 2020 • Xingyi Yang, Xuehai He, Yuxiao Liang, Yue Yang, Shanghang Zhang, Pengtao Xie

There has not been a clear understanding on what properties of data and tasks render one approach outperforms the other.

Self-Supervised Learning Transfer Learning

Paper
Code

Modeling relation paths for knowledge base completion via joint adversarial training

1 code implementation • 14 Oct 2018 • Chen Li, Xutan Peng, Shanghang Zhang, Hao Peng, Philip S. Yu, Min He, Linfeng Du, Lihong Wang

By treating relations and multi-hop paths as two different input sources, we use a feature extractor, which is shared by two downstream components (i. e. relation classifier and source discriminator), to capture shared/similar information between them.

Knowledge Base Completion Relation

Paper
Code

Adversarial Multiple Source Domain Adaptation

no code implementations • NeurIPS 2018 • Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, Joao P. Costeira, Geoffrey J. Gordon

In this paper we propose new generalization bounds and algorithms under both classification and regression settings for unsupervised multiple source domain adaptation.

Ranked #3 on Domain Adaptation on GTA5+Synscapes to Cityscapes

Classification Domain Adaptation +5

Paper
Add Code

Learning to Understand Image Blur

no code implementations • CVPR 2018 • Shanghang Zhang, Xiaohui Shen, Zhe Lin, RadomÃr MÄch, JoÃ£o P. Costeira, JosÃ© M. F. Moura

In this paper, we propose a unified framework to estimate a spatially-varying blur map and understand its desirability in terms of image quality at the same time.

Paper
Add Code

Multiple Source Domain Adaptation with Adversarial Learning

no code implementations • ICLR 2018 • Han Zhao, Shanghang Zhang, Guanhang Wu, Jo\~{a}o P. Costeira, Jos\'{e} M. F. Moura, Geoffrey J. Gordon

We propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances.

Domain Adaptation Sentiment Analysis

Paper
Add Code

Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning

no code implementations • NeurIPS 2019 • Jian Ni, Shanghang Zhang, Haiyong Xie

In particular, the primal GAN learns to synthesize inter-class discriminative and semantics-preserving visual features from both the semantic representations of seen/unseen classes and the ones reconstructed by the dual GAN.

Generalized Zero-Shot Learning Transfer Learning

Paper
Add Code

Generalized Zero-shot ICD Coding

no code implementations • 28 Sep 2019 • Congzheng Song, Shanghang Zhang, Najmeh Sadoughi, Pengtao Xie, Eric Xing

The International Classification of Diseases (ICD) is a list of classification codes for the diagnoses.

General Classification Generalized Zero-Shot Learning +3

Paper
Add Code

Decoupling Features and Coordinates for Few-shot RGB Relocalization

no code implementations • 26 Nov 2019 • Siyan Dong, Songyin Wu, Yixin Zhuang, Kai Xu, Shanghang Zhang, Baoquan Chen

To address this issue, we approach camera relocalization with a decoupled solution where feature extraction, coordinate regression, and pose estimation are performed separately.

Camera Relocalization Pose Estimation +1

Paper
Add Code

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

no code implementations • 12 May 2020 • Yixiong Zou, Shanghang Zhang, Ke Chen, Yonghong Tian, Yao-Wei Wang, José M. F. Moura

Inspired by such capability of humans, to imitate humans' ability of learning visual primitives and composing primitives to recognize novel classes, we propose an approach to FSL to learn a feature representation composed of important primitives, which is jointly trained with two parts, i. e. primitive discovery and primitive enhancing.

Few-Shot Image Classification Few-Shot Learning +1

Paper
Add Code

Rethinking Distributional Matching Based Domain Adaptation

no code implementations • 23 Jun 2020 • Bo Li, Yezhen Wang, Tong Che, Shanghang Zhang, Sicheng Zhao, Pengfei Xu, Wei Zhou, Yoshua Bengio, Kurt Keutzer

In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods.

Domain Adaptation

Paper
Add Code

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

no code implementations • ECCV 2020 • Xinwei Sun, Yilun Xu, Peng Cao, Yuqing Kong, Lingjing Hu, Shanghang Zhang, Yizhou Wang

In this paper, we propose a novel information-theoretic approach, namely \textbf{T}otal \textbf{C}orrelation \textbf{G}ain \textbf{M}aximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can utilize effectively the information across different modalities of unlabeled data points to facilitate training classifiers of each modality (ii) it has theoretical guarantee to identify Bayesian classifiers, i. e., the ground truth posteriors of all modalities.

Disease Prediction Emotion Recognition +1

Paper
Add Code

Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition

no code implementations • 7 Aug 2020 • Yixiong Zou, Shanghang Zhang, JianPeng Yu, Yonghong Tian, José M. F. Moura

To solve this problem, cross-domain FSL (CDFSL) is proposed very recently to transfer knowledge from general-domain base classes to special-domain novel classes.

cross-domain few-shot learning

Paper
Add Code

Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation

no code implementations • CVPR 2021 • Bo Li, Yezhen Wang, Shanghang Zhang, Dongsheng Li, Trevor Darrell, Kurt Keutzer, Han Zhao

First, we provide a finite sample bound for both classification and regression problems under Semi-DA.

regression Semi-supervised Domain Adaptation +2

Paper
Add Code

Annotation-Efficient Untrimmed Video Action Recognition

no code implementations • 30 Nov 2020 • Yixiong Zou, Shanghang Zhang, Guangyao Chen, Yonghong Tian, Kurt Keutzer, José M. F. Moura

In this paper, we target a new problem, Annotation-Efficient Video Recognition, to reduce the requirement of annotations for both large amount of samples and the action location.

Action Recognition Contrastive Learning +3

Paper
Add Code

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

no code implementations • 24 Dec 2020 • Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks.

Contrastive Learning Representation Learning +1

Paper
Add Code

Online Continual Adaptation with Active Self-Training

no code implementations • 11 Jun 2021 • Shiji Zhou, Han Zhao, Shanghang Zhang, Lianzhe Wang, Heng Chang, Zhi Wang, Wenwu Zhu

Our theoretical results show that OSAMD can fast adapt to changing environments with active queries.

Paper
Add Code

Meta Learning with Minimax Regularization

no code implementations • 29 Sep 2021 • Lianzhe Wang, Shiji Zhou, Shanghang Zhang, Wenpeng Zhang, Heng Chang, Wenwu Zhu

Even though meta-learning has attracted research wide attention in recent years, the generalization problem of meta-learning is still not well addressed.

Few-Shot Learning

Paper
Add Code

2nd Place Solution for VisDA 2021 Challenge -- Universally Domain Adaptive Image Recognition

no code implementations • 27 Oct 2021 • Haojin Liao, Xiaolin Song, Sicheng Zhao, Shanghang Zhang, Xiangyu Yue, Xingxu Yao, Yueming Zhang, Tengfei Xing, Pengfei Xu, Qiang Wang

The Visual Domain Adaptation (VisDA) 2021 Challenge calls for unsupervised domain adaptation (UDA) methods that can deal with both input distribution shift and label set variance between the source and target domains.

Universal Domain Adaptation Unsupervised Domain Adaptation

Paper
Add Code

Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks

no code implementations • NeurIPS 2021 • Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, Shi Gu

Based on the introduced finite difference gradient, we propose a new family of Differentiable Spike (Dspike) functions that can adaptively evolve during training to find the optimal shape and smoothness for gradient estimation.

Ranked #4 on Event data classification on CIFAR10-DVS

Event data classification Image Classification

Paper
Add Code

Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning

no code implementations • 5 Jan 2022 • Xingqun Qi, Muyi Sun, Zijian Wang, Jiaming Liu, Qi Li, Fang Zhao, Shanghang Zhang, Caifeng Shan

To preserve the generated faces being more structure-coordinated, the IRSG models inter-class structural relations among every facial component by graph representation learning.

Generative Adversarial Network Graph Representation Learning +1

Paper
Add Code

UnrealNAS: Can We Search Neural Architectures with Unreal Data?

no code implementations • 4 May 2022 • Zhen Dong, Kaicheng Zhou, Guohao Li, Qiang Zhou, Mingfei Guo, Bernard Ghanem, Kurt Keutzer, Shanghang Zhang

Neural architecture search (NAS) has shown great success in the automatic design of deep neural networks (DNNs).

Neural Architecture Search

Paper
Add Code

Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning

no code implementations • 5 Jul 2022 • Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

Current point-cloud detection methods have difficulty detecting the open-vocabulary objects in the real world, due to their limited generalization capability.

Cloud Detection Contrastive Learning

Paper
Add Code

Uncertainty Guided Depth Fusion for Spike Camera

no code implementations • 26 Aug 2022 • Jianing Li, Jiaming Liu, Xiaobao Wei, Jiyuan Zhang, Ming Lu, Lei Ma, Li Du, Tiejun Huang, Shanghang Zhang

In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike camera.

Autonomous Driving Stereo Depth Estimation

Paper
Add Code

Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer

1 code implementation • 26 Aug 2022 • Jiaming Liu, Qizhe Zhang, Jianing Li, Ming Lu, Tiejun Huang, Shanghang Zhang

Neuromorphic spike data, an upcoming modality with high temporal resolution, has shown promising potential in real-world applications due to its inherent advantage to overcome high-velocity motion blur.

Autonomous Driving Depth Estimation +2

Paper
Code

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

no code implementations • CVPR 2023 • Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

Building on the theoretical insight, NoisyQuant achieves the first success on actively altering the heavy-tailed activation distribution with additive noisy bias to fit a given quantizer.

Quantization

Paper
Add Code

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection

no code implementations • 30 Nov 2022 • Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang

In this paper, we propose a Multi-space Alignment Teacher-Student (MATS) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Geometric-space Aligned Student (GAS) model.

3D Object Detection Autonomous Driving +4

Paper
Add Code

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

no code implementations • CVPR 2023 • Xiaowei Chi, Jiaming Liu, Ming Lu, Rongyu Zhang, Zhaoqing Wang, Yandong Guo, Shanghang Zhang

In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices.

3D Object Detection Autonomous Driving +1

Paper
Add Code

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world

no code implementations • CVPR 2023 • Yulu Gan, Mingjie Pan, Rongyu Zhang, Zijian Ling, Lingran Zhao, Jiaming Liu, Shanghang Zhang

To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model.

Device-Cloud Collaboration object-detection +2

Paper
Add Code

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

no code implementations • 6 Dec 2022 • Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

CSQ stabilizes the bit-level mixed-precision training process with a bi-level gradual continuous sparsification on both the bit values of the quantized weights and the bit selection in determining the quantization precision of each layer.

Quantization

Paper
Add Code

WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal

no code implementations • 24 Mar 2023 • Yulin Luo, Rui Zhao, Xiaobao Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang

To this end, we propose a method called Weather-aware Multi-scale MoE (WM-MoE) based on Transformer for blind weather removal.

Autonomous Driving Contrastive Learning +1

Paper
Add Code

Chain of Thought Prompt Tuning in Vision Language Models

no code implementations • 16 Apr 2023 • Jiaxin Ge, Hongyin Luo, Siyuan Qian, Yulu Gan, Jie Fu, Shanghang Zhang

Chain of Thought is a simple and effective approximation to human reasoning process and has been proven useful for natural language processing (NLP) tasks.

Domain Generalization Image Classification +4

Paper
Add Code

Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level

no code implementations • CVPR 2023 • Lianzhe Wang, Shiji Zhou, Shanghang Zhang, Xu Chu, Heng Chang, Wenwu Zhu

Despite the broad interest in meta-learning, the generalization problem remains one of the significant challenges in this field.

Meta-Learning

Paper
Add Code

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

no code implementations • 21 May 2023 • Yijia Zhang, Lingran Zhao, Shijie Cao, WenQiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

In this study, we conduct a comparative analysis of INT and FP quantization with the same bit-width, revealing that the optimal quantization format varies across different layers due to the complexity and diversity of tensor distribution.

Quantization

Paper
Add Code

HUB: Guiding Learned Optimizers with Continuous Prompt Tuning

no code implementations • 26 May 2023 • Gaole Dai, Wei Wu, Ziyu Wang, Jie Fu, Shanghang Zhang, Tiejun Huang

By incorporating hand-designed optimizers as the second component in our hybrid approach, we are able to retain the benefits of learned optimizers while stabilizing the training process and, more importantly, improving testing performance.

Meta-Learning

Paper
Add Code

UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering

no code implementations • 15 Jun 2023 • Mingjie Pan, Li Liu, Jiaming Liu, Peixiang Huang, Longlong Wang, Shanghang Zhang, Shaoqing Xu, Zhiyi Lai, Kuiyuan Yang

In this technical report, we present our solution, named UniOCC, for the Vision-Centric 3D occupancy prediction track in the nuScenes Open Dataset Challenge at CVPR 2023.

Ranked #3 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes

Prediction Of Occupancy Grid Maps

Paper
Add Code

DiffuseIR:Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images

no code implementations • 21 Jun 2023 • Mingjie Pan, Yulu Gan, Fangxu Zhou, Jiaming Liu, Aimin Wang, Shanghang Zhang, Dawei Li

Since the diffusion model learns the universal structural distribution of biological tissues, which is independent of the axial resolution, DiffuseIR can reconstruct authentic images with unseen low-axial resolutions into a high-axial resolution without requiring re-training.

Super-Resolution

Paper
Add Code

PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers

no code implementations • 1 Jul 2023 • Peidong Jia, Jiaming Liu, Senqiao Yang, Jiarui Wu, Xiaodong Xie, Shanghang Zhang

PDM comprehensively leverages the prompt memory to extract domain-specific knowledge and explicitly constructs a long-term memory space for the data distribution, which represents better domain diversity compared to existing methods.

object-detection Object Detection

Paper
Add Code

QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

no code implementations • ICCV 2023 • Yifan Zhang, Zhen Dong, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yuan Du, Kurt Keutzer, Li Du, Shanghang Zhang

Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements.

3D Object Detection Model Compression +2

Paper
Add Code

Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation

no code implementations • 24 Sep 2023 • Jiayi Ni, Senqiao Yang, ran Xu, Jiaming Liu, Xiaoqi Li, Wenyu Jiao, Zehui Chen, Yi Liu, Shanghang Zhang

In this paper, we propose a distribution-aware tuning (DAT) method to make the semantic segmentation CTTA efficient and practical in real-world applications.

Autonomous Driving Semantic Segmentation +1

Paper
Add Code

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

no code implementations • 20 Nov 2023 • Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang

(2) During the distillation period, a pixel-wise frequency mask is generated via Frequency Prompt, to localize those pixel of interests (PoIs) in various frequency bands.

Knowledge Distillation

Paper
Add Code

COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design

no code implementations • 28 Nov 2023 • Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo

Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text.

Image Generation

Paper
Add Code

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

no code implementations • 29 Nov 2023 • Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike Guo

In addition, the lack of large-scale available datasets with emotional transition speech and corresponding 3D human gestures also limits the addressing of this task.

Audio inpainting Gesture Generation

Paper
Add Code

I-MedSAM: Implicit Medical Image Segmentation with Segment Anything

no code implementations • 28 Nov 2023 • Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang

In contrast, implicit methods learn continuous representations for segmentation, which is crucial for medical image segmentation.

Image Segmentation Medical Image Segmentation +2

Paper
Add Code

MoEC: Mixture of Experts Implicit Neural Compression

no code implementations • 3 Dec 2023 • Jianchen Zhao, Cheng-Ching Tseng, Ming Lu, Ruichuan An, Xiaobao Wei, He Sun, Shanghang Zhang

However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the partition and INRs.

Data Compression

Paper
Add Code

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

no code implementations • 14 Dec 2023 • Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Shanghang Zhang, Kurt Keutzer

In particular, we build a tree-like Split-Ensemble architecture by performing iterative splitting and pruning from a shared backbone model, where each branch serves as a submodel corresponding to a subtask.

Paper
Add Code

Gradient-based Parameter Selection for Efficient Fine-Tuning

no code implementations • 15 Dec 2023 • Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang

With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible.

Image Classification Image Segmentation +2

Paper
Add Code

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

no code implementations • 15 Dec 2023 • Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang

In this paper, we present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation.

3D Generation Image to 3D +1

Paper
Add Code

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

no code implementations • 19 Dec 2023 • Jiaming Liu, ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts.

Self-Supervised Learning Test-time Adaptation

Paper
Add Code

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

no code implementations • 21 Dec 2023 • Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang

In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes.

Instruction Following Language Modelling +1

Paper
Add Code

FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

no code implementations • 22 Dec 2023 • Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang

In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary without facing constraints from original 3D datasets.

Ranked #3 on 3D Open-Vocabulary Object Detection on ScanNet on unseen classes

3D Object Detection 3D Open-Vocabulary Object Detection +2

Paper
Add Code

Iterative Prompt Relabeling for diffusion model with RLDF

no code implementations • 23 Dec 2023 • Jiaxin Ge, Xinyan Chen, Tianjun Zhang, Shanghang Zhang

IP-RLDF first samples a batch of images conditioned on the text, then relabels the text prompts of unmatched text-image pairs with classifier feedback.

Image Generation reinforcement-learning +2

Paper
Add Code

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

no code implementations • 27 Dec 2023 • Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang

In this work, we propose an efficient MoE architecture with weight sharing across the experts.

Image Restoration Multi-Task Learning

Paper
Add Code

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations • 26 Dec 2023 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration Knowledge Distillation +1

Paper
Add Code

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

no code implementations • 5 Jan 2024 • Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu

In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers agents to adeptly traverse unfamiliar environments and locate objects from novel categories without prior explicit training.

Language Modelling Large Language Model

Paper
Add Code

RustNeRF: Robust Neural Radiance Field with Low-Quality Images

no code implementations • 6 Jan 2024 • Mengfei Li, Ming Lu, Xiaofang Li, Shanghang Zhang

First, existing methods assume enough high-quality images are available for training the NeRF model, ignoring real-world image degradation.

Novel View Synthesis

Paper
Add Code

VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

no code implementations • 15 Jan 2024 • Rongyu Zhang, Zefan Cai, Huanrui Yang, Zidong Liu, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Baobao Chang, Yuan Du, Li Du, Shanghang Zhang

Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks.

Computational Efficiency Image Classification

Paper
Add Code

Building Flexible Machine Learning Models for Scientific Computing at Scale

no code implementations • 25 Feb 2024 • Tianyu Chen, Haoyi Zhou, Ying Li, Hao Wang, Chonghan Gao, Shanghang Zhang, JianXin Li

Foundation models have revolutionized knowledge acquisition across domains, and our study introduces OmniArch, a paradigm-shifting approach designed for building foundation models in multi-physics scientific computing.

Zero-Shot Learning

Paper
Add Code

A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

no code implementations • 27 Feb 2024 • Zehui Chen, Qiuchen Wang, Zhenyu Li, Jiaming Liu, Shanghang Zhang, Feng Zhao

In this report, we present our solution to the multi-task robustness track of the 1st Visual Continual Learning (VCL) Challenge at ICCV 2023 Workshop.

3D Object Detection Continual Learning +5

Paper
Add Code

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

no code implementations • 29 Feb 2024 • Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu

Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI.

Attribute Collision Avoidance +2

Paper
Add Code

A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models

no code implementations • 4 Jan 2024 • Rui Ma, Qiang Zhou, Bangjun Xiao, Yizhu Jin, Daquan Zhou, Xiuyu Li, Aishani Singh, Yi Qu, Kurt Keutzer, Xiaodong Xie, Jingtong Hu, Zhen Dong, Shanghang Zhang

Copyright is a legal right that grants creators the exclusive authority to reproduce, distribute, and profit from their creative works.

Text-to-Image Generation

Paper
Add Code

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

no code implementations • 21 Mar 2024 • Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

Second, we propose an instruction-guided latent fusion that pastes the multi-layered latent representations onto a canvas latent.

Text-to-Image Generation

Paper
Add Code

Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

no code implementations • 22 Mar 2024 • Hongzhi Gao, Zheng Chen, Zehui Chen, Lin Chen, Jiaming Liu, Shanghang Zhang, Feng Zhao

Training high-accuracy 3D detectors necessitates massive labeled 3D annotations with 7 degree-of-freedom, which is laborious and time-consuming.

3D Object Detection object-detection +2

Paper
Add Code

SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

no code implementations • 10 Apr 2024 • Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang

Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs.

Novel View Synthesis

Paper
Add Code

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

no code implementations • 13 Apr 2024 • Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.