Search Results for author: Jian Zhang

Found 330 papers, 121 papers with code

Multi-Frame Blind Manifold Deconvolution for Rotating Synthetic Aperture Imaging

no code implementations 31 Jan 2025 Dao Lin, Jian Zhang, Martin Benning

Rotating synthetic aperture (RSA) imaging system captures images of the target scene at different rotation angles by rotating a rectangular aperture.

Deblurring

AutoG: Towards automatic graph construction from tabular data

no code implementations 25 Jan 2025 Zhikai Chen, Han Xie, Jian Zhang, Xiang Song, Jiliang Tang, Huzefa Rangwala, George Karypis

The absence of dedicated datasets to formalize and evaluate the effectiveness of graph construction methods, and 2.

graph construction

Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs

no code implementations 24 Jan 2025 Hang Luo, Jian Zhang, Chujun Li

In knowledge-intensive tasks, especially in high-stakes domains like medicine and law, it is critical not only to retrieve relevant information but also to provide causal reasoning and explainability.

Knowledge Graphs Natural Language Understanding +5

Text-to-Image GAN with Pretrained Representations

no code implementations 30 Dec 2024 Xiaozhou You, Jian Zhang

On the zero-shot text-to-image synthesis task, we achieve comparable performance with fewer model parameters, smaller training data size and faster inference speed.

Domain Generalization Image Generation +1

C2F-TP: A Coarse-to-Fine Denoising Framework for Uncertainty-Aware Trajectory Prediction

1 code implementation 17 Dec 2024 Zichen Wang, Hao Miao, Senzhang Wang, Renzhi Wang, Jianxin Wang, Jian Zhang

Accurately predicting the trajectory of vehicles is critically important for ensuring safety and reliability in autonomous driving.

Autonomous Driving Denoising +2

OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation

no code implementations 12 Dec 2024 Weiqi Li, Shijie Zhao, Chong Mou, Xuhan Sheng, Zhenyu Zhang, Qian Wang, Junlin Li, Li Zhang, Jian Zhang

As virtual reality gains popularity, the demand for controllable creation of immersive and dynamic omnidirectional videos (ODVs) is increasing.

Image to Video Generation

RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

no code implementations 11 Dec 2024 Xuhan Sheng, Runyi Li, Bin Chen, Weiqi Li, Xu Jiang, Jian Zhang

Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), addressing the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport.

Denoising Image Super-Resolution

RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians

1 code implementation 3 Dec 2024 Qiankun Gao, Yanmin Wu, Chengxiang Wen, Jiarui Meng, Luyang Tang, Jie Chen, Ronggang Wang, Jian Zhang

Finally, we jointly learn the scene's temporal motion and refine the canonical Gaussians learned from the first two stages.

3DGS

CPA: Camera-pose-awareness Diffusion Transformer for Video Generation

no code implementations 2 Dec 2024 Yuelei Wang, Jian Zhang, PengTao Jiang, Hao Zhang, Jinwei Chen, Bo Li

Despite the significant advancements made by Diffusion Transformer (DiT)-based methods in video generation, there remains a notable gap with controllable camera pose perspectives.

Text-to-Video Generation Video Generation

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking

no code implementations 2 Dec 2024 Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang

To address these challenges, we propose OmniGuard, a novel augmented versatile watermarking approach that integrates proactive embedding with passive, blind extraction for robust copyright protection and tamper localization.

ARMOR: Egocentric Perception for Humanoid Robot Collision Avoidance and Motion Planning

no code implementations 30 Nov 2024 Daehwa Kim, Mario Srouji, Chen Chen, Jian Zhang

We also compare our IL policy against a sampling-based motion planning expert cuRobo, showing 31.6% less collisions, 16.9% higher success rate, and 26x reduction in computational latency.

Collision Avoidance Imitation Learning +1

InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception

no code implementations 28 Nov 2024 Haijie Li, Yanmin Wu, Jiarui Meng, Qiankun Gao, Zhiyao Zhang, Ronggang Wang, Jian Zhang

3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality.

3DGS Autonomous Driving +4

Practical Compact Deep Compressed Sensing

1 code implementation 20 Nov 2024 Bin Chen, Jian Zhang

Recent years have witnessed the success of deep networks in compressed sensing (CS), which allows for a significant reduction in sampling cost and has gained growing attention since its inception.

Adversarial Diffusion Compression for Real-World Image Super-Resolution

1 code implementation 20 Nov 2024 Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang

Real-world image super-resolution (Real-ISR) aims to reconstruct high-resolution images from low-resolution inputs degraded by complex, unknown processes.

Decoder Denoising +1

HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting

1 code implementation 12 Nov 2024 Qiankun Gao, Jiarui Meng, Chengxiang Wen, Jie Chen, Jian Zhang

The online reconstruction of dynamic scenes from multi-view streaming videos faces significant challenges in training, rendering and storage efficiency.

3DGS

DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection

no code implementations 31 Oct 2024 Fan Nie, Jiangqun Ni, Jian Zhang, Bin Zhang, Weizhe Zhang

Recently, temporal inconsistency clues have been explored to improve the generalizability of deepfake video detection.

Decoder DeepFake Detection +1

EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning

no code implementations 30 Oct 2024 Peide Huang, Yuhan Hu, Nataliya Nechyporenko, Daehwa Kim, Walter Talbott, Jian Zhang

This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication.

Diversity In-Context Learning

Local Policies Enable Zero-shot Long-horizon Manipulation

no code implementations 29 Oct 2024 Murtaza Dalal, Min Liu, Walter Talbott, Chen Chen, Deepak Pathak, Jian Zhang, Ruslan Salakhutdinov

We transfer our local policies from simulation to reality and observe they can solve unseen long-horizon manipulation tasks with up to 8 stages with significant pose, object and scene configuration variation.

Motion Planning

Large Spatial Model: End-to-end Unposed Images to Semantic 3D

1 code implementation 24 Oct 2024 Zhiwen Fan, Jian Zhang, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, Yue Wang

To tackle the scarcity of labeled 3D semantic data and enable natural language-driven scene manipulation, we incorporate a pre-trained 2D language-based segmentation model into a 3D-consistent semantic feature field.

3D Reconstruction Attribute

Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay

no code implementations 16 Oct 2024 Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Jian Zhang, Xiaoguang Niu

P2Value comprehensively considers the possibility of transformers' output and pass rate and can make use of the redundant resources caused by the problem that most programs collected by LLMs fail to pass any tests.

Code Generation

Task Consistent Prototype Learning for Incremental Few-shot Semantic Segmentation

no code implementations 16 Oct 2024 Wenbo Xu, Yanan Wu, Haoran Jiang, Yang Wang, Qiang Wu, Jian Zhang

Incremental Few-Shot Semantic Segmentation (iFSS) tackles a task that requires a model to continually expand its segmentation capability on novel classes using only a few annotated examples.

Few-Shot Semantic Segmentation Incremental Learning +2

Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation

1 code implementation 15 Oct 2024 Zhan Fa, Shumeng Li, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi

Dual-teacher models were introduced to address this problem but often neglected the importance of maintaining teacher model diversity, leading to coupling issues among teachers.

Diversity Image Segmentation +3

Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering

1 code implementation 12 Oct 2024 Ting Yu, Kunhao Fu, Jian Zhang, Qingming Huang, Jun Yu

Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal reasoning to yield precise answers.

Answer Generation Blocking +3

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

2 code implementations 3 Oct 2024 Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang

The rapid development of generative AI is a double-edged sword, which not only facilitates content creation but also makes image manipulation easier and more difficult to detect.

Face Swapping Image Forgery Detection +2

Generative AI Application for Building Industry

no code implementations 1 Oct 2024 Hanlong Wan, Jian Zhang, Yan Chen, Weili Xu, Fan Feng

Additionally, the study considers the broader implications of AI integration, including the development of AI-powered tools for comprehensive code compliance across various regulatory domains and the potential for AI to revolutionize workforce training through realistic simulations.

EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion

1 code implementation 11 Sep 2024 Jian Zhang, Weijian Mai, Zhijun Zhang

In response to this challenge, we propose EMOdiffhead, a novel method for emotional talking head video generation that not only enables fine-grained control of emotion categories and intensities but also enables one-shot generation.

Portrait Animation Talking Head Generation +1

GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation

1 code implementation 21 Aug 2024 Abiao Li, Chenlei Lv, Guofeng Mei, Yifan Zuo, Jian Zhang, Yuming Fang

The proposed network mainly consists of two principal components: a local geometric transformer and a global semantic transformer.

Point Cloud Segmentation Semantic Similarity +1

Breast tumor classification based on self-supervised contrastive learning from ultrasound videos

no code implementations 20 Aug 2024 Yunxin Tang, Siyuan Tang, Jian Zhang, Hao Chen

Further, we assessed the dependence of our pretrained model on the number of labeled data and revealed that <100 samples were required to achieve an AUC of 0.901.

Contrastive Learning Triplet

SkillMimic: Learning Reusable Basketball Skills from Demonstrations

no code implementations 12 Aug 2024 Yinhuai Wang, Qihan Zhao, Runyi Yu, Ailing Zeng, Jing Lin, Zhengyi Luo, Hok Wai Tsui, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan

SkillMimic employs a unified configuration to learn diverse skills from human-ball motion datasets, with skill diversity and generalization improving as the dataset grows.

PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software

no code implementations 24 Jul 2024 Kaixuan Li, Jian Zhang, Sen Chen, Han Liu, Yang Liu, Yixiang Chen

In this paper, we propose PatchFinder, a two-phase framework with end-to-end correlation learning for better-tracing security patches.

Re-Ranking

Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization

no code implementations 21 Jul 2024 Jiajun Hu, Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

To address the above issue, we propose Parameter-Efficient Group with Orthogonal regularization (PEGO) for vision transformers, which effectively preserves the generalization ability of the pre-trained network and learns more diverse knowledge compared with conventional PEFT.

Domain Generalization parameter-efficient fine-tuning

Improving AlphaFlow for Efficient Protein Ensembles Generation

no code implementations 8 Jul 2024 Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties.

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

1 code implementation 25 Jun 2024 Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang

The strong capability of large language models (LLMs) has been applied to information extraction (IE) through either retrieval augmented prompting or instruction tuning (IT).

named-entity-recognition Named Entity Recognition +2

GraphStorm: all-in-one graph machine learning framework for industry applications

1 code implementation 10 Jun 2024 Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code.

graph construction

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

no code implementations 4 Jun 2024 Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang

To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency.

3DGS Object

Real-Time State Modulation and Acquisition Circuit in Neuromorphic Memristive Systems

no code implementations 1 Jun 2024 Shengbo Wang, Cong Li, Tongming Pu, Jian Zhang, Weihao Ma, Luigi Occhipinti, Arokia Nathan, Shuo Gao

Memristive neuromorphic systems are designed to emulate human perception and cognition, where the memristor states represent essential historical information to perform both low-level and high-level tasks.

Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation

1 code implementation 31 May 2024 Shuzhou Yang, Yu Wang, Haijie Li, Jiarui Meng, Yanmin Wu, Xiandong Meng, Jian Zhang

We note that there is a disparity between the generation priors of these two diffusion models, leading to their different appearance outputs.

3D Generation Image to 3D

Protect-Your-IP: Scalable Source-Tracing and Attribution against Personalized Generation

no code implementations 26 May 2024 Runyi Li, Xuanyu Zhang, Zhipei Xu, Yongbing Zhang, Jian Zhang

With the advent of personalized generation models, users can more readily create images resembling existing content, heightening the risk of violating portrait rights and intellectual property (IP).

Attribute Incremental Learning

GS-Hider: Hiding Messages into 3D Gaussian Splatting

no code implementations 24 May 2024 Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang

Therefore, ensuring the security and fidelity of the original 3D scene while embedding information into the 3DGS point cloud files is an extremely challenging task.

3DGS 3D Scene Reconstruction +4

Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

no code implementations 23 May 2024 Yihao Huang, Chong Wang, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Jian Zhang, Geguang Pu, Yang Liu

With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance.

ReVideo: Remake a Video with Motion and Content Control

no code implementations 22 May 2024 Chong Mou, Mingdeng Cao, Xintao Wang, Zhaoyang Zhang, Ying Shan, Jian Zhang

In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion.

Video Editing Video Generation

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

1 code implementation CVPR 2024 Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang

In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings.

3D Point Cloud Classification Semantic Segmentation

Diffusion-Based Hierarchical Image Steganography

no code implementations 19 May 2024 Youmin Xu, Xuanyu Zhang, Jiwen Yu, Chong Mou, Xiandong Meng, Jian Zhang

This paper introduces Hierarchical Image Steganography, a novel method that enhances the security and capacity of embedding multiple images into a single container using diffusion models.

Image Steganography

3D Shape Augmentation with Content-Aware Shape Resizing

no code implementations 15 May 2024 Mingxiang Chen, Jian Zhang, Boli Zhou, Yang Song

Recent advancements in deep learning for 3D models have propelled breakthroughs in generation, detection, and scene understanding.

3D Generation Scene Understanding

F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching

no code implementations 1 May 2024 Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties.

ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

no code implementations 25 Apr 2024 Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality.

ERP

V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

no code implementations 25 Apr 2024 Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

Meanwhile, we introduce a sample-level audio localization method and a cross-modal copyright extraction mechanism to couple the information of audio and video frames.

Video Editing

Face2Face: Label-driven Facial Retouching Restoration

no code implementations 22 Apr 2024 Guanhua Zhao, Yu Gu, Xuhan Sheng, Yujie Hu, Jian Zhang

This poses challenges for fields that place high demands on the authenticity of photographs, such as identity verification and social media.

Image Restoration

OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

no code implementations 16 Apr 2024 Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks.

Denoising Domain Generalization +4

Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

1 code implementation CVPR 2024 Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

To fully utilize the information within the intermediate domain, we propose a symmetric Guidance training strategy (SymGD), which additionally offers direct guidance to unlabeled data by merging pseudo labels from intermediate samples.

Image Segmentation Segmentation +4

BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

no code implementations 13 Apr 2024 Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

When training the enhancement branch, the object detection subnet in the enhancement branch guides the image enhancement subnet to be optimized towards the direction that is most conducive to the detection task.

Image Enhancement Object +2

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

1 code implementation 29 Mar 2024 Zhiwen Fan, Kairun Wen, Wenyan Cong, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang

InstantSplat adopts a self-supervised framework that bridges the gap between 2D images and 3D representations using Gaussian Bundle Adjustment (GauBA) and can be optimized in an end-to-end manner.

3D Reconstruction Novel View Synthesis +1

Invertible Diffusion Models for Compressed Sensing

1 code implementation 25 Mar 2024 Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, Jian Zhang

To enable such memory-intensive end-to-end fine-tuning, we propose a novel two-level invertible design to transform both (1) multi-step sampling process and (2) noise estimation U-Net in each step into invertible networks.

Image Compressed Sensing Image Reconstruction +1

BadEdit: Backdooring large language models by model editing

1 code implementation 20 Mar 2024 Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu

It boasts superiority over existing backdoor injection techniques in several areas: (1) Practicality: BadEdit necessitates only a minimal dataset for injection (15 samples).

Backdoor Attack knowledge editing +1

MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models

1 code implementation 8 Mar 2024 Zijie Fang, Yifeng Wang, Ye Zhang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing Zhang

To tackle this challenge, we propose a framework named MamMIL for WSI analysis by cooperating the selective structured state space model (i.e., Mamba) with MIL, enabling the modeling of global instance dependencies while maintaining linear complexity.

Mamba Multiple Instance Learning +2

NetInfoF Framework: Measuring and Exploiting Network Usable Information

1 code implementation 12 Feb 2024 Meng-Chieh Lee, Haiyang Yu, Jian Zhang, Vassilis N. Ioannidis, Xiang Song, Soji Adeshina, Da Zheng, Christos Faloutsos

Given a node-attributed graph, and a graph task (link prediction or node classification), can we tell if a graph neural network (GNN) will perform well?

Graph Neural Network Link Prediction +2

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

2 code implementations CVPR 2024 Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.

Image Generation

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

no code implementations CVPR 2024 Qian Wang, Weiqi Li, Chong Mou, Xinhua Cheng, Jian Zhang

In this paper, we propose a pipeline named 360-Degree Video Diffusion model (360DVD) for generating 360-degree panoramic videos based on the given prompts and motion conditions.

Video Generation

Super-Resolution Reconstruction from Bayer-Pattern Spike Streams

no code implementations CVPR 2024 Yanchen Dong, Ruiqin Xiong, Jian Zhang, Zhaofei Yu, Xiaopeng Fan, Shuyuan Zhu, Tiejun Huang

Experimental results demonstrate that the proposed scheme can reconstruct satisfactory color images with both high temporal and spatial resolution from low-resolution Bayer-pattern spike streams.

Motion Estimation Super-Resolution

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

1 code implementation CVPR 2024 Rui Zhao, Ruiqin Xiong, Jing Zhao, Jian Zhang, Xiaopeng Fan, Zhaofei Yu, Tiejun Huang

Different from traditional cameras, each pixel in spike cameras records the arrival of photons continuously by firing binary spikes at an ultra-fine temporal granularity.

Attribute Image Reconstruction +1

Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy

1 code implementation 31 Dec 2023 Weijian Mai, Jian Zhang, Pengfei Fang, Zhijun Zhang

This survey comprehensively examines the emerging field of AIGC-based Brain-conditional Multimodal Synthesis, termed AIGC-Brain, to delineate the current landscape and future directions.

Brain Computer Interface Survey

A Prompt Learning Framework for Source Code Summarization

1 code implementation 26 Dec 2023 Tingting Xu, Yun Miao, Chunrong Fang, Hanwei Qian, Xia Feng, Zhenpeng Chen, Chong Wang, Jian Zhang, Weisong Sun, Zhenyu Chen, Yang Liu

Our comprehensive experimental results show that PromptCS significantly outperforms instruction prompting schemes (including zero-shot learning and few-shot learning) on all four widely used metrics, and is comparable to the task-oriented fine-tuning scheme.

Code Summarization Few-Shot Learning +3

Language-Assisted 3D Scene Understanding

no code implementations 18 Dec 2023 Yanmin Wu, Qiankun Gao, Renrui Zhang, Jian Zhang

The scale and quality of point cloud datasets constrain the advancement of point cloud learning.

3D Object Detection 3D Semantic Segmentation +6

Neural Video Fields Editing

no code implementations 12 Dec 2023 Shuzhou Yang, Chong Mou, Jiwen Yu, YuHan Wang, Xiandong Meng, Jian Zhang

Specifically, we construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames in a memory-efficient manner.

Video Editing

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection

no code implementations CVPR 2024 Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang

In the era where AI-generated content (AIGC) models can produce stunning and lifelike images, the lingering shadow of unauthorized reproductions and malicious tampering poses imminent threats to copyright integrity and information security.

Image Steganography

GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization

1 code implementation 8 Dec 2023 Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang

Our method achieves state-of-the-art performance in both relighting and novel view synthesis tasks among the recently proposed inverse rendering methods while achieving real-time rendering.

Disentanglement Inverse Rendering +1

PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

no code implementations 7 Dec 2023 Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, Lei Zhang

To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset that contains eight whole-body basketball skills.

Human-Object Interaction Detection Object

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

1 code implementation 6 Dec 2023 Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang

For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation for ensuring the generated first frame is equal to the given generated image.

Image Animation Video Generation

SecureCut: Federated Gradient Boosting Decision Trees with Efficient Machine Unlearning

no code implementations 22 Nov 2023 Jian Zhang, Bowen Li, Jie Li, Chentao Wu

In response to legislation mandating companies to honor the right to be forgotten by erasing user data, it has become imperative to enable data removal in Vertical Federated Learning (VFL) where multiple parties provide private features for model training.

Machine Unlearning Vertical Federated Learning

Generative AIBIM: An automatic and intelligent structural design pipeline integrating BIM and generative AI

1 code implementation 7 Nov 2023 Zhili He, Yu-Hsing Wang, Jian Zhang

AI-based structural design represents a transformative approach that addresses the inefficiencies inherent in traditional structural design practices.

Generative Adversarial Network

Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning

1 code implementation 31 Oct 2023 Fuyuan Hu, Jian Zhang, Fan Lyu, Linyan Li, Fenglei Xu

Moreover, we design a multi-stage strategy for training S2C model, which mitigates the training challenges posed by limited data in the incremental process.

class-incremental learning Few-Shot Class-Incremental Learning +2

Multilevel Perception Boundary-guided Network for Breast Lesion Segmentation in Ultrasound Images

no code implementations 23 Oct 2023 Xing Yang, Jian Zhang, Qijian Chen, Li Wang, Lihui Wang

Moreover, to improve the segmentation performance for tumor boundaries, a multi-level boundary-enhanced segmentation (BS) loss is proposed.

Lesion Segmentation Segmentation +1

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

no code implementations 18 Oct 2023 Xinhua Cheng, Tianyu Yang, Jianan Wang, Yu Li, Lei Zhang, Jian Zhang, Li Yuan

Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to the advances in image diffusion models and optimizing strategies.

3D Generation Text to 3D

Compatible Transformer for Irregularly Sampled Multivariate Time Series

1 code implementation 17 Oct 2023 Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen

To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.

Time Series

Empirical Study of Zero-Shot NER with ChatGPT

1 code implementation 16 Oct 2023 Tingyu Xie, Qi Li, Jian Zhang, Yan Zhang, Zuozhu Liu, Hongwei Wang

Large language models (LLMs) exhibited powerful capability in various natural language processing tasks.

Arithmetic Reasoning named-entity-recognition +3

Deep Unfolding Network for Image Compressed Sensing by Content-adaptive Gradient Updating and Deformation-invariant Non-local Modeling

no code implementations 16 Oct 2023 Wenxue Cui, Xiaopeng Fan, Jian Zhang, Debin Zhao

In this paper, inspired by the traditional Proximal Gradient Descent (PGD) algorithm, a novel DUN for image compressed sensing (dubbed DUN-CSNet) is proposed to solve the above two issues.

Image Compressed Sensing

Multimodal Large Language Model for Visual Navigation

no code implementations 12 Oct 2023 Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, BoWen Zhang, Jian Zhang

Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.

Language Modeling Language Modelling +5

IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network

no code implementations 26 Sep 2023 Qiao Yang, Yu Zhang, Zijing Zhao, Jian Zhang, Shunli Zhang

Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks.

Infrared And Visible Image Fusion

SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion

no code implementations 26 Sep 2023 Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Most existing learning-based infrared and visible image fusion (IVIF) methods exhibit massive redundant information in the fusion images, i.e., yielding edge-blurring effect or unrecognizable for object detectors.

Infrared And Visible Image Fusion

TextCLIP: Text-Guided Face Image Generation And Manipulation Without Adversarial Training

no code implementations 21 Sep 2023 Xiaozhou You, Jian Zhang

Text-guided image generation aims to generate desired images conditioned on given texts, while text-guided image manipulation refers to semantically editing parts of a given image based on specified texts.

Image Manipulation Text-Guided Generation

Exploring Flat Minima for Domain Generalization with Large Learning Rates

no code implementations 12 Sep 2023 Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

Instead, we observe that leveraging a large learning rate can simultaneously promote weight diversity and facilitate the identification of flat regions in the loss landscape.

Diversity Domain Generalization +1

sasdim: self-adaptive noise scaling diffusion model for spatial time series imputation

no code implementations 5 Sep 2023 Shunyang Zhang, Senzhang Wang, Xianzhen Tan, Ruochen Liu, Jian Zhang, Jianxin Wang

Spatial time series imputation is critically important to many real applications such as intelligent transportation and air quality monitoring.

Imputation Time Series

Self-Supervised Scalable Deep Compressed Sensing

1 code implementation 26 Aug 2023 Bin Chen, Xuanyu Zhang, Shuai Liu, Yongbing Zhang, Jian Zhang

Compressed sensing (CS) is a promising tool for reducing sampling costs.

Masked Cross-image Encoding for Few-shot Segmentation

no code implementations 22 Aug 2023 Wenbo Xu, Huaxi Huang, Ming Cheng, Litao Yu, Qiang Wu, Jian Zhang

Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.

Few-Shot Semantic Segmentation

DomainAdaptor: A Novel Approach to Test-time Adaptation

1 code implementation ICCV 2023 Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

To deal with the domain shift between training and test samples, current methods have primarily focused on learning generalizable features during training and ignored the specificity of unseen samples that are also critical during testing.

Specificity Test-time Adaptation

DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement

no code implementations 18 Aug 2023 Shuzhou Yang, Xuanyu Zhang, Yinhuai Wang, Jiwen Yu, YuHan Wang, Jian Zhang

Specifically, we adopt a naive unsupervised enhancement algorithm to realize preliminary restoration and design two zero-shot plug-and-play modules based on a diffusion model to improve generalization and effectiveness.

Denoising Low-Light Image Enhancement

Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization

1 code implementation ICCV 2023 Xiran Wang, Jian Zhang, Lei Qi, Yinghuan Shi

Domain generalization (DG) is proposed to deal with the issue of domain shift, which occurs when statistical differences exist between source and target domains.

Domain Generalization Meta-Learning

EFLNet: Enhancing Feature Learning for Infrared Small Target Detection

1 code implementation 27 Jul 2023 Bo Yang, Xinyu Zhang, Jian Zhang, Jun Luo, Mingliang Zhou, Yangjun Pi

To address this problem, we propose a new adaptive threshold focal loss (ATFL) function that decouples the target and the background, and utilizes the adaptive mechanism to adjust the loss weight to force the model to allocate more attention to target features.

regression

Deep Physics-Guided Unrolling Generalization for Compressed Sensing

1 code implementation 18 Jul 2023 Bin Chen, Jiechong Song, Jingfen Xie, Jian Zhang

By absorbing the merits of both the model- and data-driven methods, the deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction.

Image Compressed Sensing Image Reconstruction

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

2 code implementations 5 Jul 2023 Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model.

Object

HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image

1 code implementation 30 Jun 2023 Zhuchen Shao, Yang Chen, Hao Bian, Jian Zhang, Guojun Liu, Yongbing Zhang

Many studies adopt a random-sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag.

Multiple Instance Learning Survival Prediction +1

Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing

1 code implementation 28 Jun 2023 Jiechong Song, Bin Chen, Jian Zhang

The deep unfolding network (DUN), which unfolds the optimization algorithm into a deep neural network, has achieved great success in compressive sensing (CS) due to its good interpretability and high performance.

Compressive Sensing

Infrastructure Crack Segmentation: Boundary Guidance Method and Benchmark Dataset

1 code implementation 15 Jun 2023 Zhili He, Wang Chen, Jian Zhang, Yu-Hsing Wang

Cracks provide an essential indicator of infrastructure performance degradation, and achieving high-precision pixel-level crack segmentation is an issue of concern.

Crack Segmentation Segmentation

CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography

1 code implementation NeurIPS 2023 Jiwen Yu, Xuanyu Zhang, Youmin Xu, Jian Zhang

Current image steganography techniques are mainly focused on cover-based methods, which commonly have the risk of leaking secret images and poor robustness against degraded container images.

Diversity Image Steganography

On the Tool Manipulation Capability of Open-source Large Language Models

1 code implementation 25 May 2023 Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang

In this paper, we ask whether we can enhance open-source LLMs to be competitive with leading closed LLM APIs in tool manipulation, with a practical amount of human supervision.

Cross-source Point Cloud Registration: Challenges, Progress and Prospects

no code implementations 23 May 2023 Xiaoshui Huang, Guofeng Mei, Jian Zhang

The emerging topic of cross-source point cloud (CSPC) registration has attracted increasing attention with the rapid development of 3D sensor technologies.

Point Cloud Registration

An Object SLAM Framework for Association, Mapping, and High-Level Tasks

no code implementations 12 May 2023 Yanmin Wu, Yunzhou Zhang, Delong Zhu, Zhiqiang Deng, Wenkai Sun, Xin Chen, Jian Zhang

Taking into consideration the semantic invariance of objects, we convert the object map to a topological map to provide semantic descriptors to enable multi-map matching.

Decision Making Object +2

Single Node Injection Label Specificity Attack on Graph Neural Networks via Reinforcement Learning

no code implementations 4 May 2023 Dayuan Chen, Jian Zhang, Yuqian Lv, Jinhuan Wang, Hongjie Ni, Shanqing Yu, Zhen Wang, Qi Xuan

Furthermore, most methods concentrate on a single attack goal and lack a generalizable adversary to develop distinct attack strategies for diverse goals, thus limiting precise control over victim model behavior in real-world scenarios.

Specificity

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention

1 code implementation Tiny Papers @ ICLR 2023 Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You

We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification.

Dialogue Act Classification Dialogue Understanding +2

Optimization-Inspired Cross-Attention Transformer for Compressive Sensing

1 code implementation CVPR 2023 Jiechong Song, Chong Mou, Shiqi Wang, Siwei Ma, Jian Zhang

The PGCA block achieves enhanced information interaction, introducing the inertia force into the gradient descent step through a cross-attention block.

Compressive Sensing

OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

no code implementations 26 Apr 2023 Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images.

Image Super-Resolution Position

Large-capacity and Flexible Video Steganography via Invertible Neural Network

1 code implementation CVPR 2023 Chong Mou, Youmin Xu, Jiechong Song, Chen Zhao, Bernard Ghanem, Jian Zhang

For large capacity, we present a reversible pipeline that hides and recovers multiple videos through a single invertible neural network (INN).

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

1 code implementation ICCV 2023 Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, Jian Zhang

1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, along with our adaptation speed calibration to align different PET modules; 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through momentum update; 3) Ensemble: during inference, we construct two experts with the online and offline PET modules (which are favored by the novel and historical tasks, respectively) for prediction ensemble.

Continual Learning

Progressive Content-aware Coded Hyperspectral Compressive Imaging

no code implementations 17 Mar 2023 Xuanyu Zhang, Bin Chen, Wenzhen Zou, Shuai Liu, Yongbing Zhang, Ruiqin Xiong, Jian Zhang

Hyperspectral imaging plays a pivotal role in a wide range of applications, like remote sensing, medicine, and cytology.

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

1 code implementation ICCV 2023 Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang

In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions.

Face Detection

Unlimited-Size Diffusion Restoration

1 code implementation 1 Mar 2023 Yinhuai Wang, Jiwen Yu, Runyi Yu, Jian Zhang

Our simple, parameter-free approaches can be used not only for image restoration but also for image generation of unlimited sizes, with the potential to be a general tool for diffusion models.

Image Generation Image Restoration

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

2 code implementations 16 Feb 2023 Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, XiaoHu Qie

In this paper, we aim to "dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly.

Image Generation Style Transfer

Cross-domain recommendation via user interest alignment

no code implementations 26 Jan 2023 Chuang Zhao, Hongke Zhao, Ming He, Jian Zhang, Jianping Fan

Specifically, we first construct a unified cross-domain heterogeneous graph and redefine the message passing mechanism of graph convolutional networks to capture high-order similarity of users and items across domains.

Recommendation Systems

Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation

no code implementations ICCV 2023 Wenjie Wei, Malu Zhang, Hong Qu, Ammar Belatreche, Jian Zhang, Hong Chen

As a temporal encoding scheme for SNNs, Time-To-First-Spike (TTFS) encodes information using the timing of a single spike, which allows spiking neurons to transmit information through sparse spike trains and results in lower power consumption and higher computational efficiency compared to traditional rate-based encoding counterparts.

Computational Efficiency Image Classification

Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels via Metric Learning

no code implementations CVPR 2023 Xinhua Cheng, Yanmin Wu, Mengxi Jia, Qian Wang, Jian Zhang

In this work, we attempt to learn an object-compositional neural implicit representation for editable scene rendering by leveraging labels inferred from the off-the-shelf 2D panoptic segmentation networks instead of the ground truth annotations.

2D Panoptic Segmentation Metric Learning +2

Latent Evolution Model for Change Point Detection in Time-varying Networks

no code implementations 17 Dec 2022 Yongshun Gong, Xue Dong, Jian Zhang, Meng Chen

Our method focuses on learning the low-dimensional representations of networks and capturing the evolving patterns of these learned latent representations simultaneously.

Change Point Detection Prediction

Position Embedding Needs an Independent Layer Normalization

1 code implementation 10 Dec 2022 Runyi Yu, Zhennan Wang, Yinhuai Wang, Kehan Li, Yian Zhao, Jian Zhang, Guoli Song, Jie Chen

By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and patch embedding together) applies the same affine transformation to the token embedding and the PE, which limits the expressiveness of PE and hence constrains the performance of VTs.

Position

Self-Supervised Object Goal Navigation with In-Situ Finetuning

no code implementations 9 Dec 2022 So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang

In contrast, our LocCon shows the most robust transfer in the real world among the set of models we compare to, and the real-world performance of all models can be further improved with self-supervised LocCon in-situ training.

Contrastive Learning Navigate +2

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

4 code implementations 1 Dec 2022 Yinhuai Wang, Jiwen Yu, Jian Zhang

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators.

Colorization Deblurring +7

GAN Prior based Null-Space Learning for Consistent Super-Resolution

1 code implementation 24 Nov 2022 Yinhuai Wang, Yujie Hu, Jiwen Yu, Jian Zhang

Consistency and realness have always been the two critical issues of image super-resolution.

Image Super-Resolution

Complementary Labels Learning with Augmented Classes

no code implementations 19 Nov 2022 Zhongnian Li, Jian Zhang, Mengting Xu, Xinzheng Xu, Daoqiang Zhang

In this paper, we propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC), which brings the challenge that classifiers trained with complementary labels should not only classify instances from observed classes accurately, but also recognize instances from the augmented classes in the testing phase.

Masked Vision-Language Transformers for Scene Text Recognition

1 code implementation 9 Nov 2022 Jie Wu, Ying Peng, Shengming Zhang, Weigang Qi, Jian Zhang

MVLT is trained in two stages: in the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve the performance.

Decoder Scene Text Recognition

Multi-Agent Automated Machine Learning

no code implementations CVPR 2023 Zhaozhi Wang, Kefan Su, Jian Zhang, Huizhu Jia, Qixiang Ye, Xiaodong Xie, Zongqing Lu

In this paper, we propose multi-agent automated machine learning (MA2ML) with the aim of effectively handling joint optimization of modules in automated machine learning (AutoML).

Data Augmentation Multi-agent Reinforcement Learning +1

Overlap-guided Gaussian Mixture Models for Point Cloud Registration

1 code implementation 17 Oct 2022 Guofeng Mei, Fabio Poiesi, Cristiano Saltori, Jian Zhang, Elisa Ricci, Nicu Sebe

Probabilistic 3D point cloud registration methods have shown competitive performance in overcoming noise, outliers, and density variations.

Point Cloud Registration

Data Augmentation-free Unsupervised Learning for 3D Point Cloud Understanding

1 code implementation 6 Oct 2022 Guofeng Mei, Cristiano Saltori, Fabio Poiesi, Jian Zhang, Elisa Ricci, Nicu Sebe, Qiang Wu

Unsupervised learning on 3D point clouds has undergone a rapid evolution, especially thanks to data augmentation-based contrastive methods.

3D Object Classification Contrastive Learning +3