Search Results for author: Shengchuan Zhang

Found 36 papers, 22 papers with code

Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting

no code implementations30 Jan 2025 Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

It enables users to conveniently specify the desired editing region and dragging direction through 3D masks and pairs of control points, thereby providing precise control over the extent of editing.

3DGS 3D scene Editing

Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

1 code implementation16 Dec 2024 Hongwei Niu, Linhuang Xie, Jianghang Lin, Shengchuan Zhang

To address these challenges, we introduce a novel framework, named SCSD for Semantic Consistency prediction and Style Diversity generalization.

Diversity Semantic Segmentation

Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection

no code implementations11 Dec 2024 Xin Chen, Liujuan Cao, Shengchuan Zhang, Xiewu Zheng, Yan Zhang

In this paper, we propose Recalibrating Attention of Industrial Anomaly Detection (RAAD), a framework that systematically decomposes and recalibrates attention maps.

Anomaly Detection Computational Efficiency +1

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

1 code implementation11 Dec 2024 Hongwei Niu, Jie Hu, Jianghang Lin, Guannan Jiang, Shengchuan Zhang

To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework oriented towards efficiency; it runs faster than state-of-the-art methods while achieving competitive performance.

Decoder Open Vocabulary Panoptic Segmentation +2

CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

no code implementations15 Aug 2024 Xunfa Lai, Zhiyu Yang, Jie Hu, Shengchuan Zhang, Liujuan Cao, Guannan Jiang, Zhiyu Wang, Songan Zhang, Rongrong Ji

Existing camouflaged object detection (COD) methods depend heavily on large-scale pixel-level annotations. However, acquiring such annotations is laborious due to the inherent camouflage characteristics of the objects. Semi-supervised learning offers a promising solution to this challenge, yet its application in COD is hindered by significant pseudo-label noise at both the pixel and instance levels. We introduce CamoTeacher, a novel semi-supervised COD framework that uses Dual-Rotation Consistency Learning (DRCL) to effectively address these noise issues. Specifically, DRCL minimizes pseudo-label noise by leveraging the consistency of rotated views at the pixel and instance levels. First, it employs Pixel-wise Consistency Learning (PCL) to handle pixel-level noise by reweighting the different parts within a pseudo-label. Second, Instance-wise Consistency Learning (ICL) adjusts the weights of whole pseudo-labels to handle instance-level noise. Extensive experiments on four COD benchmark datasets demonstrate that CamoTeacher not only achieves state-of-the-art performance compared with semi-supervised learning methods, but also rivals established fully supervised methods. Our code will be available soon.
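A minimal sketch of the pixel-wise reweighting idea described above, assuming a rotation-consistency-based weighting rule; the function names and formulation are illustrative, not the released CamoTeacher code.

```python
import torch
import torch.nn.functional as F

def pixelwise_consistency_weights(pred, pred_rot, angle_k: int):
    """pred, pred_rot: (B, 1, H, W) sigmoid maps from the original and rotated
    views. Pixels where the two views agree receive higher weight."""
    aligned = torch.rot90(pred_rot, k=-angle_k, dims=(-2, -1))  # undo the rotation
    agreement = 1.0 - (pred - aligned).abs()                    # values in [0, 1]
    return agreement.detach()

def weighted_pseudo_label_loss(student_pred, pseudo_label, weights):
    # Per-pixel BCE, reweighted by the rotation-consistency agreement map
    loss = F.binary_cross_entropy(student_pred, pseudo_label, reduction="none")
    return (weights * loss).mean()
```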

object-detection Object Detection +1

HRSAM: Efficient Interactive Segmentation in High-Resolution Images

1 code implementation2 Jul 2024 You Huang, Wenbin Lai, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

Within FLA, we implement Flash Swin attention, achieving over a 35% speedup compared to traditional Swin attention, and propose a KV-only padding mechanism to enhance extrapolation.
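A toy illustration of the KV-only padding idea, assuming the intent is to pad keys and values (and mask the padding) to the window length while leaving queries unpadded; this is a generic sketch, not the HRSAM or Flash Swin implementation.

```python
import torch

def kv_only_padded_attention(q, k, v, window: int):
    """q: (B, Lq, D); k, v: (B, Lk, D) with Lk <= window."""
    B, Lk, D = k.shape
    pad = window - Lk
    k = torch.cat([k, k.new_zeros(B, pad, D)], dim=1)
    v = torch.cat([v, v.new_zeros(B, pad, D)], dim=1)
    scores = q @ k.transpose(-1, -2) / D ** 0.5   # (B, Lq, window)
    scores[:, :, Lk:] = float("-inf")             # padded keys get zero weight
    return scores.softmax(dim=-1) @ v
```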

Data Augmentation Interactive Segmentation +2

Local Manifold Learning for No-Reference Image Quality Assessment

no code implementations27 Jun 2024 Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

This crop is then used to cluster other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance.
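A hedged sketch of the crop-level contrastive objective sketched above: crops from the same image act as positives, crops from other images as negatives. This is a generic multi-positive InfoNCE loss, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def crop_contrastive_loss(anchor, positives, negatives, tau: float = 0.07):
    """anchor: (D,); positives: (P, D); negatives: (N, D) crop embeddings."""
    a = F.normalize(anchor, dim=0)
    pos = F.normalize(positives, dim=1) @ a / tau   # similarities to positives
    neg = F.normalize(negatives, dim=1) @ a / tau   # similarities to negatives
    denom = torch.logsumexp(torch.cat([pos, neg]), dim=0)
    return (denom - pos).mean()                     # mean of -log p(positive)
```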

Contrastive Learning NR-IQA

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

1 code implementation25 Jun 2024 Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions.

3D Generation Denoising +2

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

1 code implementation CVPR 2024 You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.

Decoder Interactive Segmentation +2

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

no code implementations27 May 2024 Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane.
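A minimal sketch of selecting "Gaussians of interest" with a semantic-space hyperplane: keep Gaussians whose distilled semantic features fall on the positive side of a learned (w, b). The names and the zero threshold are assumptions made for illustration.

```python
import torch

def select_gaussians_of_interest(features: torch.Tensor,
                                 w: torch.Tensor,
                                 b: float = 0.0) -> torch.Tensor:
    """features: (N, D) per-Gaussian semantic features; returns a boolean mask."""
    scores = features @ w + b   # signed distance to the hyperplane (up to ||w||)
    return scores > 0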

3DGS feature selection +3

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

no code implementations16 May 2024 Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from texts in only 1 minute. The key component is a dual-mode multi-view latent diffusion model.

3D Generation Denoising +1

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations24 Apr 2024 Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

Multi-Modal Prompt Learning on Blind Image Quality Assessment

1 code implementation23 Apr 2024 Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.

NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection

no code implementations22 Apr 2024 Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, Liujuan Cao

Previous works (e.g., NeRF-Det) have demonstrated that implicit representations can benefit visual 3D perception in indoor scenes with a high amount of overlap between input images.

3D Object Detection NeRF +3

DMAD: Dual Memory Bank for Real-World Anomaly Detection

no code implementations19 Mar 2024 Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji

To address the challenge of real-world anomaly detection, we propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD).
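A hedged sketch of scoring with two memory banks: a sample is more anomalous when it lies far from the normal bank and close to the (pseudo-)abnormal bank. This is a generic nearest-prototype score, not the DMAD implementation.

```python
import torch

def dual_memory_score(feat, normal_bank, abnormal_bank):
    """feat: (D,); normal_bank: (Nn, D); abnormal_bank: (Na, D)."""
    d_normal = torch.cdist(feat[None], normal_bank).min()
    d_abnormal = torch.cdist(feat[None], abnormal_bank).min()
    return (d_normal - d_abnormal).item()   # larger => more anomalous
```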

Anomaly Detection Representation Learning

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

1 code implementation19 Feb 2024 Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14.
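A rough sketch of block-wise fine-tuning for a pruned block: match the dense block's outputs on a small calibration set while keeping pruned weights at zero. This is hypothetical code under those assumptions, not the released EBFT implementation.

```python
import torch

def finetune_sparse_block(sparse_block, dense_block, calib_batches, masks,
                          steps: int = 50, lr: float = 1e-4):
    opt = torch.optim.Adam(sparse_block.parameters(), lr=lr)
    with torch.no_grad():
        targets = [dense_block(x) for x in calib_batches]  # dense reference outputs
    for _ in range(steps):
        for x, t in zip(calib_batches, targets):
            loss = (sparse_block(x) - t).pow(2).mean()     # block reconstruction error
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                          # re-apply the sparsity mask
                for name, p in sparse_block.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])
    return sparse_block
```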

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

no code implementations11 Dec 2023 Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji

Specifically, QFM-IQM strengthens its ability to distinguish semantic noise by matching image pairs with similar quality scores but differing semantic features as adversarial semantic noise, and adaptively adjusts the upstream task's features to reduce sensitivity to such adversarial perturbations.

Contrastive Learning feature selection +1

Pseudo-label Alignment for Semi-supervised Instance Segmentation

1 code implementation ICCV 2023 Jie Hu, Chen Chen, Liujuan Cao, Shengchuan Zhang, Annan Shu, Guannan Jiang, Rongrong Ji

Through extensive experiments conducted on the COCO and Cityscapes datasets, we demonstrate that PAIS is a promising framework for semi-supervised instance segmentation, particularly in cases where labeled data is severely limited.

Instance Segmentation Pseudo Label +3

Geometric-aware Pretraining for Vision-centric 3D Object Detection

1 code implementation6 Apr 2023 Linyan Huang, Huijie Wang, Jia Zeng, Shengchuan Zhang, Liujuan Cao, Junchi Yan, Hongyang Li

We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach.

3D Object Detection Autonomous Driving +2

You Only Segment Once: Towards Real-Time Panoptic Segmentation

2 code implementations CVPR 2023 Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao

To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation.

Decoder Panoptic Segmentation +1

DistilPose: Tokenized Pose Regression with Heatmap Distillation

1 code implementation CVPR 2023 Suhang Ye, Yingyi Zhang, Jie Hu, Liujuan Cao, Shengchuan Zhang, Lei Shen, Jun Wang, Shouhong Ding, Rongrong Ji

Specifically, DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through Token-distilling Encoder (TDE) and Simulated Heatmaps.
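An illustrative distillation loss in the spirit of the description above: the regression student is supervised both by ground truth and by coordinates decoded (via soft-argmax) from the heatmap teacher. A generic sketch only; it does not reproduce DistilPose's TDE or Simulated Heatmaps modules.

```python
import torch
import torch.nn.functional as F

def heatmap_distillation_loss(student_coords, teacher_heatmaps, gt_coords,
                              alpha: float = 0.5):
    """student_coords, gt_coords: (B, K, 2) in [0, 1]; teacher_heatmaps: (B, K, H, W)."""
    B, K, H, W = teacher_heatmaps.shape
    probs = teacher_heatmaps.flatten(2).softmax(-1).view(B, K, H, W)
    ys = torch.linspace(0, 1, H)
    xs = torch.linspace(0, 1, W)
    ty = (probs.sum(dim=-1) * ys).sum(dim=-1)        # expected row coordinate
    tx = (probs.sum(dim=-2) * xs).sum(dim=-1)        # expected column coordinate
    teacher_coords = torch.stack([tx, ty], dim=-1)   # (B, K, 2)
    return (F.l1_loss(student_coords, gt_coords)
            + alpha * F.l1_loss(student_coords, teacher_coords))
```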

Knowledge Distillation Pose Estimation +1

Automatic Network Pruning via Hilbert-Schmidt Independence Criterion Lasso under Information Bottleneck Principle

1 code implementation ICCV 2023 Song Guo, Lei Zhang, Xiawu Zheng, Yan Wang, Yuchao Li, Fei Chao, Chenglin Wu, Shengchuan Zhang, Rongrong Ji

In this paper, we try to solve this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach.
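For reference, the standard (biased) empirical HSIC estimator between two kernel matrices, which an HSIC-Lasso objective builds on; this is the textbook estimator, not the paper's full pruning pipeline.

```python
import torch

def hsic(K: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
    """K, L: (n, n) kernel matrices over the same n samples."""
    n = K.shape[0]
    H = torch.eye(n) - torch.full((n, n), 1.0 / n)   # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```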

Network Pruning

ISTR: End-to-End Instance Segmentation with Transformers

1 code implementation3 May 2021 Jie Hu, Liujuan Cao, Yao Lu, Shengchuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji

However, such an upgrade is not applicable to instance segmentation, due to its significantly higher output dimensions compared to object detection.

Instance Segmentation object-detection +3

Image-to-image Translation via Hierarchical Style Disentanglement

1 code implementation CVPR 2021 Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji

Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks.

Disentanglement Multimodal Unsupervised Image-To-Image Translation +1

Architecture Disentanglement for Deep Neural Networks

1 code implementation ICCV 2021 Jie Hu, Liujuan Cao, Qixiang Ye, Tong Tong, Shengchuan Zhang, Ke Li, Feiyue Huang, Rongrong Ji, Ling Shao

Based on the experimental results, we present three new findings that provide fresh insights into the inner logic of DNNs.

AutoML Disentanglement

Scoot: A Perceptual Metric for Facial Sketches

1 code implementation ICCV 2019 Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji

In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics.
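A small sketch of the co-occurrence texture statistics such a metric builds on: a gray-level co-occurrence histogram for one pixel offset. Illustrative only; Scoot additionally combines block-level spatial structure.

```python
import numpy as np

def cooccurrence(gray: np.ndarray, levels: int = 4, dx: int = 1, dy: int = 0):
    """gray: (H, W) uint8 image; returns a normalized (levels, levels) matrix."""
    q = np.clip((gray.astype(int) * levels) // 256, 0, levels - 1)  # quantize
    h, w = q.shape
    src = q[: h - dy, : w - dx]
    dst = q[dy:, dx:]
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1.0)  # count co-occurring pairs
    return glcm / glcm.sum()
```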

Face Sketch Synthesis SSIM

Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

1 code implementation29 Apr 2019 Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji

In particular, AGUIT benefits in two ways: (1) it adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data and then reconstructing the unlabeled data through a cycle-consistency operation.
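A toy sketch of the cycle-consistency step: translate an image to target attributes, translate back to its source attributes, and penalize the reconstruction error. The generator signature G(image, attrs) is an assumption made for illustration.

```python
import torch

def cycle_consistency_loss(G, x, attr_src, attr_tgt):
    x_translated = G(x, attr_tgt)                 # push target attributes onto x
    x_reconstructed = G(x_translated, attr_src)   # translate back to source attributes
    return (x_reconstructed - x).abs().mean()     # L1 reconstruction penalty
```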

Attribute Disentanglement +2

Towards Visual Feature Translation

1 code implementation CVPR 2019 Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian

In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems.
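As a minimal stand-in for translating features between two search systems, a least-squares linear map from one feature space to another; the paper learns a more sophisticated translator, so this is only an illustration of the setting.

```python
import torch

def fit_linear_feature_translator(src_feats: torch.Tensor,
                                  dst_feats: torch.Tensor) -> torch.Tensor:
    """src_feats: (N, Ds); dst_feats: (N, Dt). Returns W such that src @ W ~ dst."""
    return torch.linalg.lstsq(src_feats, dst_feats).solution
```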

Translation

Generative Adversarial Learning Towards Fast Weakly Supervised Detection

no code implementations CVPR 2018 Yunhan Shen, Rongrong Ji, Shengchuan Zhang, WangMeng Zuo, Yan Wang

Without the need to annotate bounding boxes, existing methods usually follow a two/multi-stage pipeline with an online compulsory stage to extract object proposals, which is an order of magnitude slower than fast fully supervised object detectors such as SSD [31] and YOLO [34].

Object object-detection +1

CerfGAN: A Compact, Effective, Robust, and Fast Model for Unsupervised Multi-Domain Image-to-Image Translation

no code implementations28 May 2018 Xiao Liu, Shengchuan Zhang, Hong Liu, Xin Liu, Cheng Deng, Rongrong Ji

In principle, CerfGAN contains a novel component, i.e., a multi-class discriminator (MCD), which gives the model an extremely powerful ability to match multiple translation mappings.
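A hypothetical sketch of a multi-class discriminator head: one branch scores real vs. fake, another classifies the domain, letting a single discriminator supervise multiple translation mappings. This reflects the general pattern, not the CerfGAN architecture itself.

```python
import torch.nn as nn

class MultiClassDiscriminatorHead(nn.Module):
    def __init__(self, feat_dim: int, num_domains: int):
        super().__init__()
        self.adv = nn.Linear(feat_dim, 1)            # real/fake logit
        self.cls = nn.Linear(feat_dim, num_domains)  # domain logits

    def forward(self, feat):
        return self.adv(feat), self.cls(feat)
```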

Attribute Decoder +5

Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure

1 code implementation9 Apr 2018 Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Ming-Ming Cheng, Bo Ren, Rongrong Ji, Paul L. Rosin

However, human perception of the similarity of two sketches will consider both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.

Face Sketch Synthesis
