Search Results for author: Wenguan Wang

Found 92 papers, 61 papers with code

Saliency-Aware Geodesic Video Object Segmentation

1 code implementation CVPR 2015 Wenguan Wang, Jianbing Shen, Fatih Porikli

Building on the observation that foreground areas are surrounded by the regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background.

Ranked #5 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)

Object Segmentation +3

Video Salient Object Detection via Fully Convolutional Networks

no code implementations 2 Feb 2017 Wenguan Wang, Jianbing Shen, Ling Shao

This paper proposes a deep learning model to efficiently detect salient regions in videos.

Data Augmentation Object +4

Super-Trajectory for Video Segmentation

no code implementations ICCV 2017 Wenguan Wang, Jianbing Shen, Jianwen Xie, Fatih Porikli

We introduce a novel semi-supervised video segmentation approach based on an efficient video representation, called as "super-trajectory".

Clustering Segmentation +2

Selective Video Object Cutout

no code implementations 28 Feb 2017 Wenguan Wang, Jianbing Shen, Fatih Porikli

Conventional video segmentation approaches rely heavily on appearance models.

Computational Efficiency Object +3

Deep Visual Attention Prediction

1 code implementation journal 2017 Wenguan Wang, Jianbing Shen

Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various reception fields.

Saliency Prediction

Deep Cropping via Attention Box Prediction and Aesthetics Assessment

no code implementations ICCV 2017 Wenguan Wang, Jianbing Shen

We model the photo cropping problem as a cascade of attention box regression and aesthetic quality classification, based on deep learning.

Examining CNN Representations with respect to Dataset Bias

no code implementations 29 Oct 2017 Quanshi Zhang, Wenguan Wang, Song-Chun Zhu

We aim to discover representation flaws caused by potential dataset bias.

Attribute

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

1 code implementation CVPR 2018 Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji

Existing video saliency datasets lack variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments.

Video Saliency Detection

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

1 code implementation CVPR 2018 Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu

This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.

Object

Optimizing the F-measure for Threshold-free Salient Object Detection

no code implementations ICCV 2019 Kai Zhao, Shang-Hua Gao, Wenguan Wang, Ming-Ming Cheng

By reformulating the standard F-measure, we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can be easily appended to the back of CNNs as the loss function.

object-detection RGB Salient Object Detection +1
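
As a rough illustration of what a differentiable F-measure can look like, the sketch below replaces thresholded counts with sums over predicted probabilities. It is not taken from the paper: the function names, the `1 - F` loss form, the `beta2=0.3` default, and the `eps` stabilizer are all assumptions made for this example.

```python
def relaxed_f_measure(probs, labels, beta2=0.3, eps=1e-8):
    """Soft F-measure from probabilities (no thresholding).

    probs  : predicted foreground probabilities in [0, 1]
    labels : ground-truth values, 0 or 1
    beta2  : beta squared, weighting precision against recall (assumed default)
    """
    # Soft counts: every term is a smooth function of the probabilities.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    # Equivalent to (1 + beta2) * P * R / (beta2 * P + R), written in counts.
    return (1 + beta2) * tp / ((1 + beta2) * tp + beta2 * fn + fp + eps)


def f_loss(probs, labels, beta2=0.3):
    """Hypothetical loss: one minus the relaxed F-measure."""
    return 1.0 - relaxed_f_measure(probs, labels, beta2)
```

A perfect prediction gives a loss near zero and a fully wrong one gives a loss of one. In a real network the same arithmetic would be written with tensor operations so that gradients flow back to the posterior.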

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

1 code implementation CVPR 2018 Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu

This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.

General Classification

Inferring Shared Attention in Social Scene Videos

no code implementations CVPR 2018 Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu

We collect a new dataset VideoCoAtt from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.

Scene Understanding

Salient Object Detection Driven by Fixation Prediction

1 code implementation CVPR 2018 Wenguan Wang, Jianbing Shen, Xingping Dong, Ali Borji

Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner.

Object object-detection +3

Learning Human-Object Interactions by Graph Parsing Neural Networks

1 code implementation ECCV 2018 Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.

Human-Object Interaction Detection Object

Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

1 code implementation ECCV 2018 Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam

This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).

Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)

Object object-detection +5

Salient Object Detection in the Deep Learning Era: An In-Depth Survey

1 code implementation 19 Apr 2019 Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang

As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years.

Attribute Object +4

An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection

no code implementations CVPR 2019 Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao

The top-down process is used for coarse-to-fine saliency estimation, where high-level saliency is gradually integrated with finer lower-layer features to obtain a fine-grained result.

object-detection RGB Salient Object Detection +2

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

1 code implementation ICCV 2019 Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu

This paper addresses a new problem of understanding human gaze communication in social videos from both atomic-level and event-level, which is significant for studying human social interactions.

Improving Neural Machine Translation by Achieving Knowledge Transfer with Sentence Alignment Learning

no code implementations CONLL 2019 Xuewen Shi, He-Yan Huang, Wenguan Wang, Ping Jian, Yi-Kun Tang

To alleviate this problem, we propose an NMT approach that heightens the adequacy in machine translation by transferring the semantic knowledge learned from bilingual sentence alignment.

Machine Translation NMT +5

Learning Compositional Neural Information Fusion for Human Parsing

1 code implementation ICCV 2019 Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao

The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively.

Human Parsing

Human-Aware Motion Deblurring

1 code implementation ICCV 2019 Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao

This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG).

Deblurring Image Deblurring

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

1 code implementation ICCV 2019 Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao

Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation.

Segmentation Semantic Segmentation +4

Cascaded Human-Object Interaction Recognition

1 code implementation CVPR 2020 Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen

The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction.

Human-Object Interaction Detection Object +1

Hierarchical Human Parsing with Typed Part-Relation Reasoning

1 code implementation CVPR 2020 Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao

As human bodies are underlying hierarchically structured, how to model human structures is the central theme in this task.

Human Parsing Relation

Learning Video Object Segmentation from Unlabeled Videos

1 code implementation CVPR 2020 Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.

Object Representation Learning +6

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

1 code implementation CVPR 2020 Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen

In this paper, we propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA, in order to learn a compact feature that is discriminative for both object motion and affinity measure.

Metric Learning Multi-Object Tracking +3

Video Object Segmentation with Episodic Graph Memory Networks

1 code implementation ECCV 2020 Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc van Gool

How to make a segmentation model efficiently adapt to a specific video and to online target appearance variations are fundamentally crucial issues in the field of video object segmentation.

Object Segmentation +4

Active Visual Information Gathering for Vision-Language Navigation

1 code implementation ECCV 2020 Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

Vision-language navigation (VLN) is the task of entailing an agent to carry out navigational instructions inside photo-realistic environments.

Vision-Language Navigation

Weakly Supervised 3D Object Detection from Lidar Point Cloud

1 code implementation ECCV 2020 Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc van Gool, Dengxin Dai

This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances.

3D Object Detection Object +1

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

5 code implementations ICCV 2021 Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool

Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.

Metric Learning Optical Character Recognition (OCR) +3

Structured Scene Memory for Vision-Language Navigation

1 code implementation CVPR 2021 Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen

Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., entailing an agent to navigate 3D environments through following linguistic instructions.

Decision Making Navigate +1

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Face Forensics in the Wild

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, Jianbing Shen

On existing public benchmarks, face forgery detection techniques have achieved great success.

Multiple Instance Learning

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations CVPR 2021 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, it also inevitably introduces misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

feature selection Referring Expression Segmentation

Collaborative Visual Navigation

1 code implementation 2 Jul 2021 Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang

As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.

Multi-agent Reinforcement Learning Navigate +1

A Survey on Deep Learning Technique for Video Segmentation

1 code implementation 2 Jul 2021 Tianfei Zhou, Fatih Porikli, David Crandall, Luc van Gool, Wenguan Wang

Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to creating virtual background in video conferencing.

Autonomous Driving Scene Understanding +4

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations 22 Mar 2022 Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Object Segmentation +3

Visual Abductive Reasoning

1 code implementation CVPR 2022 Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.

Benchmarking Sentence +1

Deep Hierarchical Semantic Segmentation

2 code implementations CVPR 2022 Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.

Multi-Label Classification Segmentation +1

Rethinking Semantic Segmentation: A Prototype View

1 code implementation CVPR 2022 Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc van Gool

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes.

Segmentation Semantic Segmentation

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

1 code implementation CVPR 2022 Hanqing Wang, Wei Liang, Jianbing Shen, Luc van Gool, Wenguan Wang

Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.

counterfactual Data Augmentation +3

Target-Driven Structured Transformer Planner for Vision-Language Navigation

1 code implementation 19 Jul 2022 Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu

Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.

Navigate Vision-Language Navigation

Towards Interpretable Video Super-Resolution via Alternating Optimization

1 code implementation 21 Jul 2022 JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool

These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.

Deblurring Space-time Video Super-resolution +2

Semi-supervised 3D Object Detection with Proficient Teachers

1 code implementation 26 Jul 2022 Junbo Yin, Jin Fang, Dingfu Zhou, Liangjun Zhang, Cheng-Zhong Xu, Jianbing Shen, Wenguan Wang

To reduce the dependence on large supervision, semi-supervised learning (SSL) based approaches have been proposed.

3D Object Detection Autonomous Driving +3

ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection

1 code implementation 26 Jul 2022 Junbo Yin, Dingfu Zhou, Liangjun Zhang, Jin Fang, Cheng-Zhong Xu, Jianbing Shen, Wenguan Wang

Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination.

3D Object Detection object-detection +2

Visual Recognition with Deep Nearest Centroids

1 code implementation 15 Sep 2022 Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu

We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers.

Decision Making Image Classification +1
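
For readers unfamiliar with the classic Nearest Centroids classifier that the abstract says DNC revisits, here is a minimal sketch of that baseline on plain feature vectors. It is not DNC itself: the function names `fit_centroids` and `predict` and the use of Euclidean distance are assumptions made for this example.

```python
import math


def fit_centroids(features, labels):
    """Return {class label: mean feature vector} for the training set."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Toy usage with two well-separated classes.
train_x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
train_y = ["a", "a", "b", "b"]
centroids = fit_centroids(train_x, train_y)
```

The classifier has no learned decision weights: a prediction is just a distance comparison against per-class means, which is what makes the approach easy to inspect.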

Learning Equivariant Segmentation with Instance-Unique Querying

1 code implementation 3 Oct 2022 Wenguan Wang, James Liang, Dongfang Liu

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings.

Instance Segmentation Semantic Segmentation

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models

2 code implementations 5 Oct 2022 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).

Segmentation Semantic Segmentation

Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing

no code implementations 28 Oct 2022 Wenguan Wang, Yi Yang, Fei Wu

Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years.

Towards Versatile Embodied Navigation

1 code implementation 30 Oct 2022 Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well.

Decision Making Vision-Language Navigation +1

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation CVPR 2023 Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN) -- entailing robot agents to follow navigation instructions -- has shown great advance.

Instruction Following Text Generation

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

1 code implementation 6 Apr 2023 Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang

To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.

Autonomous Navigation Navigate +1

Boosting Video Object Segmentation via Space-time Correspondence Learning

1 code implementation CVPR 2023 Yurong Zhang, Liulei Li, Wenguan Wang, Rong Xie, Li Song, Wenjun Zhang

Current top-leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed and the first annotated frames.

Object Segmentation +3

CLUSTSEG: Clustering for Universal Segmentation

1 code implementation 3 May 2023 James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang

We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.

Instance Segmentation Panoptic Segmentation +3

Segment and Track Anything

1 code implementation 11 May 2023 Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving Object Tracking

Large-Scale Person Detection and Localization using Overhead Fisheye Cameras

no code implementations ICCV 2023 Lu Yang, Liulei Li, Xueshi Xin, Yifan Sun, Qing Song, Wenguan Wang

Instead of existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article, we focus on devising person positioning solutions using overhead fisheye cameras.

Human Detection

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

1 code implementation ICCV 2023 Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu

Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.

Visual Prompt Tuning

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation ICCV 2023 Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Bird's-Eye-View Scene Graph for Vision-Language Navigation

no code implementations ICCV 2023 Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang

Vision-language navigation (VLN), which entails an agent to navigate 3D environments following human instructions, has shown great advances.

Navigate Vision-Language Navigation

DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

no code implementations ICCV 2023 Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang

VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions.

Decision Making Navigate +1

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code implementations ICCV 2023 Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Decision Making Transfer Learning +1

Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation

no code implementations ICCV 2023 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Recent advances in semi-supervised semantic segmentation have been heavily reliant on pseudo labeling to compensate for limited labeled data, disregarding the valuable relational knowledge among semantic concepts.

Segmentation Semi-Supervised Semantic Segmentation

ClusterFormer: Clustering As A Universal Visual Learner

1 code implementation 22 Sep 2023 James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu

This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER.

Clustering Image Classification +7

LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning

no code implementations ICCV 2023 Liulei Li, Wenguan Wang, Yi Yang

Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world.

Segmentation Semantic Parsing +1

A Survey on 3D Gaussian Splatting

no code implementations 8 Jan 2024 Guikun Chen, Wenguan Wang

The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain.

3D Reconstruction

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation 16 Jan 2024 Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes, making it far from real-life applications like guiding students in laboratory experiments and identifying their mistakes.

Scheduling

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

no code implementations 23 Jan 2024 Cheng Han, Qifan Wang, Yiming Cui, Wenguan Wang, Lifu Huang, Siyuan Qi, Dongfang Liu

As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning.

Transfer Learning Visual Prompt Tuning

Retrosynthesis prediction enhanced by in-silico reaction data augmentation

no code implementations 31 Jan 2024 Xu Zhang, Yiming Mo, Wenguan Wang, Yi Yang

As a response, we exploit easy-to-access unpaired data (i.e., one component of product-reactant(s) pair) for generating in-silico paired data to facilitate model training.

Data Augmentation Retrosynthesis

Poly Kernel Inception Network for Remote Sensing Detection

1 code implementation 10 Mar 2024 Xinhao Cai, Qiuxia Lai, Yuwei Wang, Wenguan Wang, Zeren Sun, Yazhou Yao

Object detection in remote sensing images (RSIs) often suffers from several increasing challenges, including the large variation in object scales and the diverse-ranging context.

Object object-detection +1

Volumetric Environment Representation for Vision-Language Navigation

no code implementations 21 Mar 2024 Rui Liu, Wenguan Wang, Yi Yang

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.

Multi-Task Learning Navigate +2

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

1 code implementation 22 Mar 2024 Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.

Clustering Propagation for Universal Medical Image Segmentation

no code implementations 25 Mar 2024 Yuhang Ding, Liulei Li, Wenguan Wang, Yi Yang

This enables knowledge acquired from prior slices to assist in the segmentation of the current slice, further efficiently bridging the communication between remote slices using mere 2D networks.

Clustering Image Segmentation +4

Neural Clustering based Visual Representation Learning

no code implementations 26 Mar 2024 Guikun Chen, Xia Li, Yi Yang, Wenguan Wang

In this work, we propose feature extraction with clustering (FEC), a conceptually elegant yet surprisingly ad-hoc interpretable neural clustering framework, which views feature extraction as a process of selecting representatives from data and thus automatically captures the underlying data distribution.

Clustering Representation Learning
