Search Results for author: Yifan Liu

Found 71 papers, 35 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations ECCV 2020 Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

no code implementations 17 Apr 2024 Kangning Zhang, Yingjie Qin, Ruilong Su, Yifan Liu, Jiarui Jin, Weinan Zhang, Yong Yu

After obtaining separate behavior and modal representations, we design a Behavior-Modal Alignment Module (BMA) to align and fuse the dual representations to solve the misalignment problem.

Multimodal Recommendation Representation Learning

An Aligning and Training Framework for Multimodal Recommendations

no code implementations 19 Mar 2024 Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang

In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items.

Multimodal Recommendation

Endora: Video Generation Models as Endoscopy Simulators

no code implementations 17 Mar 2024 Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation.

Data Augmentation Video Generation

Diffusion Model-Based Image Editing: A Survey

1 code implementation 27 Feb 2024 Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao

In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field.

Denoising Image Inpainting +1

Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

1 code implementation 8 Feb 2024 Feihu Jin, Yifan Liu, Ying Tan

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks and exhibited impressive reasoning abilities by applying zero-shot Chain-of-Thought (CoT) prompting.

Evolutionary Algorithms Sentence

EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction

1 code implementation 23 Jan 2024 Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan

To adapt 3DGS for endoscopic scenes, we propose two strategies, Holistic Gaussian Initialization (HGI) and Spatio-temporal Gaussian Tracking (SGT), to handle the non-trivial Gaussian initialization and tissue deformation problems, respectively.

Depth Estimation

ICGNet: A Unified Approach for Instance-Centric Grasping

no code implementations 18 Jan 2024 René Zurbrügg, Yifan Liu, Francis Engelmann, Suryansh Kumar, Marco Hutter, Vaishakh Patil, Fisher Yu

Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps.

Object Object Reconstruction +1

Inter-X: Towards Versatile Human-Human Interaction Analysis

no code implementations 26 Dec 2023 Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects.

Bag of Tricks: Semi-Supervised Cross-domain Crater Detection with Poor Data Quality

no code implementations 11 Dec 2023 Yifan Liu, Tiecheng Song

To obtain a more robust model with better cross-domain generalization in the presence of poor data quality, we propose the SCPQ model. We first propose a method for fusing shallow information using an attention mechanism (FSIAM): after the attention module fully extracts the global sensory field of the shallow information, the resulting feature maps are fused with deep convolved feature maps, which lets the model fit the data more fully, gain a better sense of the domain in the presence of poor data, and thus achieve better multiscale adaptability.

Domain Generalization Pseudo Label

HumanRecon: Neural Reconstruction of Dynamic Human Using Geometric Cues and Physical Priors

1 code implementation 26 Nov 2023 Junhui Yin, Wei Yin, Hao Chen, Xuqian Ren, Zhanyu Ma, Jun Guo, Yifan Liu

These priors ensure the color rendered along rays to be robust to view direction and reduce the inherent ambiguities of density estimated along rays.

Novel View Synthesis

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

no code implementations 21 Nov 2023 Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen

To tackle these issues, we propose GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis.

Image Generation Text-to-Video Generation +1

Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation

no code implementations 1 Oct 2023 Jiancheng Huang, Yifan Liu, Yi Huang, Shifeng Chen

To address the lack of labelled datasets for these seal-related tasks, we propose Seal2Real, a generative method that generates a large amount of labelled document seal data, and construct a Seal-DB dataset containing 20K images with labels.

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

no code implementations 28 Sep 2023 Jiancheng Huang, Yifan Liu, Jin Qin, Shifeng Chen

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable.

Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement

no code implementations 26 Sep 2023 Jiancheng Huang, Yifan Liu, Shifeng Chen

Learning-based methods have attracted a lot of research attention and led to significant improvements in low-light image enhancement.

Denoising Low-Light Image Enhancement

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

1 code implementation 5 Sep 2023 Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers.

Code Generation Multiple-choice

Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

1 code implementation 22 Aug 2023 Bingqing Zhang, Sen Wang, Yifan Liu, Brano Kusy, Xue Li, Jiajun Liu

The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process.

Object object-detection +1

BHSD: A 3D Multi-Class Brain Hemorrhage Segmentation Dataset

1 code implementation 22 Aug 2023 Biao Wu, Yutong Xie, Zeyu Zhang, Jinchao Ge, Kaspar Yaxley, Suzan Bahadir, Qi Wu, Yifan Liu, Minh-Son To

Intracranial hemorrhage (ICH) is a pathological condition characterized by bleeding inside the skull or brain, which can be attributed to various factors.

Image Segmentation Medical Image Segmentation +2

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

1 code implementation ICCV 2023 Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.

Open-World Instance Segmentation Segmentation +1

Category Feature Transformer for Semantic Segmentation

1 code implementation 10 Aug 2023 Quan Tang, Chuanjian Liu, Fagui Liu, Yifan Liu, Jun Jiang, BoWen Zhang, Kai Han, Yunhe Wang

Aggregation of multi-stage features has been revealed to play a significant role in semantic segmentation.

Segmentation Semantic Segmentation

Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation

1 code implementation ICCV 2023 Quan Tang, BoWen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu

Experiments suggest that the proposed DToP architecture reduces on average 20% - 35% of computational cost for current semantic segmentation methods based on plain vision transformers without accuracy degradation.

Image Classification Segmentation +1

BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation

1 code implementation 13 Jun 2023 Liyang Liu, Zihan Wang, Minh Hieu Phan, BoWen Zhang, Jinchao Ge, Yifan Liu

Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally.

Knowledge Distillation Segmentation +1

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation 9 Jun 2023 BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.

Continual Learning Continual Semantic Segmentation +2

Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

no code implementations 4 Jun 2023 Jintao Rong, Hao Chen, Tianxiao Chen, Linlin Ou, Xinyi Yu, Yifan Liu

Prompt learning has become a popular approach for adapting large vision-language models, such as CLIP, to downstream tasks.

Classification Domain Generalization +3

Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

1 code implementation 4 May 2023 Sen Zhao, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen

Conversational recommendation systems (CRS) aim to timely and proactively acquire user dynamic preferred attributes through conversations for item recommendation.

Attribute Decision Making +2

Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction

no code implementations 9 Mar 2023 Caiyuan Chu, Ya Li, Yifan Liu, Jia-Chen Gu, Quan Liu, Yongxin Ge, Guoping Hu

The key to automatic intention induction is that, for any given set of new data, the sentence representation obtained by the model can be well distinguished from different labels.

Clustering Contrastive Learning +3

FedPD: Federated Open Set Recognition with Parameter Disentanglement

no code implementations ICCV 2023 Chen Yang, Meilu Zhu, Yifan Liu, Yixuan Yuan

To this end, we aim to study a novel problem of federated open-set recognition (FedOSR), which learns an open-set recognition (OSR) model under federated paradigm such that it classifies seen classes while at the same time detects unknown classes.

Disentanglement Federated Learning +1

3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection

1 code implementation ICCV 2023 Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection object-detection

3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers

1 code implementation 27 Nov 2022 Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu

Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions.

Monocular 3D Object Detection Monocular Depth Estimation +1

Prior-enhanced Temporal Action Localization using Subject-aware Spatial Attention

no code implementations 10 Nov 2022 Yifan Liu, YouBao Tang, Ning Zhang, Ruei-Sung Lin, Haoqian Wang

Temporal action localization (TAL) aims to detect the boundary and identify the class of each action instance in a long untrimmed video.

Optical Flow Estimation Temporal Action Localization

FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation

1 code implementation 30 Aug 2022 Jianlong Yuan, Qian Qi, Fei Du, Zhibin Wang, Fan Wang, Yifan Liu

Inspired by the recent progress on semantic directions on feature-space, we propose to include augmentations in feature space for efficient distillation.

Knowledge Distillation Segmentation +1

Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

1 code implementation 28 Aug 2022 Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen

To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.

Depth Estimation Depth Prediction

Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation

1 code implementation 24 Aug 2022 Jianlong Yuan, Jinchao Ge, Zhibin Wang, Yifan Liu

More specifically, we use the pseudo-labels generated by a mean teacher to supervise the student network to achieve a mutual knowledge distillation between the two branches.

Knowledge Distillation Pseudo Label +1

Improving Personality Consistency in Conversation by Persona Extending

1 code implementation 23 Aug 2022 Yifan Liu, Wei Wei, Jiayi Liu, Xianling Mao, Rui Fang, Dangyang Chen

Endowing chatbots with a consistent personality plays a vital role for agents to deliver human-like interactions.

Chatbot Natural Language Inference +1

Semantic-guided Multi-Mask Image Harmonization

1 code implementation 24 Jul 2022 Xuqian Ren, Yifan Liu

Experiments have been conducted on our constructed benchmarks to verify that our proposed operator mask-based framework can locate and modify the inharmonious regions in more complex scenes.

Image Harmonization Semantic Segmentation

Controllable Shadow Generation Using Pixel Height Maps

no code implementations 12 Jul 2022 Yichen Sheng, Yifan Liu, Jianming Zhang, Wei Yin, A. Cengiz Oztireli, He Zhang, Zhe Lin, Eli Shechtman, Bedrich Benes

It can be used to calculate hard shadows in a 2D image based on the projective geometry, providing precise control of the shadows' direction and shape.

Ultra-sensitive Flexible Sponge-Sensor Array for Muscle Activities Detection and Human Limb Motion Recognition

no code implementations 30 Apr 2022 Jiao Suo, Yifan Liu, Clio Cheng, Keer Wang, Meng Chen, Ho-Yin Chan, Roy Vellaisamy, Ning Xi, Vivian W. Q. Lou, Wen Jung Li

Human limb motion tracking and recognition plays an important role in medical rehabilitation training, lower limb assistance, prosthetics design for amputees, feedback control for assistive robots, etc.

Action Detection Activity Detection

The devil is in the labels: Semantic segmentation from sentences

no code implementations 4 Feb 2022 Wei Yin, Yifan Liu, Chunhua Shen, Anton Van Den Hengel, Baichuan Sun

The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom.

Instance Segmentation Monocular Depth Estimation +2

Elevation Angle-Dependent 3D Trajectory Design for Aerial RIS-aided Communication

no code implementations 23 Aug 2021 Yifan Liu, Bin Duo, Qingqing Wu, Xiaojun Yuan, Jun Li, Yonghui Li

This paper investigates an aerial reconfigurable intelligent surface (RIS)-aided communication system under the probabilistic line-of-sight (LoS) channel, where an unmanned aerial vehicle (UAV) equipped with an RIS is deployed to assist two ground nodes in their information exchange.

Scheduling

A Generative Adversarial Framework for Optimizing Image Matting and Harmonization Simultaneously

no code implementations 13 Aug 2021 Xuqian Ren, Yifan Liu, Chunlei Song

Image matting, aiming to achieve foreground boundary details, and image harmonization, aiming to make the background compatible with the foreground, are both promising yet challenging tasks.

Image Harmonization Image Matting

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation NeurIPS 2021 BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Segmentation Semantic Segmentation +1

X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering

1 code implementation 24 Jul 2021 Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, Nanning Zheng

In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to implicitly model the problem.

Attribute Out-of-Distribution Generalization +2

Full-Dimensional Rate Enhancement for UAV-Enabled Communications via Intelligent Omni-Surface

no code implementations 5 Jun 2021 Yifan Liu, Bin Duo, Qingqing Wu, Xiaojun Yuan, Yonghui Li

This paper investigates the achievable rate maximization problem of a downlink unmanned aerial vehicle (UAV)-enabled communication system aided by an intelligent omni-surface (IOS).

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation ICCV 2021 Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Data Augmentation Image Classification +3

Generic Perceptual Loss for Modeling Structured Output Dependencies

no code implementations CVPR 2021 Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen

We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.

Depth Estimation Image Generation +4

A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN

no code implementations 1 Mar 2021 He Zhang, Zhixiong Nan, Tao Yang, Yifan Liu, Nanning Zheng

In autonomous driving, perceiving the driving behaviors of surrounding agents is important for the ego-vehicle to make a reasonable decision.

Autonomous Driving

Channel-wise Knowledge Distillation for Dense Prediction

3 code implementations ICCV 2021 Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen

Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks.

Knowledge Distillation Segmentation +1

Representative Graph Neural Network

no code implementations ECCV 2020 Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang

In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.

object-detection Object Detection +1

Crowd Counting via Hierarchical Scale Recalibration Network

no code implementations 7 Mar 2020 Zhikang Zou, Yifan Liu, Shuangjie Xu, Wei Wei, Shiping Wen, Pan Zhou

Extensive experiments on crowd counting datasets (ShanghaiTech, MALL, WorldEXPO'10, and UCSD) show that our HSRNet can deliver superior results over all state-of-the-art approaches.

Crowd Counting

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation ECCV 2020 Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Knowledge Distillation Optical Flow Estimation +4

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations 3 Feb 2020 Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation Depth Prediction

Auxiliary Learning for Deep Multi-task Learning

no code implementations 5 Sep 2019 Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin

Most current methods can be categorized as either: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but they are jointly regularized.

Auxiliary Learning Depth Estimation +3

MobileFAN: Transferring Deep Hidden Representation for Face Alignment

no code implementations 11 Aug 2019 Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong

To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder.

Face Alignment Facial Landmark Detection

Structured Knowledge Distillation for Semantic Segmentation

1 code implementation CVPR 2019 Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang

We further propose to distill the structured knowledge from cumbersome networks into compact networks, which is motivated by the fact that semantic segmentation is a structured prediction problem.

General Classification Image Classification +5

Structured Knowledge Distillation for Dense Prediction

1 code implementation CVPR 2019 Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen

Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.

Depth Estimation General Classification +7

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

4 code implementations 4 May 2017 Yifan Liu, Zengchang Qin, Zhenbo Luo, Hua Wang

Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a potential application in digital entertainment.

Image Generation
