Search Results for author: Lu Sheng

Found 52 papers, 30 papers with code

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations 28 Mar 2024 Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

Assessment of Multimodal Large Language Models in Alignment with Human Values

1 code implementation 26 Mar 2024 Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh).

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

1 code implementation 18 Mar 2024 Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways.

Instruction Following

Data-Free Generalized Zero-Shot Learning

no code implementations 28 Jan 2024 Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.

Generalized Zero-Shot Learning Zero-shot Generalization

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

1 code implementation 27 Dec 2023 Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution.

3D Semantic Segmentation Point Cloud Segmentation +1

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

1 code implementation 12 Dec 2023 Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

It is a long-lasting goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways.

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

no code implementations 11 Dec 2023 Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.

SSIM

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation 5 Nov 2023 Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Zero-shot Generalization

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

1 code implementation 5 Nov 2023 Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community.

Hallucination In-Context Learning +2

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

1 code implementation 4 Nov 2023 Hao Ai, Lu Sheng

Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precisely controlled generation to accelerate secondary painting.

Image Generation

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

1 code implementation 6 Sep 2023 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.

Contrastive Learning Denoising +5

Distortion-aware Transformer in 360° Salient Object Detection

1 code implementation 7 Aug 2023 Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.

ERP Object +3

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

1 code implementation NeurIPS 2023 Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.

Siamese DETR

1 code implementation CVPR 2023 Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng

In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.

MULTI-VIEW LEARNING Representation Learning

VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

1 code implementation CVPR 2023 Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng

Since 2D images provide rich semantics and scene graphs are by nature closely tied to language, in this study we propose a Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that can significantly empower 3DSSG prediction models to discriminate long-tailed and ambiguous semantic relations.

 Ranked #1 on 3d scene graph generation on 3DSSG (using extra training data)

3d scene graph generation Relation

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation 29 Jan 2023 Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, (3) an efficient BEV encoder that is specifically designed to speed up on-vehicle inference.

Data Augmentation

Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

1 code implementation 31 Aug 2022 ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.

Point Cloud Registration

SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling

1 code implementation 14 Aug 2022 Chenjian Gao, Qian Yu, Lu Sheng, Yi-Zhe Song, Dong Xu

Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape.

3D Reconstruction

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations 16 Mar 2022 Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer to various tasks.

object-detection Object Detection +3

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

no code implementations CVPR 2022 Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.

Attribute Dense Captioning +1

VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds

no code implementations 17 Oct 2021 Guanze Liu, Yu Rong, Lu Sheng

3D human mesh recovery from point clouds is essential for various tasks, including AR/VR and human behavior understanding.

Human Mesh Recovery

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

1 code implementation CVPR 2021 Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu

Inspired by the back-tracing strategy in conventional Hough voting methods, in this work we introduce a new 3D object detection method, named Back-tracing Representative Points Network (BRNet), which generatively back-traces the representative points from the vote centers and also revisits complementary seed points around these generated points, so as to better capture the fine local structural features surrounding the potential objects from the raw point clouds.

3D Object Detection Object +1

DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

2 code implementations 18 Mar 2021 Buyu Li, Yongchi Zhao, Zhelun Shi, Lu Sheng

In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent rhythm-aligned movements.

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

2 code implementations CVPR 2021 Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification.

Benchmarking Classification +2

StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition

1 code implementation ICCV 2021 Xiaolei Wu, Zhihao Hu, Lu Sheng, Dong Xu

In this work, we propose a new feed-forward arbitrary style transfer method, referred to as StyleFormer, which can simultaneously fulfill fine-grained style diversity and semantic content coherency.

Style Transfer

3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

no code implementations ICCV 2021 Lichen Zhao, Daigang Cai, Lu Sheng, Dong Xu

Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world.

Object Object Proposal Generation +2

PV-NAS: Practical Neural Architecture Search for Video Recognition

no code implementations 2 Nov 2020 ZiHao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao

Recently, deep learning has been utilized to solve the video recognition problem owing to its strong representation ability.

Neural Architecture Search Video Recognition

Adaptive Gradient Method with Resilience and Momentum

no code implementations 21 Oct 2020 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).

Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues

2 code implementations ECCV 2020 Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, Jing Shao

As realistic facial manipulation technologies have achieved remarkable progress, social concerns about potential malicious abuse of these technologies bring out an emerging research topic of face forgery detection.

Unsupervised Domain Expansion from Multiple Sources

no code implementations 26 May 2020 Jing Zhang, Wanqing Li, Lu Sheng, Chang Tang, Philip Ogunbona

In some applications, given an existing system learned from previous source domains, it is desirable to adapt the system to new domains without accessing or forgetting the previous domains.

Domain Adaptation Unsupervised Domain Expansion

Powering One-shot Topological NAS with Stabilized Share-parameter Proxy

no code implementations ECCV 2020 Ronghao Guo, Chen Lin, Chuming Li, Keyu Tian, Ming Sun, Lu Sheng, Junjie Yan

Specifically, the difficulties of architecture search in such a complex space have been eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared parameter sampling, so as to achieve stabilized measurement of architecture performance even in search spaces with complex topological structures.

Neural Architecture Search

Morphing and Sampling Network for Dense Point Cloud Completion

2 code implementations 30 Nov 2019 Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu

3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.

Point Cloud Completion

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations 6 May 2019 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Context and Attribute Grounded Dense Captioning

no code implementations CVPR 2019 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao

Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.

Attribute Dense Captioning

Video Generation from Single Semantic Label Map

2 code implementations CVPR 2019 Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

Image Generation Image to Video Generation +1

Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

no code implementations 3 Mar 2019 Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy

Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.

Video Generation

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

1 code implementation 16 Sep 2018 Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan

Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.

Classification General Classification +4

Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition

no code implementations ECCV 2018 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy

We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.

Object

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

3 code implementations CVPR 2018 Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang

Zero-shot artistic style transfer is an important image synthesis problem that aims to transfer arbitrary styles onto content images.

Image Generation Image Reconstruction +1

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation CVPR 2018 Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Action Recognition In Videos Optical Flow Estimation +1

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations CVPR 2017 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1
